Information Technology News.


University of Alberta to improve text mining technology

April 1, 2010


In an attempt to underscore the role of data mining in today's research, a University of Alberta professor is helping to design and build text analysis tools. While the research project is important to academia, the Edmonton-based researcher said that improving the quality of text mining tools could yield some important benefits for companies as well.

The project is a joint effort among the U of A, U.S.-based George Mason University and U.K.-based University of Hertfordshire.

The project's main goal is to demonstrate how tools in the digital humanities can improve the effectiveness of text mining large corporate databases.

Geoffrey Rockwell, a professor of philosophy and humanities computing at the University of Alberta, said, “If I were doing this for market research, I could track how we appear in discussion lists, what words are near the brand, and whether they are talking to stock up or down.”

Rockwell hopes the data mining tools he and his international colleagues are developing will revolutionize how people and companies sift through digital material and computer files.

The key is to draw multiple correlations from a body of text through the use of a mining tool, Rockwell added.

He said the explosion of blogs, wikis, digitized books and discussion forums has led to acute information overload for many researchers, and the trend has been accelerating recently.

The same could also be true for marketers or other business units keeping tabs on what is being written about their companies on blogs, discussion forums or websites.

The ultimate goal for the University of Alberta is to make these tools more accessible to students, consumers and businesses, and have them start appearing on blogs, wikis, discussion boards, and even embedded right into browsers.

Rockwell said that a search engine like Google is comparable to a card catalogue that directs you to a piece of information. This is also the way many text mining tools currently work, although some differ from others.

Rockwell and his international colleagues are developing tools like TAPOR, a textual analysis tool that can summarize a body of text, identify important dates and discover the co-occurrences of two target words.
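As a rough illustration of what discovering the co-occurrences of two target words can mean in practice, here is a minimal Python sketch. It is not TAPOR's code or method; the function name `cooccurrences`, the `window` parameter and the sample sentence are all assumptions made for this example. It simply reports pairs of token positions where the two words appear within a few words of each other.

```python
import re

def cooccurrences(text, word_a, word_b, window=5):
    """Return (pos_a, pos_b) token positions where the two words
    appear within `window` tokens of each other.

    Hypothetical helper for illustration only; not part of TAPOR.
    """
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    # Keep every pair of positions that falls inside the window.
    return [(i, j) for i in pos_a for j in pos_b
            if i != j and abs(i - j) <= window]

# Made-up sample text, not taken from any real discussion list.
sample = ("The stock went up after the brand relaunch. "
          "Analysts love the brand, but the stock is flat.")

print(cooccurrences(sample, "brand", "stock", window=5))
```

A smaller `window` keeps only the tightest pairings, while a larger one also catches looser associations across a sentence.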

Some of the tools in TAPOR use forms of visualization to help researchers grasp the data even more clearly. Another data analysis tool called Zotero, which works as a free Firefox extension, collects and manages various research sources from a large database.

Both TAPOR and Zotero are part of the Old Bailey project. “Most of these tools assume that you have the word you want to find,” Rockwell said. “But instead of looking for a needle in a haystack, an effective text mining tool will try and show you the shape of the haystack and tell you the words you might want to find.”

For example, one trend that Rockwell expects in this space is the rise of entity recognition, which could involve tools that recognize proper names, dates and places. This would make it easier to classify what a particular body of text is about, whether it's located in a document, a database, a news clipping or other media.
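To give a feel for the idea, the following Python sketch picks out dates and multi-word proper names with two regular expressions. It is a deliberately naive stand-in: production entity recognition generally relies on trained language models, and nothing here reflects the tools Rockwell's team is building. The patterns, the function name `find_entities` and the sample sentence are assumptions chosen for this example.

```python
import re

# Assumed pattern: dates written like "April 1, 2010".
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)

# Assumed pattern: two or more capitalized words in a row.
# This is a crude heuristic and will misfire on sentence-initial words.
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")

def find_entities(text):
    """Return the dates and candidate proper names found in `text`."""
    return {
        "dates": DATE_RE.findall(text),
        "names": NAME_RE.findall(text),
    }

# Made-up sample sentence for illustration.
sample = ("Geoffrey Rockwell spoke at George Mason University "
          "on April 1, 2010.")

print(find_entities(sample))
```

Even this crude version hints at how a tool could label a document by the people, institutions and dates it mentions.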

Text mining is by no means a novel technology, but vendors are increasingly making it accessible for applications like Voice of the Customer (VoC). Unstructured data channels -- inbound e-mail, call centre conversations, blogs, SMS messages -- by virtue of being text-oriented are prime targets for mining, Rockwell said, as are social media.

For businesses, text mining and analysis of blogs or discussion forums can provide valuable insights into customer sentiment.

Bruce Temkin, vice-president with Forrester Research, said, "The more structured approach of conducting multiple-choice questionnaires limits the depth of insight and breadth of customer feedback."


Source: University of Alberta.



All logos, trade marks or service marks on this site are the property of their respective owners.


       © IT Direction. All rights reserved.