Title :
WISDOM from Light-Weight Information Retrieval
Author :
Bracewell, David B. ; Gustafson, Steven ; Moitra, Abha ; Steuben, Gregg
Author_Institution :
GE Global Res., Niskayuna, NY, USA
Abstract :
This paper presents a light-weight information retrieval and analysis architecture that addresses the complex task of gathering, combining, and storing documents to enable in-depth analysis. The growing interest in mining the Internet for conversation topics, opinions, and influencers has resulted in many free and commercial products. At the heart of such capability are two core technologies: information retrieval and text mining. While search engines and technologies like RSS make gathering information easier, they, like text mining, still require significant consideration when applied in mission-critical situations. For example, different search engines retrieve irrelevant results, and it is difficult, if not impossible, to know whether all relevant results have been found. Also, significant analysis of such documents will usually require the fusion of other information sources, a task that most search engines, at least, do not support. We have developed a system and architecture for light-weight document and information retrieval to enable focused and deep analysis of text, authors and publishers, and the networks that they form with each other, through citation, reference, and co-occurrence analysis. While it is intuitive that such a system is necessary for in-depth analysis, it is nontrivial to construct one out of the many moving pieces, data sources, and technologies. We present the architecture, discuss the decision steps, and demonstrate analyses that are enabled by the system.
Keywords :
data mining; information retrieval; search engines; text analysis; Internet; WISDOM; cooccurrence analysis; light-weight document retrieval; light-weight information retrieval; text mining; Feeds; Google; Information services; Web sites; Natural Language Processing; Open Source Intelligence Gathering
Conference_Title :
Social Computing (SocialCom), 2010 IEEE Second International Conference on
Conference_Location :
Minneapolis, MN
Print_ISBN :
978-1-4244-8439-3
Electronic_ISBN :
978-0-7695-4211-9
DOI :
10.1109/SocialCom.2010.57