DocumentCode
3165514
Title
Improving Knowledge Discovery in Document Collections through Combining Text Retrieval and Link Analysis Techniques
Author
Jin, Wei ; Srihari, Rohini K. ; Ho, Hung Hay ; Wu, Xin
Author_Institution
State Univ. of New York, Buffalo
fYear
2007
fDate
28-31 Oct. 2007
Firstpage
193
Lastpage
202
Abstract
In this paper, we present Concept Chain Queries (CCQ), a special case of text mining in document collections that focuses on detecting links between two topics across text documents. We interpret such a query as finding the most meaningful evidence trails across documents that connect the two topics. We propose applying link-analysis techniques to the features extracted by an Information Extraction (IE) engine in order to discover new knowledge. We also propose a graphical text representation and mining model that combines information retrieval, association mining and link analysis techniques. Experiments on several datasets demonstrate the effectiveness of our algorithm. Specifically, the algorithm generates ranked concept chains and evidence trails in which the key terms representing significant relationships between the topics are ranked highly.
Keywords
data mining; document handling; information retrieval; natural language processing; concept chain queries; document collections; graphical text representation; information extraction engine; knowledge discovery; link analysis techniques; text documents; text mining; text retrieval; Computer science; Data engineering; Data mining; Engines; Feature extraction; Information analysis; Information retrieval; Knowledge engineering; Text mining; USA Councils;
fLanguage
English
Publisher
IEEE
Conference_Titel
Seventh IEEE International Conference on Data Mining (ICDM 2007)
Conference_Location
Omaha, NE
ISSN
1550-4786
Print_ISBN
978-0-7695-3018-5
Type
conf
DOI
10.1109/ICDM.2007.62
Filename
4470243