DocumentCode
671513
Title
Coupled term-term relation analysis for document clustering
Author
Xin Cheng ; Duoqian Miao ; Can Wang ; Longbing Cao
Author_Institution
Dept. of Comput. Sci. & Technol., Tongji Univ., Shanghai, China
fYear
2013
fDate
4-9 Aug. 2013
Firstpage
1
Lastpage
8
Abstract
Traditional document clustering approaches are usually based on the Bag of Words model, which is limited by its assumption that terms are independent. Recent strategies capture the relation between terms through statistical analysis, but they estimate that relation purely from the terms' co-occurrence across documents. The implicit interactions with other link terms are overlooked, which leads to the discovery of incomplete information. This paper proposes a coupled term-term relation model for document representation, which considers both the intra-relation (i.e., co-occurrence of terms) and the inter-relation (i.e., dependency of terms via link terms) between a pair of terms. The coupled relation for each pair of terms is then used to map a document onto a new feature space that carries more semantic information. Extensive experiments show that document clustering using the proposed relation achieves a significant performance improvement over state-of-the-art techniques.
Keywords
data mining; document handling; pattern clustering; statistical analysis; coupled term-term relation analysis; document clustering; document representation; feature space; semantic information; Computer science; Context; Data mining; Frequency measurement; Semantics; Sparse matrices; Vectors
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks (IJCNN), The 2013 International Joint Conference on
Conference_Location
Dallas, TX
ISSN
2161-4393
Print_ISBN
978-1-4673-6128-6
Type
conf
DOI
10.1109/IJCNN.2013.6706853
Filename
6706853