DocumentCode
2774150
Title
Document Clustering Using Semantic Kernels Based on Term-Term Correlations
Author
Farahat, Ahmed K. ; Kamel, Mohamed S.
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
fYear
2009
fDate
6 Dec. 2009
Firstpage
459
Lastpage
464
Abstract
Document clustering algorithms usually use the vector space model (VSM) as their underlying model for document representation. The VSM assumes that terms are independent and accordingly ignores any semantic relations between them. As a result, documents are mapped to a space where the proximity between document vectors does not reflect their true semantic similarity. In this paper, we propose the use of semantic kernels based on term-term correlations to improve the effectiveness of document clustering algorithms. These kernels measure the proximity between documents based on how their terms are statistically correlated. We analyze semantic kernels that capture different aspects of the correlations between terms, and evaluate them through experiments on several benchmark data sets. Results show that the proposed method achieves a significant improvement in document clustering over the VSM.
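To illustrate the idea in the abstract, here is a minimal sketch that assumes the simplest correlation-based kernel, the generalized-VSM form k(d_i, d_j) = d_i^T G d_j, where G = X^T X holds raw term co-occurrence counts over a document-term matrix X. It is not claimed to be any of the specific kernels analyzed in the paper. The function names, the toy four-term vocabulary, and the counts are hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical toy vocabulary: term index -> term
TERMS = ["car", "automobile", "engine", "fruit"]

# Hypothetical document-term count matrix X (rows = documents, columns = terms)
X = [
    [1, 0, 0, 0],  # doc0: "car"
    [0, 1, 0, 0],  # doc1: "automobile"
    [1, 1, 1, 0],  # doc2: "car automobile engine" (links car and automobile)
    [0, 0, 0, 1],  # doc3: "fruit"
]

def term_correlation(X):
    """G = X^T X: G[a][b] sums the co-occurrence weights of terms a and b over all documents."""
    n_terms = len(X[0])
    return [[sum(doc[a] * doc[b] for doc in X) for b in range(n_terms)]
            for a in range(n_terms)]

def vsm_kernel(X):
    """Plain VSM kernel: K[i][j] is the dot product of document vectors x_i and x_j."""
    return [[sum(a * b for a, b in zip(u, v)) for v in X] for u in X]

def semantic_kernel(X, G):
    """Semantic kernel: K[i][j] = x_i^T G x_j, so correlated terms contribute to similarity."""
    n_terms = len(G)
    # Row i of XG is x_i^T G
    XG = [[sum(doc[a] * G[a][b] for a in range(n_terms)) for b in range(n_terms)]
          for doc in X]
    return [[sum(u[b] * v[b] for b in range(n_terms)) for v in X] for u in XG]

def normalize(K):
    """Cosine-normalize a kernel matrix: K[i][j] / sqrt(K[i][i] * K[j][j]); 0.0 if either diagonal is 0."""
    n = len(K)
    return [[K[i][j] / sqrt(K[i][i] * K[j][j]) if K[i][i] and K[j][j] else 0.0
             for j in range(n)] for i in range(n)]

G = term_correlation(X)
K_vsm = normalize(vsm_kernel(X))
K_sem = normalize(semantic_kernel(X, G))

print("VSM      doc0 vs doc1:", K_vsm[0][1])
print("Semantic doc0 vs doc1:", K_sem[0][1])
print("Semantic doc0 vs doc3:", K_sem[0][3])
```

Under the VSM, the "car" and "automobile" documents share no terms, so their similarity is 0. The semantic kernel gives them a similarity of 0.5 because the two terms co-occur in a third document, while the unrelated "fruit" document stays at 0. A clustering algorithm that consumes the kernel matrix instead of raw dot products would then tend to group the first two documents together.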
Keywords
document handling; pattern clustering; vectors; document clustering; semantic kernel; term-term correlation; vector space model; Algorithm design and analysis; Clustering algorithms; Computational complexity; Conferences; Data mining; Kernel; Organizing; Text mining; Unsupervised learning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on
Conference_Location
Miami, FL
Print_ISBN
978-1-4244-5384-9
Electronic_ISBN
978-0-7695-3902-7
Type
conf
DOI
10.1109/ICDMW.2009.88
Filename
5360448