DocumentCode
2143519
Title
Application of the SpecHybrid Algorithm to text document clustering problem
Author
Uykan, Zekeriya ; Ganiz, Murat C.
Author_Institution
Electron. & Commun. Eng. Dept, Dogus Univ., Istanbul, Turkey
fYear
2011
fDate
15-18 June 2011
Firstpage
118
Lastpage
122
Abstract
In this paper, we present a relaxed version of the SpecHybrid Algorithm, originally proposed for wireless cellular systems, and apply it to the text document clustering problem. We conduct several experiments on two different datasets: a widely used benchmark dataset in English and a Turkish textual dataset commonly used in text classification. Our results show that the proposed algorithm outperforms the standard k-means algorithm in text document clustering for any number of clusters, and performs comparably to or better than the standard EM algorithm for a relatively large number of clusters, depending on the similarity matrices used.
Keywords
expectation-maximisation algorithm; pattern classification; pattern clustering; text analysis; SpecHybrid algorithm; Turkish textual dataset; similarity matrices; standard EM algorithm; standard k-means algorithm; text classification; text document clustering problem; Classification algorithms; Clustering algorithms; Data mining; Entropy; Euclidean distance; Partitioning algorithms; Turkish document clustering; document clustering; max cut; spectral clustering; textual data mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Innovations in Intelligent Systems and Applications (INISTA), 2011 International Symposium on
Conference_Location
Istanbul
Print_ISBN
978-1-61284-919-5
Type
conf
DOI
10.1109/INISTA.2011.5946085
Filename
5946085