Application of the SpecHybrid Algorithm to text document clustering problem

Author

Uykan, Zekeriya ; Ganiz, Murat C.

Author_Institution

Electron. & Commun. Eng. Dept, Dogus Univ., Istanbul, Turkey

fYear

2011

fDate

15-18 June 2011

Firstpage

118

Lastpage

122

Abstract

In this paper, we present a relaxed version of the SpecHybrid Algorithm, originally proposed for wireless cellular systems, and apply it to the text document clustering problem. We conduct several experiments on two datasets: a widely used English benchmark dataset and a Turkish textual dataset commonly used in text classification. Our results show that the proposed algorithm outperforms the standard k-means algorithm in text document clustering for any number of clusters, and performs comparably to or better than the standard EM algorithm for a relatively large number of clusters, depending on the similarity matrices used.

Keywords

expectation-maximisation algorithm; pattern classification; pattern clustering; text analysis; SpecHybrid algorithm; Turkish textual dataset; similarity matrices; standard EM algorithm; standard k-means algorithm; text classification; text document clustering problem; Classification algorithms; Clustering algorithms; Data mining; Entropy; Euclidean distance; Partitioning algorithms; Turkish document clustering; document clustering; max cut; spectral clustering; textual data mining

fLanguage

English

Publisher

IEEE

Conference_Titel

Innovations in Intelligent Systems and Applications (INISTA), 2011 International Symposium on

Conference_Location

Istanbul

Print_ISBN

978-1-61284-919-5

Type

conf

DOI

10.1109/INISTA.2011.5946085

Filename

5946085