• DocumentCode
    2143519
  • Title
    Application of the SpecHybrid Algorithm to text document clustering problem
  • Author
    Uykan, Zekeriya ; Ganiz, Murat C.
  • Author_Institution
    Electron. & Commun. Eng. Dept, Dogus Univ., Istanbul, Turkey
  • fYear
    2011
  • fDate
    15-18 June 2011
  • Firstpage
    118
  • Lastpage
    122
  • Abstract
    In this paper, we present a relaxed version of the SpecHybrid Algorithm, originally proposed for wireless cellular systems, and apply it to the text document clustering problem. We conduct several experiments on two datasets: a widely used English benchmark dataset and a Turkish textual dataset commonly used in text classification. Our results show that the proposed algorithm outperforms the standard k-means algorithm in text document clustering for any number of clusters, and performs comparably to or better than the standard EM algorithm for a relatively large number of clusters, depending on the similarity matrices used.
  • Keywords
    expectation-maximisation algorithm; pattern classification; pattern clustering; text analysis; SpecHybrid algorithm; Turkish textual dataset; similarity matrices; standard EM algorithm; standard k-means algorithm; text classification; text document clustering problem; Classification algorithms; Clustering algorithms; Data mining; Entropy; Euclidean distance; Partitioning algorithms; Turkish document clustering; document clustering; max cut; spectral clustering; textual data mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovations in Intelligent Systems and Applications (INISTA), 2011 International Symposium on
  • Conference_Location
    Istanbul
  • Print_ISBN
    978-1-61284-919-5
  • Type
    conf
  • DOI
    10.1109/INISTA.2011.5946085
  • Filename
    5946085