• DocumentCode
    2774150
  • Title

    Document Clustering Using Semantic Kernels Based on Term-Term Correlations

  • Author

    Farahat, Ahmed K. ; Kamel, Mohamed S.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
  • fYear
    2009
  • fDate
    6-6 Dec. 2009
  • Firstpage
    459
  • Lastpage
    464
  • Abstract
    Document clustering algorithms usually use the vector space model (VSM) as their underlying model for document representation. VSM assumes that terms are independent and accordingly ignores any semantic relations between them. As a result, documents are mapped to a space in which the proximity between document vectors does not reflect their true semantic similarity. In this paper, we propose the use of semantic kernels based on term-term correlations to improve the effectiveness of document clustering algorithms. These kernels measure the proximity between documents based on how their terms are statistically correlated. We analyze semantic kernels that capture different aspects of the correlations between terms, and evaluate them through experiments on different benchmark data sets. Results show that the proposed method achieves a significant improvement in document clustering compared to VSM.
  • Keywords
    document handling; pattern clustering; vectors; document clustering; semantic kernel; term-term correlation; vector space model; Algorithm design and analysis; Clustering algorithms; Computational complexity; Conferences; Data mining; Kernel; Organizing; Text mining; Unsupervised learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4244-5384-9
  • Electronic_ISBN
    978-0-7695-3902-7
  • Type

    conf

  • DOI
    10.1109/ICDMW.2009.88
  • Filename
    5360448
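
The idea summarized in the abstract, measuring document proximity through correlated terms rather than exact term overlap, can be illustrated with a minimal sketch. This is not the paper's method: the record does not give the kernel definitions, so the sketch assumes a generic kernel of the form K = X S Xᵀ, where X is a document-term matrix and S is a term-term cosine similarity matrix estimated from co-occurrence in X. The function names `vsm_cosine` and `semantic_kernel` and the toy data are hypothetical.

```python
import numpy as np

def vsm_cosine(X):
    """Plain VSM cosine similarity between document rows of X."""
    norms = np.linalg.norm(X, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    Xn = X / norms[:, None]
    return Xn @ Xn.T

def semantic_kernel(X):
    """Document-document kernel K = X S X^T (an assumed, generic form).

    S is the cosine similarity between term columns of X, so two terms
    are similar when they tend to occur in the same documents.
    """
    col_norms = np.linalg.norm(X, axis=0)
    col_norms[col_norms == 0] = 1.0  # terms that never occur
    Xc = X / col_norms               # unit-length term columns
    S = Xc.T @ Xc                    # term-term similarity (positive semidefinite)
    K = X @ S @ X.T                  # documents compared through S
    d = np.sqrt(np.diag(K))
    d[d == 0] = 1.0
    return K / np.outer(d, d)        # cosine-normalize the kernel

# Toy corpus; columns are the terms: car, automobile, engine, banana
X = np.array([
    [1, 1, 1, 0],  # doc 0: car, automobile, engine
    [1, 0, 0, 0],  # doc 1: car
    [0, 1, 0, 0],  # doc 2: automobile
    [0, 0, 0, 1],  # doc 3: banana
], dtype=float)

vsm = vsm_cosine(X)
K = semantic_kernel(X)

# Docs 1 and 2 share no terms, so VSM sees them as unrelated, while the
# kernel links them because "car" and "automobile" co-occur in doc 0.
print("VSM    sim(doc1, doc2):", vsm[1, 2])
print("Kernel sim(doc1, doc2):", K[1, 2])
print("Kernel sim(doc1, doc3):", K[1, 3])
```

The normalized matrix `K` could then be passed to any clustering algorithm that accepts a precomputed similarity matrix; which algorithms the paper actually uses is not stated in this record.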