• DocumentCode
    2453612
  • Title
    An Improved Co-Similarity Measure for Document Clustering
  • Author
    Hussain, Syed Fawad ; Bisson, Gilles ; Grimal, Clément
  • Author_Institution
    Lab. TIMC-IMAG, Univ. of Grenoble, Grenoble, France
  • fYear
    2010
  • fDate
    12-14 Dec. 2010
  • Firstpage
    190
  • Lastpage
    197
  • Abstract
    Co-clustering has been defined as a way to simultaneously organize subsets of instances and subsets of features in order to improve the clustering of both. In previous work, we proposed an efficient co-similarity measure that simultaneously computes two similarity matrices, one between objects and one between features, each built on the basis of the other. Here we propose a generalization of this approach by introducing a notion of pseudo-norm and a pruning algorithm. Our experiments show that this new algorithm significantly improves the accuracy of the results when using either supervised or unsupervised feature selection data, and that it outperforms other algorithms on various corpora.
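The abstract only outlines the co-similarity idea: two similarity matrices, one over documents and one over words, each rebuilt from the other. The sketch below illustrates that mutual-reinforcement loop under stated assumptions: a non-negative documents-by-words count matrix, identity matrices as the starting similarities, and normalization of each entry by the product of the corresponding row (or column) sums. The function names (`co_similarity`, `normalize`, etc.) are hypothetical, and the sketch deliberately omits the pseudo-norm and the pruning step this paper introduces, so it is not the authors' exact formulation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product x @ y (no external dependencies)."""
    inner = len(y)
    cols = len(y[0])
    return [[sum(row[k] * y[k][c] for k in range(inner)) for c in range(cols)] for row in x]


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def normalize(s: Matrix, sums: List[float]) -> Matrix:
    """Divide s[i][j] by sums[i] * sums[j]; use 0.0 where either sum is zero."""
    return [
        [s[i][j] / (sums[i] * sums[j]) if sums[i] * sums[j] else 0.0 for j in range(len(s[i]))]
        for i in range(len(s))
    ]


def co_similarity(a: Matrix, iterations: int = 4) -> Tuple[Matrix, Matrix]:
    """Hypothetical sketch: return (SR, SC), document-document and word-word similarities.

    Assumes a is a non-negative documents x words matrix. With identity
    initialization and this normalization, all similarities stay in [0, 1].
    """
    at = transpose(a)
    row_sums = [sum(r) for r in a]
    col_sums = [sum(c) for c in at]
    sr = identity(len(a))   # document similarities
    sc = identity(len(at))  # word similarities
    for _ in range(iterations):
        # Each matrix is rebuilt from the *other* matrix of the previous step:
        # SR_ij = sum_k sum_l A_ik * SC_kl * A_jl / (r_i * r_j)
        new_sr = normalize(matmul(matmul(a, sc), at), row_sums)
        # SC_kl = sum_i sum_j A_ik * SR_ij * A_jl / (c_k * c_l)
        new_sc = normalize(matmul(matmul(at, sr), a), col_sums)
        sr, sc = new_sr, new_sc
    return sr, sc
```

For example, with `co_similarity([[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]])`, the first two documents share word 1 and receive a positive similarity, while the third document, whose only word never co-occurs with the others, keeps a similarity of exactly 0.0 to both. Running more iterations lets similarity propagate through chains of shared words, which is the motivation for computing both matrices jointly rather than comparing documents directly.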
  • Keywords
    feature extraction; pattern clustering; text analysis; corpora; cosimilarity measure; document clustering; feature selection; pruning algorithm; pseudonorm algorithm; similarity matrices; Clustering algorithms; Complexity theory; Equations; Oceans; Sea measurements; Semantics; Strontium; co-clustering; similarity measure; text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on
  • Conference_Location
    Washington, DC
  • Print_ISBN
    978-1-4244-9211-4
  • Type
    conf
  • DOI
    10.1109/ICMLA.2010.35
  • Filename
    5708832