• DocumentCode
    3447526
  • Title
    Discriminative clustering of text documents
  • Author
    Peltonen, Jaakko ; Sinkkonen, Janne ; Kaski, Samuel
  • Author_Institution
    Neural Networks Res. Centre, Helsinki Univ. of Technol., Espoo, Finland
  • Volume
    4
  • fYear
    2002
  • fDate
    18-22 Nov. 2002
  • Firstpage
    1956
  • Abstract
    Vector-space and distributional methods for text document clustering are discussed. Discriminative clustering, a recently proposed method, uses external data to find task-relevant characteristics of the documents, yet the clustering is defined even with no external data. We introduce a distributional version of discriminative clustering that represents text documents as probability distributions. The methods are tested in the task of clustering scientific document abstracts, and the ability of the methods to predict an independent topical classification of the abstracts is compared. The discriminative methods found topically more meaningful clusters than the vector space and distributional clustering models.
  • Keywords
    bibliographic systems; classification; data mining; information retrieval; pattern clustering; probability; text analysis; discriminative text document clustering; distributional methods; probability distributions; scientific document abstract clustering; task-relevant characteristics; vector space methods; Abstracts; Clustering algorithms; Clustering methods; Extraterrestrial measurements; Indexing; Kernel; Large scale integration; Neural networks; Probability distribution; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Information Processing, 2002. ICONIP '02. Proceedings of the 9th International Conference on
  • Print_ISBN
    981-04-7524-1
  • Type
    conf
  • DOI
    10.1109/ICONIP.2002.1199015
  • Filename
    1199015