Title :
Discriminative clustering of text documents
Author :
Peltonen, Jaakko ; Sinkkonen, Janne ; Kaski, Samuel
Author_Institution :
Neural Networks Res. Centre, Helsinki Univ. of Technol., Espoo, Finland
Abstract :
Vector-space and distributional methods for text document clustering are discussed. Discriminative clustering, a recently proposed method, uses external data to find task-relevant characteristics of the documents, yet the clustering is defined even with no external data. We introduce a distributional version of discriminative clustering that represents text documents as probability distributions. The methods are tested in the task of clustering scientific document abstracts, and the ability of the methods to predict an independent topical classification of the abstracts is compared. The discriminative methods found topically more meaningful clusters than the vector-space and distributional clustering models.
Keywords :
bibliographic systems; classification; data mining; information retrieval; pattern clustering; probability; text analysis; discriminative text document clustering; distributional methods; probability distributions; scientific document abstract clustering; task-relevant characteristics; vector space methods; Abstracts; Clustering algorithms; Clustering methods; Extraterrestrial measurements; Indexing; Kernel; Large scale integration; Neural networks; Probability distribution; Testing
Conference_Title :
Neural Information Processing, 2002. ICONIP '02. Proceedings of the 9th International Conference on
Print_ISBN :
981-04-7524-1
DOI :
10.1109/ICONIP.2002.1199015