DocumentCode
3447526
Title
Discriminative clustering of text documents
Author
Peltonen, Jaakko ; Sinkkonen, Janne ; Kaski, Samuel
Author_Institution
Neural Networks Res. Centre, Helsinki Univ. of Technol., Espoo, Finland
Volume
4
fYear
2002
fDate
18-22 Nov. 2002
Firstpage
1956
Abstract
Vector-space and distributional methods for text document clustering are discussed. Discriminative clustering, a recently proposed method, uses external data to find task-relevant characteristics of the documents, yet the clustering is defined even with no external data. We introduce a distributional version of discriminative clustering that represents text documents as probability distributions. The methods are tested in the task of clustering scientific document abstracts, and the ability of the methods to predict an independent topical classification of the abstracts is compared. The discriminative methods found topically more meaningful clusters than the vector space and distributional clustering models.
Keywords
bibliographic systems; classification; data mining; information retrieval; pattern clustering; probability; text analysis; discriminative text document clustering; distributional methods; probability distributions; scientific document abstract clustering; task-relevant characteristics; vector space methods; Abstracts; Clustering algorithms; Clustering methods; Extraterrestrial measurements; Indexing; Kernel; Large scale integration; Neural networks; Probability distribution; Testing
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Information Processing, 2002. ICONIP '02. Proceedings of the 9th International Conference on
Print_ISBN
981-04-7524-1
Type
conf
DOI
10.1109/ICONIP.2002.1199015
Filename
1199015