DocumentCode :
2860466
Title :
Pseudo-Supervised Clustering for Text Documents
Author :
Maggini, M. ; Rigutini, L. ; Turchi, M.
Author_Institution :
Università di Siena, Italy
fYear :
2004
fDate :
20-24 Sept. 2004
Firstpage :
363
Lastpage :
369
Abstract :
Effective solutions for Web search engines can take advantage of algorithms for the automatic organization of documents into homogeneous clusters. Unfortunately, document clustering is not an easy task, especially when the documents share a common set of topics, as in vertical search engines. In this paper we propose two clustering algorithms that can be tuned by the feedback of an expert. The feedback is used to choose an appropriate basis for the representation of documents, while the clustering is performed in the projected space. The algorithms are evaluated on a dataset containing papers from computer science conferences. The results show that an appropriate choice of the representation basis can yield better performance than the original vector space model.
Keywords :
Application software; Clustering algorithms; Clustering methods; Computer science; Feedback; Frequency; Navigation; Search engines; Text processing; Web search
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2100-2
Type :
conf
DOI :
10.1109/WI.2004.10138
Filename :
1410827