Title :
Clustering Description Extraction Based on Statistical Machine Learning
Author :
Zhang, Chengzhi ; Xu, Hongjiao
Author_Institution :
Dept. of Inf. Manage., Nanjing Univ. of Sci. & Technol., Nanjing
Abstract :
The clustering description problem is one of the key issues in traditional document clustering. Traditional document clustering algorithms can group objects, but they cannot give a concept description of the resulting clusters. Document clustering description is the problem of labeling the clusters produced by clustering a document collection. Such labels help users determine whether a given cluster is relevant to their information needs. Labeling a clustered set of documents is therefore an important and challenging task in document clustering applications. To address the weak readability of traditional document clustering results, a method for automatically labeling document clusters based on machine learning is put forward. Experimental results show that the SVM-based method provides users with more concise and comprehensive document clustering results. The results also reflect the linear trend of the clustering description problem.
Keywords :
document handling; learning (artificial intelligence); pattern clustering; statistical analysis; clustered document labeling; document clustering description extraction; document collection clustering; statistical machine learning; Clustering algorithms; Data mining; Frequency; Information management; Information technology; Labeling; Learning systems; Machine learning; Machine learning algorithms; Support vector machines; Clustering Description; Document Clustering; Statistical Machine Learning
Conference_Title :
Intelligent Information Technology Application, 2008. IITA '08. Second International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3497-8
DOI :
10.1109/IITA.2008.114