Clustering Description Extraction Based on Statistical Machine Learning

Author

Zhang, Chengzhi ; Xu, Hongjiao

Author_Institution

Dept. of Inf. Manage., Nanjing Univ. of Sci. & Technol., Nanjing

Volume

2

fYear

2008

fDate

20-22 Dec. 2008

Firstpage

22

Lastpage

26

Abstract

The clustering description problem is one of the key issues in traditional document clustering algorithms. A traditional document clustering algorithm can cluster the objects, but it cannot give a concept description for the clustered results. Document clustering description is the problem of labeling the clusters produced by clustering a document collection. Such labels can help users determine whether a cluster is relevant to their information requirements. Labeling a clustered set of documents is therefore an important and challenging task in document clustering applications. To address the weak readability of traditional document clustering results, a method for automatically labeling document clusters based on machine learning is put forward. Experimental results show that the SVM-based method provides users with more concise and comprehensive document clustering results. The results also reflect the linear trend of the clustering description problem.

Keywords

document handling; learning (artificial intelligence); pattern clustering; statistical analysis; clustered document labeling; document clustering description extraction; document collection clustering; statistical machine learning; Clustering algorithms; Data mining; Frequency; Information management; Information technology; Labeling; Learning systems; Machine learning; Machine learning algorithms; Support vector machines; Clustering Description; Document Clustering; Statistical Machine Learning

fLanguage

English

Publisher

IEEE

Conference_Titel

Intelligent Information Technology Application, 2008. IITA '08. Second International Symposium on

Conference_Location

Shanghai

Print_ISBN

978-0-7695-3497-8

Type

conf

DOI

10.1109/IITA.2008.114

Filename

4739719