DocumentCode :
3261836
Title :
A Hybrid Strategy for Clustering Data Mining Documents
Author :
Peng, Yi ; Kou, Gang ; Shi, Yong ; Chen, Zhengxin
Author_Institution :
Coll. of Inf. Sci. & Technol., Nebraska Univ., Omaha, NE
fYear :
2006
fDate :
Dec. 2006
Firstpage :
838
Lastpage :
842
Abstract :
With the increase in the number of electronic documents, it is hard to manually organize, analyze and present these documents efficiently. Document clustering, which automatically groups similar or related documents together, has been used in practical applications to understand the contents and structures of documents. Although a variety of methods and algorithms have been proposed, it is still a challenging task to generate meaningful document clusters. This paper uses an approach that combines quantitative and qualitative methods in order to create high-quality clusters for a collection of data mining and knowledge discovery (DMKD) publications. The quantitative method extracts a list of nouns and noun phrases from the DMKD documents and uses an optimization procedure from the CLUTO toolkit to assign documents to clusters. The qualitative method uses grounded theory to identify major categories of the documents to improve the comprehensibility of the resultant clusters. The results demonstrate that the strategy produces more meaningful clusters than a single-term k-way clustering algorithm in terms of internal metrics and human assessment.
Keywords :
data mining; document handling; pattern clustering; CLUTO toolkit; data mining; document clustering; electronic documents; hard clustering; human assessment; internal metrics; k-way clustering; knowledge discovery; soft clustering; Classification algorithms; Clustering algorithms; Data mining; Databases; Educational institutions; Humans; Information retrieval; Information science; Organizing; Partitioning algorithms; Data mining; Document clustering; Grounded theory; Hard clustering; Optimization algorithm; Soft clustering
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2702-7
Type :
conf
DOI :
10.1109/ICDMW.2006.6
Filename :
4063742