DocumentCode :
2863469
Title :
A Confidence-Based Hierarchical Feature Clustering Algorithm for Text Classification
Author :
Jiang, Jung-Yi ; Yin, Kai-Tai ; Lee, Shie-Jue
fYear :
2007
fDate :
11-13 Oct. 2007
Firstpage :
161
Lastpage :
164
Abstract :
In this paper, we propose a novel feature reduction approach that groups words hierarchically into clusters, which can then be used as new features for document classification. Initially, each word constitutes a cluster. We calculate the mutual confidence between every pair of distinct words. The pair of clusters containing the two words with the highest mutual confidence is combined into a new cluster. This merging process is iterated until all the mutual confidences between the unprocessed pairs of words are smaller than a predefined threshold or only one cluster exists. In this way, a hierarchy of word clusters is obtained. The user can choose the clusters at a certain level of the hierarchy to be used as new features for document classification. Experimental results have shown that our method can perform better than other methods.
Keywords :
Classification algorithms; Clustering algorithms; Clustering methods; Feature extraction; Gain measurement; Merging; Pervasive computing; Software libraries; Text categorization; Text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Pervasive Computing, 2007. IPC. The 2007 International Conference on
Conference_Location :
Jeju City
Print_ISBN :
978-0-7695-3006-2
Type :
conf
DOI :
10.1109/IPC.2007.35
Filename :
4438416