Title :
Feature Reduction for Text Categorization Using Cluster-Based Discriminant Coefficient
Author :
Li-Ju Gao ; Been-Chian Chien
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Univ. of Tainan, Tainan, Taiwan
Abstract :
Text classification is an important research topic for managing numerous electronic documents. Feature reduction is the key issue for text classification with high-dimensional keyword features. A document analysis method called the discriminant coefficient was previously proposed to reduce features and achieve high-precision text classification. However, the main problem of this discriminant-based feature reduction method is that the final number of reduced features is exactly equal to the number of document classes. Although such a method yields high classification precision, its recall is relatively low. In this paper, we propose an improvement to the discriminant coefficient analysis method. We apply a simple clustering method to partition the documents within each class, preserving hidden differences among keywords in the same class. The clustering results make it possible to adjust the number of reduced features flexibly. The experimental results show that the proposed clustering mechanism supports adaptive feature reduction and improves both recall and the F1 measure.
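The abstract does not give the discriminant coefficient formula or name the clustering algorithm. The following is therefore only a minimal sketch of the dimensionality argument: if each group of documents contributes one reduced feature, then splitting classes into clusters raises the number of reduced features from the number of classes to the number of clusters. Everything here is a hypothetical stand-in, not the authors' method. `leader_cluster` is a single-pass cosine-threshold clustering, `group_weights` uses a keyword's share of its total count in each group in place of the real discriminant coefficient, and `threshold` is an assumed tuning parameter.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-count dicts."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def leader_cluster(docs, threshold):
    """Hypothetical 'simple clustering': a document joins the first cluster
    whose leader is at least `threshold`-similar, else starts a new cluster."""
    leaders, members = [], []
    for d in docs:
        for i, lead in enumerate(leaders):
            if cosine(d, lead) >= threshold:
                members[i].append(d)
                break
        else:
            leaders.append(d)
            members.append([d])
    return members


def build_groups(docs, labels, threshold):
    """Split every class into clusters; each cluster becomes one group."""
    by_class = defaultdict(list)
    for d, y in zip(docs, labels):
        by_class[y].append(d)
    groups = []
    for y in sorted(by_class):
        for cluster in leader_cluster(by_class[y], threshold):
            groups.append((y, cluster))
    return groups


def group_weights(groups):
    """Hypothetical per-group keyword weights: the share of a keyword's total
    count falling in each group (NOT the paper's discriminant coefficient)."""
    counts, total = [], defaultdict(float)
    for _, cluster in groups:
        c = defaultdict(float)
        for d in cluster:
            for t, v in d.items():
                c[t] += v
                total[t] += v
        counts.append(c)
    return [{t: c[t] / total[t] for t in c} for c in counts]


def reduce_document(doc, weights):
    """Map a document onto exactly one reduced feature per group."""
    return [sum(v * w.get(t, 0.0) for t, v in doc.items()) for w in weights]


# Toy data: "finance" contains two sub-topics with disjoint vocabularies.
docs = [
    {"goal": 2, "match": 1},
    {"goal": 1, "match": 2},
    {"stock": 2, "bank": 1},
    {"loan": 2, "mortgage": 1},
]
labels = ["sports", "sports", "finance", "finance"]

print(len(build_groups(docs, labels, 0.0)))  # one group per class: 2
print(len(build_groups(docs, labels, 0.5)))  # finance splits in two: 3
```

With `threshold=0.0` every document joins its class's first cluster, so the reduced dimension equals the number of classes. This is the limitation the abstract describes. Raising the threshold lets dissimilar documents within a class form separate groups, and the reduced dimension grows accordingly.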
Keywords :
feature extraction; pattern classification; pattern clustering; statistical analysis; text analysis; F1 measurements; cluster-based discriminant coefficient; clustering mechanism; discriminant based feature reduction method; discriminant coefficient; document analysis method; electronic documents; high dimensional keywords; high-precision text classification; text categorization; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Clustering methods; Feature extraction; Matrix converters; Text categorization; classification; discriminant coefficient; feature clustering; feature reduction;
Conference_Titel :
Technologies and Applications of Artificial Intelligence (TAAI), 2012 Conference on
Conference_Location :
Tainan
Print_ISBN :
978-1-4673-4976-5
DOI :
10.1109/TAAI.2012.16