Title :
An Efficient Feature Selection Using Hidden Topic in Text Categorization
Author :
Zhang, Zhiwei ; Phan, Xuan-Hieu ; Horiguchi, Susumu
Author_Institution :
Tohoku Univ., Sendai
Abstract :
Text categorization is an important research area in information retrieval. To save storage space and improve accuracy, efficient and effective feature selection methods that reduce the data before analysis are highly desirable. Research on feature selection usually relies on a single suitable measure, such as information gain. In this paper, we propose a new feature selection method that combines hidden topic analysis with entropy-based feature ranking. Experiments on the well-known Reuters-21578 and Ohsumed datasets show that our method achieves better classification accuracy while dramatically reducing the feature dimension.
Keywords :
classification; feature extraction; information retrieval; feature selection methods; hidden topic analysis; text categorization; Data analysis; Entropy; Filters; Gain measurement; Linear discriminant analysis; Machine learning algorithms; Sampling methods; Vocabulary; feature selection
Conference_Title :
Advanced Information Networking and Applications - Workshops, 2008. AINAW 2008. 22nd International Conference on
Conference_Location :
Okinawa
Print_ISBN :
978-0-7695-3096-3
DOI :
10.1109/WAINA.2008.137