Title :
Feature Selection for the Topic-Based Mixture Model in Factored Classification
Author_Institution :
Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou
Abstract :
The topic-based mixture model (TBMM) is a learning algorithm for factored classification. In factored classification, the class label is factored into a vector of class features. For example, the class label for a personal Web page at a university might be described by two features: the academic discipline of the person and their position (e.g., 'chemistry professor' or 'physics student'). TBMM was proposed as an approach to factored classification of text documents in which each document is assumed to be generated by a mixture of class features. Experiments on factored text classification problems show that TBMM can outperform two other approaches for categories with especially sparse training data. In this paper, we analyze feature selection for TBMM. For TBMM, the feature space can be reduced to a small number of feature terms with a significant improvement in classification accuracy. We present empirical results indicating that TBMM is an adequate method for determining the feature terms for the supervised classification task.
Keywords :
feature extraction; learning (artificial intelligence); pattern classification; text analysis; class features; class label; factored text classification; feature selection; feature space; learning algorithm; text documents; topic-based mixture model; Chemistry; Classification algorithms; Computer science; Gain measurement; Performance evaluation; Performance gain; Physics; Space technology; Text categorization; Web pages
Conference_Title :
Computational Intelligence and Security, 2006 International Conference on
Conference_Location :
Guangzhou
Print_ISBN :
1-4244-0605-6
Electronic_ISBN :
1-4244-0605-6
DOI :
10.1109/ICCIAS.2006.294087