DocumentCode
1126056
Title
Introducing a family of linear measures for feature selection in text categorization
Author
Combarro, Elís F. ; Montañés, Elena ; Díaz, Irene ; Ranilla, José ; Mones, Ricardo
Author_Institution
Artificial Intelligence Center, Oviedo Univ., Gijon, Spain
Volume
17
Issue
9
fYear
2005
Firstpage
1223
Lastpage
1232
Abstract
Text categorization, which consists of automatically assigning documents to a set of categories, usually involves managing a huge number of features. Most of these are irrelevant, and others introduce noise that could mislead the classifiers. Feature reduction is therefore often performed to increase the efficiency and effectiveness of classification. In this paper, we propose selecting relevant features by means of a family of linear filtering measures that are simpler than the measures usually applied for this purpose. We carry out experiments over two different corpora and find that the proposed measures perform better than the existing ones.
Keywords
classification; feature extraction; information filtering; learning (artificial intelligence); pattern classification; text analysis; document classification; feature reduction; feature selection; linear filtering measures; machine learning; text categorization; Availability; Filtering; Frequency; Humans; Machine learning; Maximum likelihood detection; Nonlinear filters; Performance evaluation; Text categorization; Wrapping; Index Terms: text categorization; feature selection; filtering measures; machine learning
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2005.149
Filename
1490529