Title :
A Text Categorization Method Based on Local Document Frequency
Author :
Xia, Feng ; Jicun, Tian ; Zhihui, Liu
Author_Institution :
Sch. of Comput. Sci. & Technol., Civil Aviation Univ. of China, Tianjin, China
Abstract :
In this paper, a fast and effective text categorization method named TCBLDF is proposed. TCBLDF needs almost no dimensionality reduction beyond stop-word removal and document-frequency-based feature selection. It tries to capture the relationship between a term and a category label, and thus eliminates the need to know the semantic contribution that a term makes to the document in which it occurs. TCBLDF uses a measure to evaluate the importance of each term for the categorization task and then weights the terms according to these importance evaluations, so that important terms have more influence on the classification decision. Finally, we compare the method with two conventional classification methods, a naive Bayesian learner and a linear SVM learner. Experimental results show that TCBLDF is faster than SVM with comparable performance and more effective than naive Bayes, and thus can be a good alternative to these methods.
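The abstract does not give the importance measure or the decision rule, so the following Python sketch only illustrates the general idea of weighting terms by per-category ("local") document frequency. Everything specific in it is an assumption made for illustration and not the authors' formulation: the function names `train` and `classify`, the `min_df` threshold, the importance score (the spread of a term's per-category document rates), and the additive scoring rule.

```python
from collections import Counter, defaultdict


def train(docs, min_df=1, stop_words=frozenset()):
    """Build a toy model from docs, a list of (tokens, label) pairs."""
    n_docs = Counter()          # number of training documents per category
    ldf = defaultdict(Counter)  # ldf[term][label]: docs of `label` containing `term`
    for tokens, label in docs:
        n_docs[label] += 1
        for term in set(tokens) - stop_words:  # stop-word removal
            ldf[term][label] += 1

    # Document-frequency-based feature selection: drop rare terms.
    ldf = {t: c for t, c in ldf.items() if sum(c.values()) >= min_df}

    # ASSUMED importance measure (not from the paper): how unevenly a term's
    # local document rate is spread across the categories.
    importance = {}
    for term, counts in ldf.items():
        rates = [counts[label] / n_docs[label] for label in n_docs]
        importance[term] = max(rates) - min(rates)

    return {"n_docs": n_docs, "ldf": ldf, "importance": importance}


def classify(model, tokens):
    """Return the category with the highest importance-weighted score."""
    scores = {}
    for label, n in model["n_docs"].items():
        # ASSUMED decision rule: sum of importance-weighted local rates.
        scores[label] = sum(
            model["importance"][t] * model["ldf"][t][label] / n
            for t in set(tokens)
            if t in model["ldf"]
        )
    return max(scores, key=scores.get)
```

In this sketch a term that occurs equally often in every category receives an importance of zero and so has no effect on the decision, which mirrors the stated goal of letting important terms carry more influence. Training is a single counting pass over the corpus, which is one plausible reason a method of this kind could be faster than fitting an SVM.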
Keywords :
Bayes methods; classification; feature extraction; support vector machines; category label; classification decision; dimensionality reduction; feature selection; importance evaluations; linear SVM learning method; local document frequency; naive Bayesian learning; text categorization method; Bayesian methods; Classification tree analysis; Computer science; Frequency; Fuzzy systems; Learning systems; Machine learning; Support vector machine classification; Support vector machines; Text categorization; local document frequency; text categorization;
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on
Conference_Location :
Tianjin
Print_ISBN :
978-0-7695-3735-1
DOI :
10.1109/FSKD.2009.291