Title :
A New Centroid-Based Classifier for Text Categorization
Author :
Chen, Lifei ; Ye, Yanfang ; Jiang, Qingshan
Author_Institution :
Xiamen Univ., Xiamen
Abstract :
In recent years, centroid-based document classifiers have attracted wide interest from the text mining community because of their simplicity and linear-time complexity. However, traditional centroid-based classifiers usually perform less effectively on Chinese text categorization. In this paper, we tackle the problem by developing a new way to calculate class-specific weights for each term in the training phase; in the testing phase, each new document is assigned to the centroid to which it is most similar according to a weighted distance measure. The experimental results demonstrate that our algorithm is more accurate than traditional centroid-based classifiers and achieves outstanding efficiency compared with Support Vector Machine (SVM) based classifiers for Chinese text categorization.
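The two phases described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's weighting formula, so the scheme used here (normalized inverse within-class variance per term) is purely an assumption chosen for illustration, as are the function names `train`, `weighted_distance`, and `classify` and the toy term vectors. It should not be read as the authors' method.

```python
import math
from collections import defaultdict


def train(docs, labels, eps=1e-6):
    """Build one centroid and one term-weight vector per class.

    ASSUMPTION: each term's class-specific weight is the inverse of its
    within-class variance, normalized to sum to 1. The paper's actual
    weighting formula is not stated in the abstract.
    """
    by_class = defaultdict(list)
    for vec, label in zip(docs, labels):
        by_class[label].append(vec)

    model = {}
    for label, vecs in by_class.items():
        n, dim = len(vecs), len(vecs[0])
        centroid = [sum(v[j] for v in vecs) / n for j in range(dim)]
        variance = [
            sum((v[j] - centroid[j]) ** 2 for v in vecs) / n for j in range(dim)
        ]
        raw = [1.0 / (var + eps) for var in variance]  # eps avoids division by zero
        total = sum(raw)
        model[label] = (centroid, [r / total for r in raw])
    return model


def weighted_distance(vec, centroid, weights):
    """Euclidean distance with a per-term weight on each squared difference."""
    return math.sqrt(
        sum(w * (x - c) ** 2 for x, c, w in zip(vec, centroid, weights))
    )


def classify(model, vec):
    """Assign vec to the class whose centroid is nearest under that class's weights."""
    return min(model, key=lambda label: weighted_distance(vec, *model[label]))


# Hypothetical toy data: three-term document vectors in two classes.
docs = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.8], [0.0, 1.0, 0.5], [0.1, 0.9, 0.4]]
labels = ["sports", "sports", "finance", "finance"]
model = train(docs, labels)
print(classify(model, [0.8, 0.2, 0.5]))  # sports
```

In this sketch, training makes a single pass over the documents of each class and classification compares a document against one centroid per class, which is in keeping with the linear-time behaviour the abstract attributes to centroid-based classifiers.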
Keywords :
data mining; natural languages; pattern classification; support vector machines; text analysis; Chinese text categorization; SVM; centroid-based document classifiers; support vector machine; text mining; Application software; Clustering algorithms; Computer science; Frequency; Information retrieval; Machine learning algorithms; Support vector machine classification; Support vector machines; Text categorization; Text mining; centroid-based classifier; class-specific weighting; term weighting; text categorization
Conference_Title :
Advanced Information Networking and Applications - Workshops, 2008. AINAW 2008. 22nd International Conference on
Conference_Location :
Okinawa
Print_ISBN :
978-0-7695-3096-3
DOI :
10.1109/WAINA.2008.12