Title :
A new algorithm based on centroid for text categorization
Author :
Shen, Chongwei ; Wu, Bin
Author_Institution :
Sch. of Comput. Sci., Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
Text categorization is an active research topic and a key technology in data mining and information retrieval, and it has received wide attention in recent years. The centroid-based algorithm is an effective and robust approach; however, it often suffers from inductive bias or model misfit. To address this problem, researchers have proposed a number of improvement strategies that give the centroid-based algorithm better performance. This paper proposes a novel approach to adjusting the centroids, called the Weighted Margin adjusted Centroid based Algorithm (WMCA), and presents extensive experimental comparisons with other algorithms on five public corpora. The results show that WMCA achieves the best performance among the algorithms compared.
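For orientation, the plain centroid-based baseline that the abstract refers to can be sketched as follows. This is a minimal illustration only, assuming raw term-frequency vectors and cosine similarity; it is not the WMCA method, whose margin-based centroid adjustment is not described in this record. The helper names `normalize`, `train_centroids`, and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def normalize(vec):
    """Scale a sparse term->weight mapping to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def train_centroids(docs):
    """Build one unit-length centroid per class.

    docs: iterable of (label, list_of_tokens) pairs.
    Assumption: documents are represented by raw term frequencies.
    """
    sums = defaultdict(Counter)
    for label, tokens in docs:
        for term, weight in normalize(Counter(tokens)).items():
            sums[label][term] += weight
    return {label: normalize(vec) for label, vec in sums.items()}


def classify(centroids, tokens):
    """Return the label whose centroid has the highest cosine similarity."""
    doc = normalize(Counter(tokens))

    def cosine(centroid):
        # Both vectors are unit length, so the dot product is the cosine.
        return sum(w * centroid.get(t, 0.0) for t, w in doc.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))
```

A document is assigned to the class whose centroid it is most similar to; the improvement strategies mentioned in the abstract modify how those centroids are computed or adjusted.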
Keywords :
data mining; information retrieval; pattern classification; text analysis; WMCA algorithm; inductive bias; model misfit; text categorization; text classification; weighted margin adjusted centroid based algorithm; Classification algorithms; Educational institutions; Machine learning; Support vector machines; Training; Vectors; WMCA; centroid; margin
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2012 9th International Conference on
Conference_Location :
Sichuan
Print_ISBN :
978-1-4673-0025-4
DOI :
10.1109/FSKD.2012.6234190