Title :
A Class Core Extraction Method for Text Categorization
Author :
Yu, Shicai ; Zhang, Jianxing
Author_Institution :
Sch. of Comput. Sci. & Commun., Lanzhou Univ. of Technol., Lanzhou, China
Abstract :
Text categorization is an important research field within text mining. In practice, a document is often full of class-independent "general" words that many documents and classes share. These "general" words harm text categorization rather than contribute to it. Inspired by the human cognitive procedure in text classification tasks, we propose a novel approach called the Class Core Extraction (CCE) method to extract "core" terms from each class. The "core" terms, which include not only single words but also combinations of words that act as a simple description of context, must be terms with strong distinguishing power. For the testing phase, we also propose a suitable algorithm, which we call the "lottery" algorithm, that uses a weighted matching strategy to make the final categorization decision. Comparative experiments on two datasets show that our approach is more accurate than the k-nearest-neighbor (kNN) based classifier and markedly more efficient than the Support Vector Machine (SVM) based classifier.
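The abstract does not specify how core terms are weighted or how the "lottery" algorithm combines them, so the following is only a minimal sketch of a generic weighted matching strategy, not the authors' method. The class names, core terms, weights, and the `classify` function are all hypothetical: each core term is modeled as a set of words (a single word or a combination), and a class receives a term's weight only when every word of that term occurs in the document.

```python
from collections import defaultdict

# Hypothetical class cores (assumption, not from the paper): each core term is
# a set of words mapped to an assumed weight; combinations weigh more here.
CLASS_CORES = {
    "sports": {frozenset({"goal"}): 1.0, frozenset({"penalty", "kick"}): 2.5},
    "finance": {frozenset({"stock"}): 1.0, frozenset({"interest", "rate"}): 2.5},
}


def classify(text, class_cores=CLASS_CORES):
    """Return the class with the highest summed weight of matched core terms.

    Returns None when no core term of any class matches the document.
    """
    words = set(text.lower().split())
    scores = defaultdict(float)
    for label, cores in class_cores.items():
        for term, weight in cores.items():
            # A core term matches only if all of its words are in the document.
            if term <= words:
                scores[label] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)
```

Under these assumptions, a document made up only of "general" words matches no core term and is left unclassified, which mirrors the abstract's point that such words carry no distinguishing power.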
Keywords :
data mining; pattern classification; text analysis; class core extraction method; k-nearest-neighbor classification; lottery algorithm; support vector machine; text categorization; text mining; Computer science; Context modeling; Fuzzy systems; Humans; Organizing; Support vector machine classification; Support vector machines; Testing; Text categorization; Text mining; class core extraction; lottery algorithm; text categorization;
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on
Conference_Location :
Tianjin
Print_ISBN :
978-0-7695-3735-1
DOI :
10.1109/FSKD.2009.572