DocumentCode :
506910
Title :
A Class Core Extraction Method for Text Categorization
Author :
Yu, Shicai ; Zhang, Jianxing
Author_Institution :
Sch. of Comput. Sci. & Commun., Lanzhou Univ. of Technol., Lanzhou, China
Volume :
1
fYear :
2009
fDate :
14-16 Aug. 2009
Firstpage :
3
Lastpage :
7
Abstract :
Text categorization is an important research field within text mining. A document is often full of class-independent "general" words that many documents and classes share. These "general" words harm text categorization rather than contribute to the task. Inspired by the human cognitive process in text classification, we propose a novel approach called the Class Core Extraction (CCE) method to extract "core" terms from each class. The "core" terms, which include not only single words but also combinations of words that act as a simple description of context, must be terms with strong distinguishing power. For the testing phase, we also propose a suitable algorithm, which we call the "lottery" algorithm, that uses a weighted matching strategy to make the final categorization decision. Comparative experiments on two datasets show that our approach is more accurate than a k-nearest-neighbor (kNN) based classifier, while also showing outstanding efficiency compared with a Support Vector Machine (SVM) based classifier.
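The abstract names two steps but does not give their formulas. The Python below is a minimal sketch under stated assumptions, not the authors' CCE or "lottery" algorithm. It shows how the two steps could fit together. The term definition (single words plus word pairs inside a small window), the purity score used as "distinguishing power", the thresholds, and the weight formula are all hypothetical. Terms spread across several classes, such as "general" words, fail the purity threshold. A test document goes to the class whose matched core terms carry the largest total weight.

```python
from collections import Counter, defaultdict

def terms_of(tokens, window=2):
    """Single words plus unordered word pairs at most `window` positions apart.
    ASSUMPTION: a stand-in for the paper's 'combinations of words'."""
    terms = set(tokens)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + 1 + window]:
            if v != w:
                terms.add(tuple(sorted((w, v))))
    return terms

def extract_class_cores(docs, labels, min_purity=0.8, min_df=2):
    """Hypothetical core-term extraction: keep terms whose documents fall
    mostly in one class, so widely shared 'general' words are dropped."""
    df_by_class = defaultdict(Counter)  # class -> term -> document frequency
    df_total = Counter()                # term -> document frequency overall
    for tokens, label in zip(docs, labels):
        ts = terms_of(tokens)
        df_by_class[label].update(ts)
        df_total.update(ts)
    cores = {}
    for label, counts in df_by_class.items():
        core = {}
        for term, df in counts.items():
            purity = df / df_total[term]  # share of the term's documents in this class
            if df >= min_df and purity >= min_purity:
                core[term] = purity * df  # ASSUMED weight, not from the paper
        cores[label] = core
    return cores

def lottery_classify(tokens, cores):
    """Weighted matching: sum the weights of each class's core terms found in
    the document and return the best class (ties go to the first class seen)."""
    ts = terms_of(tokens)
    scores = {label: sum(w for t, w in core.items() if t in ts)
              for label, core in cores.items()}
    return max(scores, key=scores.get)

train = [
    "stock market shares rise".split(),
    "market shares fall on stock news".split(),
    "team wins football match".split(),
    "football team loses match".split(),
]
labels = ["finance", "finance", "sport", "sport"]

cores = extract_class_cores(train, labels)
print(lottery_classify("stock shares news".split(), cores))  # finance
```

A simple weight sum keeps classification cost proportional to the number of core terms, not to the training set size. This matches the abstract's efficiency claim only in spirit. The paper's actual weighting is not reproduced here.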
Keywords :
data mining; pattern classification; text analysis; class core extraction method; k-nearest-neighbor classification; lottery algorithm; support vector machine; text categorization; text mining; Computer science; Context modeling; Fuzzy systems; Humans; Organizing; Support vector machine classification; Support vector machines; Testing; Text categorization; Text mining; class core extraction; lottery algorithm; text categorization;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on
Conference_Location :
Tianjin
Print_ISBN :
978-0-7695-3735-1
Type :
conf
DOI :
10.1109/FSKD.2009.572
Filename :
5358667