DocumentCode :
3104930
Title :
Leveraging World Knowledge in Chinese Text Classification
Author :
Xu, Shu ; Sun, Maosong
fYear :
2007
fDate :
22-24 Aug. 2007
Firstpage :
33
Lastpage :
38
Abstract :
In state-of-the-art Text Classification (TC) approaches, only features explicitly mentioned in the training set are taken into consideration, but after several decades' endeavor, these approaches seem to have reached a plateau. In this paper, we propose an automatic taxonomy mapping algorithm that maps the original flat taxonomy to a hierarchical, human-edited online taxonomy (ODP), from which we can then synthesize new training samples carrying common-sense world knowledge by performing constrained focused web crawling. We show that by leveraging domain knowledge that cannot otherwise be deduced directly from the training set, the text classifier achieves better generalization ability. Preliminary experimental results on several Chinese data sets confirm the effectiveness of this approach.
Keywords :
Encyclopedias; Information technology; Intelligent systems; Internet; Learning systems; Natural languages; Sun; Taxonomy; Testing; Text categorization; Text Classification; Chinese ODP; Taxonomy Mapping; Focused Crawling; Web Page Classification
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Language Processing and Web Information Technology, 2007. ALPIT 2007. Sixth International Conference on
Conference_Location :
Luoyang, Henan, China
Print_ISBN :
978-0-7695-2930-1
Type :
conf
DOI :
10.1109/ALPIT.2007.105
Filename :
4460611