Abstract:
State-of-the-art Text Classification (TC) approaches consider only features explicitly mentioned in the training set, and after several decades of effort these approaches appear to have reached a plateau. In this paper, we propose an automatic taxonomy mapping algorithm that maps the original flat taxonomy to a hierarchical, human-edited online taxonomy (the Open Directory Project, ODP), from which we then synthesize new training samples carrying common-sense world knowledge by performing constrained focused web crawling. We show that by leveraging domain knowledge that cannot otherwise be deduced directly from the training set, the text classifier achieves better generalization ability. Preliminary experimental results on several Chinese data sets confirm the effectiveness of this approach.