Title :
Chinese text classification based on improved domain ontology graph model-DOG
Author :
Guang Yang ; Jin-Kun Tian ; Yun-Hua Liu ; Zhong-Yi Lin ; Lei Wang ; Yu-Xin Chang
Author_Institution :
Run Technol. Co. Ltd., Beijing, China
Abstract :
A domain ontology graph (DOG) is a graphical ontology model that defines the components needed to represent domain knowledge effectively by extracting highly dependent terms and establishing the relations between them. In the traditional DOG, high-frequency terms are selected as the nodes, and the dependence between terms is calculated using the χ² statistic. The empirical study in this paper shows that traditional DOG generation has two main limitations: 1) it neglects the dependence between a term and its class, i.e., its domain; 2) it measures the dependence between different terms inaccurately. We therefore propose an improved DOG model, named DOG*, which addresses both limitations by measuring dependence with mutual information and then selecting the highly dependent terms as the nodes during DOG generation. We compare the classification performance of DOG* with that of the traditional DOG on ten different Chinese text datasets collected from the Chinese news Web site www.people.com.cn. The experimental results show that the proposed DOG* significantly improves classification accuracy over the traditional DOG.
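The term-class dependence described in the abstract can be illustrated with a minimal sketch. It assumes that each document has already been segmented into a set of terms and that dependence is the standard mutual information between a term's presence and the class label. The abstract does not give the exact formula used for DOG*, so the function names `mutual_information` and `select_terms` and the ranking rule are hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """Mutual information (in nats) between the presence of `term` and the class label.

    Assumption: `docs` is a list of term sets, one per document, and
    `labels` is the parallel list of class (domain) labels.
    """
    n = len(docs)
    # Joint counts of (term present?, class label).
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    term_counts = Counter()
    class_counts = Counter()
    for (present, label), count in joint.items():
        term_counts[present] += count
        class_counts[label] += count
    mi = 0.0
    for (present, label), count in joint.items():
        p_joint = count / n
        p_indep = (term_counts[present] / n) * (class_counts[label] / n)
        mi += p_joint * math.log(p_joint / p_indep)
    return mi


def select_terms(docs, labels, k):
    """Return the k terms with the highest term-class mutual information.

    Hypothetical selection rule: rank by score, break ties alphabetically.
    """
    vocab = sorted(set().union(*docs))
    scored = [(mutual_information(docs, labels, t), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]
```

With four toy documents, two per class, a term that occurs in every document of one class and in none of the other receives the maximum score of ln 2, while a term spread evenly over both classes scores zero and would not be chosen as a node.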
Keywords :
graph theory; ontologies (artificial intelligence); pattern classification; statistics; text analysis; Chinese news Web site; Chinese text classification; DOG generation; DOG* model; domain knowledge representation components; domain ontology graph model; improved DOG model; mutual information; χ² statistic; Dependence; domain ontology graph; mutual information; text classification
Conference_Title :
Computer Science and Network Technology (ICCSNT), 2012 2nd International Conference on
Conference_Location :
Changchun
Print_ISBN :
978-1-4673-2963-7
DOI :
10.1109/ICCSNT.2012.6526167