Title :
Conceptual graph based text classification
Author :
Yi Wan ; Tingting He ; Xinhui Tu
Author_Institution :
Sch. of Comput. Sci., Central China Normal Univ., Wuhan, China
Abstract :
Most traditional Wikipedia-based methods use only the content of articles. By organizing Wikipedia articles as a graph, our method can also exploit additional information such as categories and link structure. In this paper, we propose a novel text classification method that uses knowledge from a conceptual graph built from Wikipedia. First, we build the conceptual graph: each article is treated as a concept node, and titles, hyperlinks, article text, and category information are used to define edges that measure the relationships between concepts. Each text is then mapped to its respective set of nodes, and Personalized PageRank (a random walk) is used to generate the set of most important nodes that best represent the text. Finally, the two sets are scored with a vector similarity measure. We evaluate our technique on the standard 20 Newsgroups text classification dataset; the results show the effectiveness of the proposed approach.
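The pipeline described in the abstract (seed nodes, Personalized PageRank, vector similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the uniform edge weights, the damping factor of 0.85, the iteration count, and the choice to represent each class by its own Personalized PageRank vector are all assumptions made for illustration, as are the function names.

```python
import math


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank.

    graph: dict mapping node -> {neighbor: edge weight}; every neighbor
           must also appear as a key (hypothetical toy format).
    seeds: dict mapping seed node -> weight; the random walk restarts
           at these nodes with probability (1 - damping).
    """
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in graph}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in graph}
        for node, edges in graph.items():
            out = sum(edges.values())
            if out == 0:
                # Dangling node: send its mass back to the seed nodes.
                for n in graph:
                    nxt[n] += damping * rank[node] * restart[n]
                continue
            for neighbor, weight in edges.items():
                nxt[neighbor] += damping * rank[node] * weight / out
        rank = nxt
    return rank


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy concept graph (assumed data; all edge weights set to 1.0).
graph = {
    "Baseball": {"Sport": 1.0, "Pitcher": 1.0},
    "Pitcher": {"Baseball": 1.0},
    "Sport": {"Baseball": 1.0, "Hockey": 1.0},
    "Hockey": {"Sport": 1.0},
    "Cryptography": {"Encryption": 1.0},
    "Encryption": {"Cryptography": 1.0},
}

# Concepts found in a document, and concepts assumed to describe two classes.
doc_vec = personalized_pagerank(graph, {"Pitcher": 1.0, "Baseball": 1.0})
sport_vec = personalized_pagerank(graph, {"Sport": 1.0, "Hockey": 1.0, "Baseball": 1.0})
crypto_vec = personalized_pagerank(graph, {"Cryptography": 1.0, "Encryption": 1.0})

scores = {"sport": cosine(doc_vec, sport_vec), "crypto": cosine(doc_vec, crypto_vec)}
predicted = max(scores, key=scores.get)
```

In this sketch the document is assigned to whichever class vector is most similar to its own rank vector. A real system would derive the seed nodes by matching text against Wikipedia article titles and would weight edges using the hyperlink, text, and category signals mentioned in the abstract.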
Keywords :
Web sites; graph theory; knowledge representation; pattern classification; text analysis; vectors; Wikipedia; conceptual graph; personalized PageRank; random walk; text classification; vector similarity measure; Electronic publishing; Encyclopedias; Feature extraction; Internet; Knowledge based systems; Semantics; semantic similarity
Conference_Title :
Progress in Informatics and Computing (PIC), 2014 International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4799-2033-4
DOI :
10.1109/PIC.2014.6972305