Title :
An Information Classification Approach Based on Knowledge Network
Author :
Huakang Li ; Guozi Sun ; Bei Xu ; Li Li ; Jie Huang ; Tanno, Keita ; Wenxu Wu ; Changen Xu
Author_Institution :
Lab. for Wireless Sensor Networks, Nanjing Univ. of Posts & Telecommun., Nanjing, China
Abstract :
Many critical Internet applications that provide high-quality services, such as Web directories, search engines, Web crawlers, recommendation systems, and user profile detectors, depend largely on efficient and accurate web page classification. Traditional supervised and semi-supervised machine learning methods find it increasingly difficult to adapt to the explosive growth of Internet information. This paper proposes a web page classification method based on the topological structure of the Wikipedia knowledge network. A kinship-relation association based on content similarity is proposed to solve the imbalance problem that arises when a category node inherits probability from multiple parent nodes. We use N-grams based on Wikipedia words to extract keywords from a web page, and introduce a Bayes classifier to estimate the page class probability. Experimental results show that the proposed method has good scalability, robustness, and reliability across different web pages.
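The keyword-extraction and Bayes steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy vocabulary, the priors, the per-class keyword likelihoods, and the smoothing constant are all assumptions made for illustration. The Wikipedia category topology and the kinship-relation weighting are not modeled here.

```python
import math
from collections import Counter


def extract_keywords(text, vocabulary, max_n=3):
    """Count word n-grams (1 <= n <= max_n) of `text` that occur in `vocabulary`.

    `vocabulary` stands in for a set of Wikipedia terms (hypothetical here).
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in vocabulary:
                counts[gram] += 1
    return counts


def classify(keyword_counts, priors, likelihoods, smoothing=1e-6):
    """Naive Bayes posterior P(class | keywords), normalized over all classes.

    `priors[c]` is P(c); `likelihoods[c][k]` is P(k | c). Keywords missing
    from a class fall back to `smoothing` (an illustrative choice).
    """
    log_scores = {}
    for cls, prior in priors.items():
        score = math.log(prior)
        for keyword, count in keyword_counts.items():
            score += count * math.log(likelihoods[cls].get(keyword, smoothing))
        log_scores[cls] = score
    # Normalize in log space to avoid underflow on long pages.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {cls: math.exp(s - top) / total for cls, s in log_scores.items()}


if __name__ == "__main__":
    # Toy data, invented for this example only.
    vocabulary = {"machine learning", "neural network", "football", "world cup"}
    priors = {"Technology": 0.5, "Sports": 0.5}
    likelihoods = {
        "Technology": {"machine learning": 0.4, "neural network": 0.4,
                       "football": 0.1, "world cup": 0.1},
        "Sports": {"machine learning": 0.1, "neural network": 0.1,
                   "football": 0.4, "world cup": 0.4},
    }
    page = "A neural network is a machine learning model"
    keywords = extract_keywords(page, vocabulary)
    print(keywords)
    print(classify(keywords, priors, likelihoods))
```

In this toy run, the page matches the two Technology-leaning bigrams, so the posterior favors "Technology". A fuller treatment would derive the priors and likelihoods from the Wikipedia category graph rather than from hand-set numbers.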
Keywords :
Internet; Web sites; classification; learning (artificial intelligence); probability; Bayes classifier; Internet applications; N-gram; Web crawler; Web directory; Web page classification system; Wikipedia knowledge network; Wikipedia words; category node; content similarity; information classification approach; keyword extraction; kinship-relation association; page class probability estimation; recommendation system; search engine; semisupervised machine learning method; topological structure; user profile detector; Benchmark testing; Electronic publishing; Encyclopedias; Internet; Knowledge based systems; Web pages;
Conference_Title :
Embedded Multicore/Manycore SoCs (MCSoC), 2014 IEEE 8th International Symposium on
Conference_Location :
Aizu-Wakamatsu
DOI :
10.1109/MCSoC.2014.10