Title :
Classifying Web pages using adaptive ontology
Author :
Noh, Sanguk ; Seo, Aaesung ; Choi, Jaehyuk ; Choi, Kyunghee ; Jung, Gihyun
Author_Institution :
Sch. of Comput. Sci., Catholic Univ. of Korea, South Korea
Abstract :
In this paper, we present an automated Web page classifier based on an adaptive ontology. First, to identify the representative terms for a given set of classes, we compute the product of term frequency and document frequency. Second, we prioritize each term by its information gain, which reflects how useful the term is for classification. We then compile the selected terms and their classifications into rules using machine learning algorithms. The compiled rules classify any Web page into categories defined in a domain ontology. In the experiments, 11 out of 1,700 terms were identified as representative features for a given set of Web pages. The resulting classification accuracy was 95.2% on average.
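The two term-selection steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: term frequency is taken as the raw count of a term across a class's pages, document frequency as the number of pages containing it, and information gain as the standard entropy reduction from splitting pages on the presence of a term. The function names and the toy pages are hypothetical, and the rule-learning step is not shown.

```python
import math
from collections import Counter


def tf_df_scores(docs):
    """Return term -> (term frequency * document frequency) for one class.

    docs is a list of token lists. TF is the raw count over all pages and
    DF is the number of pages containing the term (assumed definitions).
    """
    tf = Counter()
    df = Counter()
    for tokens in docs:
        tf.update(tokens)        # count every occurrence
        df.update(set(tokens))   # count each page at most once
    return {term: tf[term] * df[term] for term in tf}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting the pages on presence of term."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part)
                    for part in (present, absent) if part)
    return entropy(labels) - remainder


# Hypothetical toy pages, already tokenized, with their class labels.
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "goal"],
    ["stock", "market", "team"],
    ["market", "price", "stock"],
]
labels = ["sports", "sports", "finance", "finance"]

# Step 1: candidate terms for the "sports" class by TF * DF.
sports_scores = tf_df_scores(docs[:2])

# Step 2: rank all terms by information gain, highest first.
vocabulary = sorted({t for d in docs for t in d})
ranked = sorted(vocabulary,
                key=lambda t: information_gain(docs, labels, t),
                reverse=True)
```

In this toy data, "goal" appears only in the sports pages and therefore separates the two classes completely, whereas "team" appears in both classes and ranks lower. The terms that survive both steps would then serve as features for the rule learner.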
Keywords :
Web sites; classification; information retrieval; learning (artificial intelligence); Web page classifier; adaptive ontology; document frequency; information gain; machine learning algorithms; representative features; term frequency; Communication networks; Computer science; Frequency; Information analysis; Machine learning; Machine learning algorithms; Ontologies; Protocols; Waste materials; Web pages;
Conference_Title :
Systems, Man and Cybernetics, 2003. IEEE International Conference on
Print_ISBN :
0-7803-7952-7
DOI :
10.1109/ICSMC.2003.1244201