Title :
Ontological based webpage classification
Author :
Ong, Wui Kheun ; Hong, Jer Lang ; Fauzi, Fariza ; Tan, Ee Xion
Author_Institution :
Sch. of Comput. & IT, Taylor's Univ., Malaysia
Abstract :
Current classification techniques use word matching and clustering to classify webpages. These techniques take an ad hoc approach, checking and matching every keyword in a webpage for classification. These methods are effective but not without problems. In general, they suffer from the following problems: 1) because they use brute-force matching over the entire document, they tend to be slow; 2) words in a document may have similar meanings but different spellings; 3) current techniques fail to match and identify phrases efficiently; and 4) they do not take word disambiguation into account. In this paper, we propose a novel and fast ontology-based webpage classification technique that classifies webpages with high accuracy. To speed up our system, we use a segmentation technique that relies on the visual boundary of a region and matches keywords within that region instead of across the entire webpage. We also use a fast clustering technique to match keywords and label the page based on the nearest match. Experimental results show that our system classifies webpages accurately.
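The region-level matching described in the abstract can be illustrated with a minimal sketch. It assumes the visual segmentation step has already produced the text of one content region, and it uses a hypothetical toy ontology (ONTOLOGY) in which each category maps to a set of concept terms that can include synonyms. The region is labeled with the category whose terms it matches most often, a simple stand-in for "nearest match". This is not the authors' implementation: the ontology, the scoring rule, and the names classify_region and score_region are all illustrative assumptions, and word disambiguation is not handled.

```python
import re
from collections import Counter

# Hypothetical toy ontology: category -> concept terms (synonyms included).
ONTOLOGY = {
    "sports": {"football", "soccer", "match", "score", "league", "goal"},
    "finance": {"stock", "market", "shares", "price", "investor"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def score_region(region_text, ontology):
    """Count how many tokens in the region match each category's terms."""
    counts = Counter(tokenize(region_text))
    return {cat: sum(counts[t] for t in terms) for cat, terms in ontology.items()}

def classify_region(region_text, ontology=ONTOLOGY):
    """Label a segmented region with its best-matching category.

    Only the region's text is scanned, not the whole page. Returns None
    when no ontology term occurs in the region.
    """
    scores = score_region(region_text, ontology)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, classify_region("The league match ended with a late goal") returns "sports", because three of its tokens are sports terms and none are finance terms. Restricting the scan to one region keeps the cost proportional to that region's length rather than to the full page, which is the speed argument the abstract makes.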
Keywords :
Web sites; ontologies (artificial intelligence); pattern classification; pattern matching; word processing; Web page classification; ad hoc approach; clustering technique; document matching; keyword checking; keyword matching; ontology; region visual boundary; segmentation technique; word disambiguation; word matching; Accuracy; Classification algorithms; HTML; Presses; Semantics; Visualization; Web pages; Classification; Ontology; Webpage;
Conference_Title :
Information Retrieval & Knowledge Management (CAMP), 2012 International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-1-4673-1091-8
DOI :
10.1109/InfRKM.2012.6205006