Title :
Combining content-based and context-based methods for Persian web page classification
Author :
Farhoodi, Mojgan ; Yari, Alireza ; Mahmoudi, Maryam
Author_Institution :
Iran Telecommun. Res. Center, Iran
Abstract :
Because the Internet includes millions of web pages for every search query, quickly retrieving the desired and related information from the Web becomes a very challenging task. Automatic classification of web pages into relevant categories is an important and effective way to deal with the difficulty of retrieving information from the Internet. Many automatic classification methods and algorithms have been proposed that use content-based or context-based features of web pages. In this paper we analyze these features and try to exploit a combination of them to improve the accuracy of Persian web page classification. We conduct various experiments on a dataset consisting of 352 pages from Persian Wikipedia, using content-based and context-based web page features. Our experiments demonstrate the usefulness of combining these features.
Keywords :
Internet; Web design; content-based retrieval; pattern classification; Persian Wikipedia; Persian web page classification; automatic classification; content-based methods; context-based methods; information retrieval; Buildings; Classification algorithms; Humans; Information management; Uniform resource locators; Web pages; Web sites
Conference_Title :
Applications of Digital Information and Web Technologies, 2009. ICADIWT '09. Second International Conference on the
Conference_Location :
London
Print_ISBN :
978-1-4244-4456-4
Electronic_ISBN :
978-1-4244-4457-1
DOI :
10.1109/ICADIWT.2009.5273915