DocumentCode :
3539731
Title :
Combining content-based and context-based methods for Persian web page classification
Author :
Farhoodi, Mojgan ; Yari, Alireza ; Mahmoudi, Maryam
Author_Institution :
Iran Telecommun. Res. Center, Iran
fYear :
2009
fDate :
4-6 Aug. 2009
Firstpage :
399
Lastpage :
404
Abstract :
As the Internet contains millions of web pages for every search query, quickly retrieving the desired and relevant information from the Web has become a very challenging problem. Automatic classification of web pages into relevant categories is an important and effective way to address the difficulty of retrieving information from the Internet. Many automatic classification methods and algorithms have been proposed that use either content-based or context-based features of web pages. In this paper we analyze these features and exploit a combination of them to improve the categorization accuracy of Persian web page classification. We conduct various experiments on a dataset of 352 pages from Persian Wikipedia, using content-based and context-based web page features. Our experiments demonstrate the usefulness of combining these features.
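The abstract does not say which content or context features, combination scheme, or classifier the authors used. The following is only a minimal illustrative sketch, not the paper's method. It makes three hypothetical choices: content features are tokens from a page's body text, context features are tokens from the anchor text of links pointing to the page, and classification is done with a nearest-centroid classifier using cosine similarity. It shows one simple way to combine the two feature sources: prefix each token with its source so the two feature spaces stay separate, then weight them with a mixing parameter `alpha`. The names `features`, `train_centroids`, `classify`, `body`, `anchor`, and `alpha` are all hypothetical.

```python
import math
from collections import Counter


def features(page, alpha=0.5):
    """Merge content-based and context-based term counts into one vector.

    Hypothetical feature sources: page["body"] (content) and page["anchor"]
    (context, e.g. inbound anchor text). Source prefixes keep the two spaces
    apart; alpha is the weight given to context terms, 1 - alpha to content.
    """
    vec = Counter()
    for tok in page["body"].lower().split():
        vec["content:" + tok] += 1 - alpha
    for tok in page["anchor"].lower().split():
        vec["context:" + tok] += alpha
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(pages, labels, alpha=0.5):
    """Sum the combined vectors per category (direction is all cosine uses)."""
    centroids = {}
    for page, label in zip(pages, labels):
        centroids.setdefault(label, Counter()).update(features(page, alpha))
    return centroids


def classify(page, centroids, alpha=0.5):
    """Return the category whose centroid is most similar to the page.

    Use the same alpha that was used in train_centroids.
    """
    vec = features(page, alpha)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy usage with made-up pages (not data from the paper).
train = [
    {"body": "football match goal team", "anchor": "sport news"},
    {"body": "physics experiment energy atom", "anchor": "science article"},
]
labels = ["sport", "science"]
centroids = train_centroids(train, labels)
print(classify({"body": "team won the match", "anchor": "sport"}, centroids))
```

Setting `alpha=0` reduces this to a purely content-based classifier and `alpha=1` to a purely context-based one, so the mixing weight is the single knob that a combined approach would tune. Because the tokenizer only splits on whitespace, the same sketch applies to Persian text, although real Persian pages would likely need proper normalization and tokenization.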
Keywords :
Internet; Web design; content-based retrieval; pattern classification; Persian Wikipedia; Persian web page classification; automatic classification; content-based methods; context-based methods; information retrieval; Buildings; Classification algorithms; Content based retrieval; Humans; Information management; Information retrieval; Uniform resource locators; Web pages; Web sites;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Applications of Digital Information and Web Technologies, 2009. ICADIWT '09. Second International Conference on the
Conference_Location :
London
Print_ISBN :
978-1-4244-4456-4
Electronic_ISBN :
978-1-4244-4457-1
Type :
conf
DOI :
10.1109/ICADIWT.2009.5273915
Filename :
5273915