DocumentCode :
2036239
Title :
Language specific crawling based on web pages features
Author :
Azimzadeh, Masomeh ; Yari, Alireza ; Kargar, Mohammad Javad
Author_Institution :
Iran Telecommun. Res. Center, Tehran, Iran
fYear :
2010
fDate :
2-4 March 2010
Firstpage :
17
Lastpage :
20
Abstract :
Since the World Wide Web contains a large set of data in different languages, retrieving language-specific information creates a new challenge in information retrieval called language-specific crawling. In this paper, a new approach to language-specific crawling is proposed, in which a combination of selected content and context features of web documents is applied. The approach has been implemented for the Persian language and evaluated in the Iranian web domain. The evaluation results show how this approach can improve crawling performance in terms of speed and coverage.
Keywords :
Internet; document handling; information retrieval; Iranian Web domain; Persian language; Web documents; Web pages features; World Wide Web; information retrieval; language specific crawling; Bandwidth; Crawlers; Data mining; Information resources; Information retrieval; Java; Ontologies; Testing; Thesauri; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia Computing and Information Technology (MCIT), 2010 International Conference on
Conference_Location :
Sharjah
Print_ISBN :
978-1-4244-7001-3
Type :
conf
DOI :
10.1109/MCIT.2010.5444865
Filename :
5444865