Title : 
Language specific crawling based on web pages features
         
        
            Author : 
Azimzadeh, Masomeh ; Yari, Alireza ; Kargar, Mohammad Javad
         
        
            Author_Institution : 
Iran Telecommun. Res. Center, Tehran, Iran
         
        
            Abstract : 
Since the World Wide Web contains a large amount of data in many languages, retrieving language-specific information poses a new challenge in information retrieval, known as language-specific crawling. In this paper, a new approach to language-specific crawling is proposed that combines selected content and context features of web documents. The approach has been implemented for the Persian language and evaluated on the Iranian web domain. The evaluation results show that it can improve crawling performance in terms of both speed and coverage.
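To make the idea of combining content and context features more concrete, here is a minimal Python sketch of a prioritized crawl frontier for a Persian-focused crawler. It is not the authors' method. The features (share of Arabic-script letters as a content signal; Persian parent-page text, Persian anchor text and a `.ir` host as context signals), the weights, and the names `persian_char_ratio`, `link_priority` and `LanguageFrontier` are all hypothetical choices for illustration.

```python
import heapq
from urllib.parse import urlparse


def persian_char_ratio(text: str) -> float:
    """Fraction of alphabetic characters in the Arabic-script block.

    Assumption: Persian text is detected only via U+0600..U+06FF, which
    covers Persian letters such as U+067E and U+06AF but ignores the
    supplementary Arabic blocks and cannot tell Persian from Arabic.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    persian = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return persian / len(letters)


def link_priority(url: str, parent_text: str, anchor_text: str,
                  w_parent: float = 0.5, w_anchor: float = 0.3,
                  w_domain: float = 0.2) -> float:
    """Hypothetical weighted score for an outgoing link (0.0 to 1.0)."""
    parent_score = persian_char_ratio(parent_text)  # content of linking page
    anchor_score = persian_char_ratio(anchor_text)  # link context
    host = urlparse(url).hostname or ""             # hostname is lowercased
    domain_score = 1.0 if host.endswith(".ir") else 0.0
    return (w_parent * parent_score
            + w_anchor * anchor_score
            + w_domain * domain_score)


class LanguageFrontier:
    """Max-priority frontier that enqueues each URL at most once."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, str]] = []
        self._seen: set[str] = set()

    def push(self, url: str, score: float) -> None:
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated score.
            heapq.heappush(self._heap, (-score, url))

    def pop(self) -> tuple[str, float]:
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self) -> int:
        return len(self._heap)
```

The crawler would repeatedly `pop` the highest-scoring URL, fetch it, and `push` its outlinks scored with `link_priority`. Fetching likely-Persian pages first is one way such a combination could trade speed against coverage; the weights, like any cut-off for discarding low-scoring pages, would need tuning on real crawl data.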
         
        
            Keywords : 
Internet; document handling; information retrieval; Iranian Web domain; Persian language; Web documents; Web pages features; World Wide Web; language specific crawling; Bandwidth; Crawlers; Data mining; Information resources; Java; Ontologies; Testing; Thesauri; Web pages
         
        
            Conference_Title : 
Multimedia Computing and Information Technology (MCIT), 2010 International Conference on
         
        
            Conference_Location : 
Sharjah
         
        
            Print_ISBN : 
978-1-4244-7001-3
         
        
            DOI : 
10.1109/MCIT.2010.5444865