Title : 
Web indexing using HTML priority system

Author_Institution : 
Dept. of Inf. Technol., SRM Univ., Kattankulathur, India

Abstract : 
The unstructured nature and sheer size of the World Wide Web make it challenging to index. This paper discusses how the Web can be indexed incrementally, using Inverted Indices and a Distributed Hash Table to organize the data efficiently while the index is built up through the search mechanism itself, and an HTML Priority System to rank pages for better precision and recall. It also discusses certain challenges that a content-based ranking system must overcome to counter spam.
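The abstract names three mechanisms (inverted indices, a distributed hash table, and an HTML Priority System) without specifying them in this record. The following is a minimal Python sketch of how they could fit together, not the paper's method. It assumes: (1) each term is weighted by the highest-priority HTML tag enclosing it, using hypothetical weights in `TAG_WEIGHTS` that are not taken from the paper; (2) the DHT is simulated by hashing each term with SHA-1 and taking the result modulo the node count, a simple stand-in rather than the consistent hashing a real DHT such as Chord uses; (3) a hypothetical per-page term score cap, `MAX_TERM_SCORE`, serves as one crude defence against keyword-stuffing spam. All class, function, and constant names are illustrative.

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical priority weights (not the paper's values): text inside
# higher-priority tags contributes more to a term's score on that page.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "h3": 2.0, "b": 1.5, "strong": 1.5}
BODY_WEIGHT = 1.0
# Hypothetical cap on one term's score per page, a crude guard against keyword stuffing.
MAX_TERM_SCORE = 20.0
SKIP_TAGS = {"script", "style"}


class PriorityExtractor(HTMLParser):
    """Accumulates a score per term, weighting each occurrence by its highest-priority open tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.scores = defaultdict(float)

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray close tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if SKIP_TAGS.intersection(self.stack):
            return  # do not index script or style contents
        weight = max((TAG_WEIGHTS.get(t, BODY_WEIGHT) for t in self.stack), default=BODY_WEIGHT)
        for raw in data.lower().split():
            term = raw.strip(".,;:!?\"'()")
            if term:
                self.scores[term] += weight


def node_for(term, num_nodes):
    """Simulated DHT lookup: hash the term (SHA-1) and map it onto one of num_nodes shards."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class ShardedInvertedIndex:
    """Inverted index (term -> {url: score}) split across simulated DHT nodes."""

    def __init__(self, num_nodes=4):
        self.num_nodes = num_nodes
        self.shards = [defaultdict(dict) for _ in range(num_nodes)]

    def add_page(self, url, html):
        # Incremental update: only the posting lists of this page's terms change.
        # (Postings for terms a re-crawled page no longer contains are not removed here.)
        parser = PriorityExtractor()
        parser.feed(html)
        parser.close()
        for term, score in parser.scores.items():
            shard = self.shards[node_for(term, self.num_nodes)]
            shard[term][url] = min(score, MAX_TERM_SCORE)

    def search(self, query, k=10):
        # Sum each page's per-term scores over the query terms; best first.
        totals = defaultdict(float)
        for term in query.lower().split():
            postings = self.shards[node_for(term, self.num_nodes)].get(term, {})
            for url, score in postings.items():
                totals[url] += score
        return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    index = ShardedInvertedIndex(num_nodes=4)
    index.add_page("a.html", "<html><title>Web indexing</title><body>notes on search</body></html>")
    index.add_page("b.html", "<html><body><p>web indexing mentioned in passing</p></body></html>")
    print(index.search("web indexing"))  # [('a.html', 10.0), ('b.html', 2.0)]
```

In this layout, adding or re-crawling a page touches only the shards that own its terms, which is what keeps incremental updates cheap. The page whose query terms appear in its title outranks the one that mentions them only in body text, which is the intuition behind weighting content by HTML priority.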

Keywords : 
Internet; hypermedia markup languages; indexing; HTML priority system; Web indexing; World Wide Web; content-based ranking system; distributed hash table; inverted indices; spam; Crawlers; HTML; Indexing; Search engines; Uniform resource locators; Unsolicited electronic mail; Distributed Hash Tables; HTML Priority System; Inverted Index; Search Engine; Web Indexing;

Conference_Title : 
Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE), 2015 International Conference on

Conference_Location : 
Noida

Print_ISBN : 
978-1-4799-8432-9

DOI : 
10.1109/ABLAZE.2015.7154929