Title : 
Semantics-Based Extraction of Webpage Main Text
         
        
            Author : 
Fengjiao, Han ; Zhurong, Zhou
         
        
            Author_Institution : 
Coll. of Comput. & Inf. Sci., Southwest Univ., Chongqing, China
         
        
        
        
        
        
            Abstract : 
Extracting the main text of a web page is one of the most effective ways to improve a search engine. The traditional method uses the similarity of DOM sub-trees as the end condition for DOM tree traversal, but its speed is unsatisfactory on complex web page structures. To improve both the speed and the accuracy of DOM sub-tree traversal, we propose a method called Semantics-based Extraction of Web page Main text.
         
        
            Keywords : 
Web sites; search engines; semantic Web; text analysis; DOM sub-tree; DOM tree traversing; Webpage main text; complex Webpage structure; search engine; semantics-based extraction; Accuracy; Computers; Data mining; Educational institutions; HTML; Navigation; Semantics; Extraction; Webpage
         
        
        
        
            Conference_Title : 
Semantics, Knowledge and Grids (SKG), 2012 Eighth International Conference on
         
        
            Conference_Location : 
Beijing
         
        
            Print_ISBN : 
978-1-4673-2561-5
         
        
        
            DOI : 
10.1109/SKG.2012.47