• DocumentCode
    589898
  • Title

    Improving navigation page detection by using DOM-based block text identification

  • Author

    Li Yue ; Dong Shou-bin ; Zheng Xiang ; Ma Bin-Hua

  • Author_Institution
    Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou, China
  • fYear
    2012
  • fDate
    21-23 Nov. 2012
  • Firstpage
    129
  • Lastpage
    134
  • Abstract
    The Internet changes rapidly, so it is necessary to classify web pages for different uses. According to user purpose, web pages can be classified into navigation pages and content pages. Detecting navigation pages is useful for web crawling, topical detection, etc. In this paper, we use a DOM-based block text identification method to improve navigation page detection. Experimental results suggest that our method is more effective than prior methods.
  • Keywords
    Internet; pattern classification; text analysis; DOM-based block text identification; Web crawling; Web page classification; content page; navigation page detection; topical detection; Abstracts; Bars; Business; HTML; Navigation; Noise; Web pages; DOM; block text identification; navigation pages; web pages classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    ICT and Knowledge Engineering (ICT & Knowledge Engineering), 2012 10th International Conference on
  • Conference_Location
    Bangkok
  • ISSN
    2157-0981
  • Print_ISBN
    978-1-4673-2316-1
  • Type

    conf

  • DOI
    10.1109/ICTKE.2012.6408541
  • Filename
    6408541