• DocumentCode
    2876866
  • Title

    A Method for Judging Web-page Type

  • Author

    Xue Hong-Jun ; Chen Tao ; Xue Li-Min

  • Author_Institution
    Dept. of Inf. Warfare Study, Naval Command Coll., Nanjing, China
  • fYear
    2012
  • fDate
    17-18 Nov. 2012
  • Firstpage
    91
  • Lastpage
    93
  • Abstract
    This paper introduces the concept of information entropy for judging web-page types. It is combined with the method put forward by Roadrunner: topic pages are first pre-purified, and a proportional relation is then used to judge the page type. On some typical pages drawn from the home pages of large websites, the average precision reaches 96.7%, which lays a foundation for further information extraction work.
  • Keywords
    Web sites; data mining; entropy; text analysis; Roadrunner; Web site; Web-page type; information entropy; proportional relation; Accuracy; Educational institutions; HTML; Noise reduction; Web pages; HTML Parser; web-page segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Security (CIS), 2012 Eighth International Conference on
  • Conference_Location
    Guangzhou
  • Print_ISBN
    978-1-4673-4725-9
  • Type

    conf

  • DOI
    10.1109/CIS.2012.28
  • Filename
    6405873