• DocumentCode
    3570428
  • Title

    Entropy-Based Visual Tree Evaluation on Block Extraction

  • Author

    Cho, Wei-Ting ; Lin, Yu-Min ; Kao, Hung-Yu

  • Volume
    1
  • fYear
    2009
  • Firstpage
    580
  • Lastpage
    583
  • Abstract
    More and more people use Cascading Style Sheets (CSS) to manage their Web pages, because CSS makes typesetting easy and convenient. However, CSS can leave the structure of a displayed Web page ambiguous. Data extraction systems that are based on mining the Web page structure may generate false judgments for these CSS-rich pages. To solve this issue, we propose a system that applies properties of CSS Web pages to extract data blocks. In this system, Web pages are converted into a visual tree, and the entropy attributes of each node in the visual tree are calculated. The experimental results show that the node attributes and the visual tree are useful for extracting blocks from CSS Web pages. Our system also outperforms other systems on container block extraction.
  • Keywords
    Cascading style sheets; Computer science; Conference management; Containers; Data mining; Entropy; Intelligent agent; Search engines; Technology management; Web pages; DOM; Information Extraction
  • fLanguage
    English
  • Publisher
    iet
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT '09. IEEE/WIC/ACM International Joint Conferences on
  • Print_ISBN
    978-0-7695-3801-3
  • Electronic_ISBN
    978-1-4244-5331-3
  • Type

    conf

  • DOI
    10.1109/WI-IAT.2009.98
  • Filename
    5286011