  • DocumentCode
    2861665
  • Title
    Detecting and Partitioning Data Objects in Complex Web Pages
  • Author
    Ye, Shiren ; Chua, Tat-Seng
  • Author_Institution
    National University of Singapore
  • fYear
    2004
  • fDate
    20-24 Sept. 2004
  • Firstpage
    669
  • Lastpage
    672
  • Abstract
    This paper presents an automated approach to detecting and partitioning data objects, or product descriptions, in complex Web pages. First, we derive the common page structure by comparing similar pages and then identify the data region covering the descriptions of data objects. Second, we partition the nodes belonging to different data objects in the data region and construct self-explanatory XML output files. The experiments indicate that our technique is effective.
  • Keywords
    Business; Cleaning; Companies; Data mining; Navigation; Object detection; Organizing; Partitioning algorithms; Web pages; XML
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2100-2
  • Type
    conf
  • DOI
    10.1109/WI.2004.10155
  • Filename
    1410893