• DocumentCode
    3687622
  • Title
    Automatic data extraction of websites using data path matching and alignment
  • Author
    Yu-Chun Chu;Chiun-Chieh Hsu;Chen-Jhe Lee;Yu-Ting Tsai
  • Author_Institution
    Department of Information Management, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.
  • fYear
    2015
  • Firstpage
    60
  • Lastpage
    64
  • Abstract
    Since most web pages present their main information in data records, extracting those records makes it possible to obtain and integrate data from diverse sources on the Internet. Data extraction from web pages has therefore been a popular research topic over the last decade. This paper aims to automatically extract data records from web pages and to identify the data items within the extracted records. The proposed approach uses Data Path Matching to extract data records effectively and Data Path Code Alignment to identify data items efficiently. Experimental results indicate that the method extracts data effectively.
  • Keywords
    "Data mining","Web pages","HTML","Visualization","Information filters"
  • Publisher
    IEEE
  • Conference_Titel
    Digital Information Processing and Communications (ICDIPC), 2015 Fifth International Conference on
  • Type
    conf
  • DOI
    10.1109/ICDIPC.2015.7323006
  • Filename
    7323006