DocumentCode
3687622
Title
Automatic data extraction of websites using data path matching and alignment
Author
Yu-Chun Chu;Chiun-Chieh Hsu;Chen-Jhe Lee;Yu-Ting Tsai
Author_Institution
Department of Information Management, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.
fYear
2015
Firstpage
60
Lastpage
64
Abstract
Since most web pages present their main information in data records, extracting those records makes it possible to obtain and integrate data from diverse sources on the Internet. Data extraction from web pages has therefore been a popular research topic over the last decade. This paper aims to automatically extract data records from web pages and to identify the data items within the extracted records. The proposed approach uses Data Path Matching to extract data records effectively and Data Path Code Alignment to identify data items efficiently. Experimental results show that the method extracts data effectively.
Keywords
"Data mining","Web pages","HTML","Visualization","Information filters","Yttrium"
Publisher
IEEE
Conference_Titel
Digital Information Processing and Communications (ICDIPC), 2015 Fifth International Conference on
Type
conf
DOI
10.1109/ICDIPC.2015.7323006
Filename
7323006