DocumentCode :
3188841
Title :
FiVaTech: Page-Level Web Data Extraction from Template Pages
Author :
Kayed, Mohammed ; Chang, Chia-Hui ; Shaalan, Khaled ; Girgis, Moheb Ramzy
Author_Institution :
Nat. Central Univ., Taipei
fYear :
2007
fDate :
28-31 Oct. 2007
Firstpage :
15
Lastpage :
20
Abstract :
In this paper, we propose a new approach, called FiVaTech, for the problem of Web data extraction. FiVaTech is a page-level data extraction system that deduces the data schema and templates for input pages generated from a CGI program. FiVaTech uses tree templates to model the generation of dynamic Web pages. It can deduce the schema and templates for each individual Deep Web site, which contains either a singleton or multiple data records in one Web page. FiVaTech applies tree matching, tree alignment, and mining techniques to achieve this challenging task. The experiments show encouraging results on the test pages used in many state-of-the-art Web data extraction works.
Keywords :
Internet; data mining; pattern matching; tree data structures; CGI program; page-level Web data extraction; template page; tree alignment; tree matching; tree mining; Computer science; Conferences; Data engineering; Data mining; Data visualization; Databases; Informatics; Testing; Web pages; Wrapping;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops, 2007. ICDM Workshops 2007. Seventh IEEE International Conference on
Conference_Location :
Omaha, NE
Print_ISBN :
978-0-7695-3019-2
Electronic_ISBN :
978-0-7695-3033-8
Type :
conf
DOI :
10.1109/ICDMW.2007.95
Filename :
4476640