DocumentCode
2766573
Title
Automatic Template Detection for Structured Web Pages
Author
Lo, Lawrence ; Ng, Vincent To-Yee ; Ng, Patrick ; Chan, Stephen C F
Author_Institution
Dept. of Comput., Hong Kong Polytech. Univ., Hong Kong
fYear
2006
fDate
3-5 May 2006
Firstpage
1
Lastpage
6
Abstract
Similar Web pages on a Web site are usually encoded from an underlying structured source and generated dynamically from a predefined template; the book information pages on Amazon.com are one example. Given a set of Web pages from a common Web site, it is possible to extract the template by analyzing the patterns the pages share. In our work, we developed CF-EXALG (collaborative finer-EXALG), based on EXALG, to decompose Web pages and find their common structures. In our system, the templates used to generate Web pages are discovered automatically and stored in XML format. Hence, data encoded in the Web pages can be easily extracted, and the template can be stored for future manipulation. In our preliminary experiments, CF-EXALG was shown to be more accurate and efficient than other similar systems.
Keywords
Internet; XML; information retrieval; Web site; World Wide Web; automatic template detection; collaborative finer-EXALG; structured Web page; template extraction; Books; Collaborative work; Data mining; Databases; Keyword search; Pattern recognition; Web pages; Web sites; Collaborative system; webpage template construction
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Supported Cooperative Work in Design, 2006. CSCWD '06. 10th International Conference on
Conference_Location
Nanjing
Print_ISBN
1-4244-0164-X
Electronic_ISBN
1-4244-0165-8
Type
conf
DOI
10.1109/CSCWD.2006.253257
Filename
4019293