DocumentCode
2860336
Title
Tree-Structured Template Generation for Web Pages
Author
Chuang, Shui-Lung ; Hsu, Jane Yung-jen
Author_Institution
Academia Sinica, Taiwan
fYear
2004
fDate
20-24 Sept. 2004
Firstpage
327
Lastpage
333
Abstract
As the Web becomes an increasingly important source of information, tools for modeling, searching, and extracting information from Web pages are indispensable. By modeling the structure of a Web page defined by its markup tags, one can easily extract target information using structural templates. This paper introduces the Tree Template Automatic Generator (TTAG) that learns tree-structured templates from training Web pages. TTAG was applied to both query-based and frequently updated Web sites, and produced effective templates from a small number of examples. The experiments show that TTAG is a powerful extraction tool for semi-structured information sources.
Keywords
Automata; Data mining; Databases; HTML; Information resources; Information science; Internet; Power generation; Seminars; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN
0-7695-2100-2
Type
conf
DOI
10.1109/WI.2004.10101
Filename
1410822