DocumentCode
2079625
Title
Automatic wrapper generation for semi-structured biological data based on table structure identification
Author
Chen, Liangyou ; Jamil, Hasan M. ; Wang, Nan
Author_Institution
Mississippi State Univ., USA
fYear
2003
fDate
1-5 Sept. 2003
Firstpage
55
Lastpage
59
Abstract
Biological data analyses usually require complex manipulations involving tool applications, navigation across multiple Web sites, result selection and filtering, and iteration over the Internet. Most biological data are generated from structured databases or by applications, and are presented to users embedded within repeated structures, or tables, in HTML documents. In this paper we outline a novel technique for identifying table structures in HTML documents. This identification technique is then used to automatically generate composite wrappers for applications requiring distributed resources. We demonstrate that our method is robust enough to discover standard as well as non-standard table structures in HTML documents, and thus outperforms contemporary techniques used in systems such as XWrap and AutoWrapper. We discuss our technique in the context of our PickUp system, an automatic wrapper generation system that exploits the theoretical developments presented in this paper.
Keywords
biology computing; data analysis; data structures; distributed processing; hypermedia markup languages; query processing; AutoWrapper; HTML documents; Internet; PickUp system; Web site navigation; XWrap; automatic wrapper generation; biological data based; composite wrappers; distributed resources; repeated structures; result selection; structured databases; table structure identification; tool applications; Application software; Automation; Bioinformatics; Cancer; Costs; Data analysis; Databases; Genomics; HTML; Induction generators
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Applications, 2003. Proceedings. 14th International Workshop on
ISSN
1529-4188
Print_ISBN
0-7695-1993-8
Type
conf
DOI
10.1109/DEXA.2003.1231998
Filename
1231998