• DocumentCode
    2079625
  • Title

    Automatic wrapper generation for semi-structured biological data based on table structure identification

  • Author

    Chen, Liangyou ; Jamil, Hasan M. ; Wang, Nan

  • Author_Institution
    Mississippi State Univ., USA
  • fYear
    2003
  • fDate
    1-5 Sept. 2003
  • Firstpage
    55
  • Lastpage
    59
  • Abstract
    Biological data analyses usually require complex manipulations involving tool applications, multiple Web site navigation, result selection and filtering, and iteration over the Internet. Most biological data are generated from structured databases and by applications, and are presented to users embedded within repeated structures, or tables, in HTML documents. In this paper we outline a novel technique for the identification of table structures in HTML documents. This identification technique is then used to automatically generate composite wrappers for applications requiring distributed resources. We demonstrate that our method is robust enough to discover standard as well as non-standard table structures in HTML documents. Thus, our technique outperforms contemporary techniques used in systems such as XWrap and AutoWrapper. We discuss our technique in the context of our PickUp system, which exploits the theoretical developments presented in this paper and emerges as an elegant automatic wrapper generation system.
  • Keywords
    biology computing; data analysis; data structures; distributed processing; hypermedia markup languages; query processing; AutoWrapper; HTML documents; Internet; PickUp system; Web site navigation; XWrap; automatic wrapper generation; biological data based; composite wrappers; distributed resources; repeated structures; result selection; structured databases; table structure identification; tool applications; Application software; Automation; Bioinformatics; Cancer; Costs; Data analysis; Databases; Genomics; HTML; Induction generators;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications, 2003. Proceedings. 14th International Workshop on
  • ISSN
    1529-4188
  • Print_ISBN
    0-7695-1993-8
  • Type

    conf

  • DOI
    10.1109/DEXA.2003.1231998
  • Filename
    1231998