• DocumentCode
    2771082
  • Title
    A Tool for Supporting Integration Across Multiple Flat-File Datasets
  • Author
    Zhang, Xuan ; Agrawal, Gagan
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH
  • fYear
    2006
  • fDate
    16-18 Oct. 2006
  • Firstpage
    141
  • Lastpage
    148
  • Abstract
    Traditionally, biologists focused on a single research subject. New high-throughput experimental and analytical technologies, such as microarrays and BLAST programs, have changed this. An important functionality required now is the ability to process queries about multiple data entries with little user intervention. This paper presents the design, implementation, and evaluation of a data integration tool that supports database-like query operations across flat-file biological datasets. Compared with existing solutions, our system has several advantages: no database management system is required, users can still use declarative languages to communicate with the system, and no data parsing, loading, or indexing utility programs need to be written. We have used the system on three biological queries, each of which was inspired by an actual study from the bioinformatics research literature. These case studies have demonstrated the functionality and scalability of our tool. Overall, our approach provides a lightweight and scalable solution for data integration over flat-file datasets.
  • Keywords
    biology computing; data integrity; query processing; BLAST program; bioinformatics research literature; biological queries; data integration tool; database-like query operation; microarray; multiple data entries; multiple flat-file biological datasets; queries processing; Bioinformatics; Biology; Computer science; Data engineering; Database systems; Humans; Indexing; Scalability; Utility programs; Web services;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    BioInformatics and BioEngineering, 2006. BIBE 2006. Sixth IEEE Symposium on
  • Conference_Location
    Arlington, VA
  • Print_ISBN
    0-7695-2727-2
  • Type
    conf
  • DOI
    10.1109/BIBE.2006.253327
  • Filename
    4019652