• DocumentCode
    781995
  • Title
    Automated Structure Extraction and XML Conversion of Life Science Database Flat Files
  • Author
    Philippi, Stephan ; Köhler, Jacob
  • Author_Institution
    Univ. of Koblenz
  • Volume
    10
  • Issue
    4
  • fYear
    2006
  • Firstpage
    714
  • Lastpage
    721
  • Abstract
    In light of the increasing number of biological databases, their integration is a fundamental prerequisite for answering complex biological questions. Database integration is therefore an important area of research in bioinformatics. Since most publicly available life science databases are still exchanged exclusively by means of proprietary flat files, database integration requires parsers for very different flat file formats. Unfortunately, developing and maintaining database-specific flat file parsers is a nontrivial and time-consuming task that takes considerable effort in large-scale integration scenarios. This paper introduces heuristically based concepts for automatic structure extraction from life science database flat files. On the basis of these concepts, the FlatEx prototype is developed for the automatic conversion of flat files into XML representations.
  • Keywords
    XML; biology computing; data structures; database management systems; electronic data interchange; scientific information systems; FlatEx prototype; XML conversion; automated structure extraction; automatic conversion; bioinformatics; biological database integration; data exchange; data transformation; database specific flat file parsers; life science database flat files; Bioinformatics; Biology; Data mining; Data structures; Jacobian matrices; Large scale integration; Light scattering; Prototypes; Spatial databases; XML; Data exchange; data integration; data transformation; database flat files; structure extraction
  • fLanguage
    English
  • Journal_Title
    Information Technology in Biomedicine, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1089-7771
  • Type
    jour
  • DOI
    10.1109/TITB.2006.875653
  • Filename
    1707684