• DocumentCode
    464284
  • Title

    An ACGT-Words Tree for Efficient Data Access in Genomic Databases

  • Author

    Chang, Ye-In ; Yeh, Wei-Horng ; Chen, Jiun-Rung ; Hu, Jen-Wei

  • Author_Institution
    Dept. of Comput. Sci. & Eng., National Sun Yat-Sen Univ., Kaohsiung
  • fYear
    2007
  • fDate
    1-5 April 2007
  • Firstpage
    143
  • Lastpage
    150
  • Abstract
    Genomic sequence databases, such as GenBank and EMBL, are widely used by molecular biologists for homology searching. As these databases grow, indexing the sequences for fast queries becomes increasingly important. In this paper, we propose a new index structure, the ACGT-Words tree, to efficiently support query processing in genomic databases. We define a concept of words that differs from the word definition used in the word suffix tree, and we separate both the DNA sequences stored in the database and the query sequence into distinct words. Because our approach does not store all suffixes of the database sequences, it needs less space than the suffix tree approach. We also propose an efficient search algorithm that performs sequence matching on the ACGT-Words tree index structure, so a search takes less time than with the suffix array approach. Moreover, our approach avoids the missing cases that occur in the word suffix tree. Simulation results show that the ACGT-Words tree outperforms the suffix tree in terms of storage and the suffix array in terms of processing time.
  • Keywords
    biology computing; genetics; indexing; query processing; tree data structures; ACGT-words tree; DNA sequences; GenBank; efficient data access; genomic sequence database; homology searching; indexing; query processing; word suffix tree; Bioinformatics; Computational biology; Computational intelligence; DNA; Data structures; Databases; Genomics; Indexing; Sequences; Tree data structures;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Bioinformatics and Computational Biology, 2007. CIBCB '07. IEEE Symposium on
  • Conference_Location
    Honolulu, HI
  • Print_ISBN
    1-4244-0710-9
  • Type

    conf

  • DOI
    10.1109/CIBCB.2007.4221216
  • Filename
    4221216