DocumentCode :
2417942
Title :
Parsimony-spaced suffix trees for DNA sequences
Author :
Chen, Yun-Ching ; Lee, Suh-Yin
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chiao-Tung Univ., Hsinchu, Taiwan
fYear :
2003
fDate :
10-12 Dec. 2003
Firstpage :
250
Lastpage :
256
Abstract :
Bioinformatics has become an important research field because there is more genetic data to be analyzed. The suffix tree is a powerful data structure for string analysis and has many applications in bioinformatics. Its linear construction time, linear construction space and short search time all make it very impressive. However, consuming huge space is a fatal drawback especially while using suffix trees to handle the large number of DNA sequences. We utilize some characteristics of DNA sequences to reduce the space requirement of suffix trees. A new bit layout is proposed for the node of a suffix tree which requires less space than others. We also use an index table, called a "prefix table", which can reduce the number of internal nodes in suffix trees. In addition, we propose a preprocessing technique to improve the construction time based on our data structure. The experiments show that our proposed method is the most space-parsimony implementation of suffix trees for DNA sequences and it also has a good construction time.
Keywords :
DNA; biology computing; genetics; molecular biophysics; sequences; string matching; tree data structures; tree searching; DNA sequences; bioinformatics; data structure; prefix table; preprocessing technique; space-parsimony implementation; string analysis; suffix trees; Bioinformatics; Computer science; DNA; Data analysis; Data engineering; Genetic engineering; Genomics; Power engineering and energy; Sequences; Tree data structures;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Multimedia Software Engineering, 2003. Proceedings. Fifth International Symposium on
Conference_Location :
Taichung, Taiwan
Print_ISBN :
0-7695-2031-6
Type :
conf
DOI :
10.1109/MMSE.2003.1254449
Filename :
1254449
Link To Document :
بازگشت