DocumentCode :
3225043
Title :
Common substring in multiple sequences using hash based technique
Author :
Dheenadayalan, Kumar ; Muralidhara, V.N. ; Katru, Jayakrishna
Author_Institution :
Int. Inst. of Inf. Technol., Bangalore, India
fYear :
2013
fDate :
23-26 June 2013
Firstpage :
140
Lastpage :
145
Abstract :
Searching for the longest common substring in multiple sequences has important practical applications in bioinformatics. This paper proposes two memory-efficient solutions to the problem of finding common substrings in multiple sequences. The first algorithm combines a hashing technique with a suffix tree to find common substrings in long DNA or protein sequences, and is three times more memory efficient than alternative data structures. The k-truncated suffix tree, a variation of the suffix tree, was recently proposed for finding common substrings in short sequences. The second algorithm uses hashing with separate chaining for short sequences and offers a memory advantage of around 10 times over the k-truncated suffix tree. Both algorithms also offer considerable potential for parallelizing the search process, which can improve the run time of the search by a large factor.
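The abstract does not give the algorithms themselves, so the following is only a minimal illustrative sketch of the general idea behind the second approach: a hash table with separate chaining that records which sequences contain each length-k substring. The function names (`common_kmers`, `longest_common_substrings`), the `num_buckets` parameter, and the top-down search over k are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Set


def common_kmers(sequences: List[str], k: int, num_buckets: int = 1024) -> Set[str]:
    """Return every length-k substring that occurs in all input sequences.

    Hypothetical helper: uses an explicit hash table with separate chaining,
    where each bucket holds a chain of [kmer, set_of_sequence_ids] entries.
    """
    if not sequences or k <= 0:
        return set()

    buckets: List[list] = [[] for _ in range(num_buckets)]

    for seq_id, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            chain = buckets[hash(kmer) % num_buckets]
            # Walk the chain to find an existing entry for this k-mer.
            for entry in chain:
                if entry[0] == kmer:
                    entry[1].add(seq_id)
                    break
            else:
                # Not found in the chain: append a new entry.
                chain.append([kmer, {seq_id}])

    total = len(sequences)
    return {kmer for chain in buckets for kmer, ids in chain if len(ids) == total}


def longest_common_substrings(sequences: List[str]) -> Set[str]:
    """Try substring lengths from longest to shortest (assumed strategy)."""
    if not sequences:
        return set()
    for k in range(min(len(s) for s in sequences), 0, -1):
        found = common_kmers(sequences, k)
        if found:
            return found
    return set()


if __name__ == "__main__":
    print(longest_common_substrings(["GATTACA", "TTACGA", "CATTAC"]))
```

Because each sequence is scanned independently before the final membership check, the insertion loop in a sketch like this could in principle be split across workers, which is in the spirit of the parallelization potential the abstract mentions.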
Keywords :
DNA; bioinformatics; molecular biophysics; proteins; string matching; tree data structures; tree searching; data structures; hash-based technique; long DNA sequences; longest common substring search; multiple sequences; protein sequences; search process parallelization; short sequences; truncated suffix tree; Genomics; Irrigation; hashing; k-truncated suffix tree; longest common substring; suffix tree
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Technology, Informatics, Management, Engineering, and Environment (TIME-E), 2013 International Conference on
Conference_Location :
Bandung
Print_ISBN :
978-1-4673-5730-2
Type :
conf
DOI :
10.1109/TIME-E.2013.6611980
Filename :
6611980