Regional Information Center for Science and Technology (RICeST) - Common substring in multiple sequences using hash based technique

DocumentCode :

3225043

Title :

Common substring in multiple sequences using hash based technique

Author :

Dheenadayalan, Kumar ; Muralidhara, V.N. ; Katru, Jayakrishna

Author_Institution :

Int. Inst. of Inf. Technol., Bangalore, India

fYear :

2013

fDate :

23-26 June 2013

Firstpage :

140

Lastpage :

145

Abstract :

Searching for the longest common substring in multiple sequences has great practical application in bioinformatics. This paper proposes two memory-efficient solutions to the problem of finding common substrings in multiple sequences. The first algorithm combines a hashing technique with a suffix tree to find common substrings in long DNA or protein sequences; it is three times more memory efficient than alternative data structures. The k-truncated suffix tree, a variation of the suffix tree, was recently proposed for finding common substrings in short sequences. The second algorithm uses hashing with separate chaining for short sequences and offers a memory advantage of around 10 times over the k-truncated suffix tree. Both algorithms also offer great potential for parallelizing the search process, which can improve the run time of the search by a large factor.
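
The abstract does not give the algorithms themselves, so the following is only a minimal sketch of the general idea behind the second approach: indexing fixed-length substrings of short sequences in a hash table with separate chaining, then keeping those seen in every sequence. It is not the authors' method. The function names, the `num_buckets` parameter, and the binary search over the substring length are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def common_substrings_of_length(
    seqs: List[str], k: int, num_buckets: int = 1024
) -> Set[str]:
    """Return every length-k substring that occurs in all sequences.

    Hypothetical helper: the table layout is an assumption, not the
    structure described in the paper.
    """
    if not seqs or k <= 0:
        return set()
    # Separate chaining: each bucket holds a chain (list) of
    # (substring, ids of the sequences that contain it) entries.
    buckets: List[List[Tuple[str, Set[int]]]] = [[] for _ in range(num_buckets)]
    for seq_id, seq in enumerate(seqs):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            chain = buckets[hash(kmer) % num_buckets]
            for stored, ids in chain:
                if stored == kmer:
                    ids.add(seq_id)  # substring already in the chain
                    break
            else:
                chain.append((kmer, {seq_id}))  # first time this substring is seen
    return {s for chain in buckets for s, ids in chain if len(ids) == len(seqs)}


def longest_common_substrings(seqs: List[str]) -> Set[str]:
    """Binary-search the length k; a common substring of length k
    implies one of length k - 1, so the search is monotone."""
    if not seqs:
        return set()
    lo, hi = 0, min(len(s) for s in seqs)
    best: Set[str] = set()
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = common_substrings_of_length(seqs, mid)
        if found:
            best, lo = found, mid
        else:
            hi = mid - 1
    return best
```

Because each sequence only adds entries to the table, the indexing loop could in principle be split across sequences and the per-sequence tables merged, which is one way to read the abstract's remark about parallelization; the sketch above stays sequential for clarity.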

Keywords :

DNA; bioinformatics; molecular biophysics; proteins; string matching; tree data structures; tree searching; data structures; hash-based technique; long DNA sequences; longest common substring search; multiple sequences; protein sequences; search process parallelization; short sequences; truncated suffix tree; Genomics; Irrigation; hashing; k-truncated suffix tree; longest common substring; suffix tree

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Technology, Informatics, Management, Engineering, and Environment (TIME-E), 2013 International Conference on

Conference_Location :

Bandung

Print_ISBN :

978-1-4673-5730-2

Type :

conf

DOI :

10.1109/TIME-E.2013.6611980

Filename :

6611980

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3225043