DocumentCode
116675
Title
A fast sorting algorithm for aptamer identification using deep sequencing
Author
Yiou Xiao ; Mehrotra, Kishan G. ; Allis, Damian G. ; Borer, Phillip N.
Author_Institution
Dept. of EECS, Syracuse Univ., Syracuse, NY, USA
fYear
2014
fDate
17-20 Aug. 2014
Firstpage
759
Lastpage
763
Abstract
In recent years, with the advent of fast sequencing technology, the genomic database is growing rapidly. Researchers in the bioinformatics field are expecting faster and more accurate tools to effectively analyze the gigantic data sets. In the context of aptamer search, the goal is to search for the over-represented DNA sequences from the randomly generated aptamer libraries. Hash functions are widely used in substring comparison, sequence alignment and clustering tools. We have developed a light-weight tool that takes advantage of the hash functions to reduce the size of genomic data and conducts η-neighbor searches on the centroid sequence. This greatly improves the efficiency of the search compared with existing tools. Furthermore, the prior calculation of hash values of η-neighbors decreases the searching overhead. In a dataset of 2.23 million sequences, the proposed algorithm accurately counts the frequency of the Human α-Thrombin aptamer sequences in less than 40 seconds, whereas the current script-based method takes 2 hours and 18 minutes.
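The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that an η-neighbor of a centroid is any equal-length sequence within Hamming distance η of it, and it uses Python's built-in hashed containers (`Counter`, `set`) in place of the paper's hash function. The names `eta_neighbors` and `count_eta_neighbors` are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACGT"  # assumption: reads contain only these four bases


def eta_neighbors(centroid, eta):
    """Return every sequence within Hamming distance eta of centroid."""
    neighbors = {centroid}
    for d in range(1, eta + 1):
        # Choose which d positions are substituted.
        for positions in combinations(range(len(centroid)), d):
            # At each chosen position, try every base except the original.
            alternatives = [
                [b for b in ALPHABET if b != centroid[p]] for p in positions
            ]
            for subs in product(*alternatives):
                seq = list(centroid)
                for p, b in zip(positions, subs):
                    seq[p] = b
                neighbors.add("".join(seq))
    return neighbors


def count_eta_neighbors(reads, centroid, eta):
    """Count reads that fall within Hamming distance eta of centroid."""
    # Collapse identical reads so each distinct sequence is examined once.
    counts = Counter(reads)
    # Precompute the neighbor set; each membership test is then a hash lookup.
    neighbors = eta_neighbors(centroid, eta)
    return sum(n for seq, n in counts.items() if seq in neighbors)


reads = ["ACGT", "ACGA", "ACGA", "TTTT", "AGGT", "AAAT"]
print(count_eta_neighbors(reads, "ACGT", 1))
```

Precomputing the neighbor set trades memory for speed: its size grows quickly with η and sequence length, so this sketch is only practical for small η.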
Keywords
DNA; bioinformatics; sequences; sorting; η-neighbor searches; DNA sequences; aptamer identification; bioinformatics; fast deep sequencing technology; fast sorting algorithm; genomic database; gigantic data sets; human α-Thrombin aptamer sequences; Algorithm design and analysis; Conferences; DNA; Hamming distance; Sequential analysis; Social network services; Software algorithms;
fLanguage
English
Publisher
IEEE
Conference_Titel
Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on
Conference_Location
Beijing
Type
conf
DOI
10.1109/ASONAM.2014.6921671
Filename
6921671