Title :
An efficient mining algorithm for key segment from DNA sequences
Author_Institution :
Inf. Sch., Central Univ. of Finance & Econ., Beijing, China
Abstract :
Unlike business transaction sequences, DNA sequences typically have a small alphabet and great length, so mining them poses different challenges from other applications. This paper addresses the problem of mining key segments from long DNA sequences. We design a compact data structure, called the Association Matrix, that keeps in memory the statistical information gathered while scanning DNA sequences. Based on the Association Matrix, we present an algorithm for mining key segments from a very long DNA sequence. We also evaluate the approach on synthetic and real-life data sets, and the experiments confirm its good time and space performance.
Keywords :
DNA; biology computing; data mining; statistical analysis; DNA sequences; association matrix; data structure; key segment; mining algorithm; real life data sets; statistical information; synthetic data sets; Algorithm design and analysis; Bioinformatics; Databases; Knowledge discovery
Conference_Title :
Electrical and Computer Engineering (CCECE), 2015 IEEE 28th Canadian Conference on
Conference_Location :
Halifax, NS
Print_ISBN :
978-1-4799-5827-6
DOI :
10.1109/CCECE.2015.7129310