Title :
Short adjacent repeat identification based on Chemical Reaction Optimization
Author :
Xu, Jin ; Lam, Albert Y S ; Li, Victor O K ; Li, Qiwei ; Fan, Xiaodan
Author_Institution :
Dept. of Electr. & Electron. Eng., Univ. of Hong Kong, Hong Kong, China
Abstract :
The analysis of short tandem repeats (STRs) in DNA sequences has become an attractive method for determining the genetic profile of an individual. Here we focus on a more general and practical problem, the short adjacent repeats identification problem (SARIP), which extends STR analysis by allowing short gaps between neighboring repeat units. Currently, the best available solution to SARIP is BASARD, which uses Markov chain Monte Carlo (MCMC) algorithms to compute the posterior estimate. However, its high computational cost and its tendency to get stuck in a local mode reduce the efficiency of BASARD and limit its wider application. In this paper, we prove that SARIP is NP-hard, and we solve it with Chemical Reaction Optimization (CRO), a recently developed metaheuristic. CRO mimics the interactions of molecules in a chemical reaction and can explore the solution space efficiently to find optimal or near-optimal solutions. We test the CRO algorithm on both synthetic and real data and compare its mode-searching performance with that of BASARD. Simulation results show that CRO's computational time is dozens of times, and in some cases a hundred times, shorter than BASARD's. The results also show that CRO obtains the global optimum in most cases. Moreover, CRO is more stable across different runs, which is of great importance in practical use. Thus, CRO is by far the best method for SARIP.
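The abstract describes CRO only at a high level. The following is a minimal, generic CRO loop, assuming the four elementary reactions of the general CRO framework (on-wall ineffective collision, decomposition, inter-molecular ineffective collision, and synthesis) and applying it to a toy continuous objective, not to SARIP, whose objective function is not given in this record. Each molecule's potential energy (PE) is its objective value and its kinetic energy (KE) is how much worsening it can tolerate. The function and class names, the neighbourhood, decomposition and synthesis operators, and all parameter defaults (`init_ke`, `ke_loss_rate`, `mole_coll`, `alpha`, `beta`, `step`) are hypothetical choices for illustration.

```python
import random


class Molecule:
    """One candidate solution: structure x, potential energy pe = f(x), kinetic energy ke."""

    def __init__(self, x, f, ke):
        self.x, self.pe, self.ke = x, f(x), ke
        self.num_hit, self.min_hit, self.min_pe = 0, 0, self.pe

    def record_hit(self):
        # Count a collision and remember when this molecule last improved its own best.
        self.num_hit += 1
        if self.pe < self.min_pe:
            self.min_pe, self.min_hit = self.pe, self.num_hit


def cro_minimize(f, dim, lo, hi, iters=5000, pop_size=10, init_ke=100.0,
                 ke_loss_rate=0.2, mole_coll=0.2, alpha=50, beta=10.0, step=0.5, seed=0):
    """Minimise f over [lo, hi]^dim; returns (best_x, best_pe, energy_start, energy_end)."""
    rng = random.Random(seed)

    def neighbor(x):
        # Hypothetical neighbourhood move: Gaussian perturbation of one coordinate, clipped.
        y = list(x)
        i = rng.randrange(dim)
        y[i] = min(hi, max(lo, y[i] + rng.gauss(0.0, step)))
        return y

    pop = [Molecule([rng.uniform(lo, hi) for _ in range(dim)], f, init_ke)
           for _ in range(pop_size)]
    buffer = 0.0  # central energy buffer that collects KE lost in on-wall collisions
    energy_start = sum(m.pe + m.ke for m in pop) + buffer
    best = min(pop, key=lambda m: m.pe)
    best_x, best_pe = list(best.x), best.pe

    for _ in range(iters):
        if len(pop) < 2 or rng.random() > mole_coll:
            m = rng.choice(pop)
            if m.num_hit - m.min_hit > alpha:
                # Decomposition: a molecule without improvement for alpha hits splits into two.
                xs = [[rng.uniform(lo, hi) if rng.random() < 0.5 else v for v in m.x]
                      for _ in range(2)]
                a, b = Molecule(xs[0], f, 0.0), Molecule(xs[1], f, 0.0)
                surplus = m.pe + m.ke - a.pe - b.pe
                if surplus < 0:
                    d = rng.random() * rng.random()  # borrow a random share of the buffer
                    if surplus + d * buffer >= 0:
                        surplus, buffer = surplus + d * buffer, buffer * (1 - d)
                if surplus >= 0:
                    k = rng.random()
                    a.ke, b.ke = surplus * k, surplus * (1 - k)
                    pop.remove(m)
                    pop.extend([a, b])
                else:
                    m.record_hit()
            else:
                # On-wall ineffective collision: local move, part of the surplus KE goes to buffer.
                y = neighbor(m.x)
                pe_new = f(y)
                if m.pe + m.ke >= pe_new:
                    surplus = m.pe + m.ke - pe_new
                    q = rng.uniform(ke_loss_rate, 1.0)
                    buffer += surplus * (1 - q)
                    m.x, m.pe, m.ke = y, pe_new, surplus * q
                m.record_hit()
        else:
            m1, m2 = rng.sample(pop, 2)
            total = m1.pe + m1.ke + m2.pe + m2.ke
            if m1.ke <= beta and m2.ke <= beta:
                # Synthesis: two low-KE molecules merge (uniform crossover) into one.
                c = Molecule([rng.choice(pair) for pair in zip(m1.x, m2.x)], f, 0.0)
                if total >= c.pe:
                    c.ke = total - c.pe
                    pop.remove(m1)
                    pop.remove(m2)
                    pop.append(c)
                else:
                    m1.record_hit()
                    m2.record_hit()
            else:
                # Inter-molecular ineffective collision: both move locally and share the surplus.
                y1, y2 = neighbor(m1.x), neighbor(m2.x)
                p1, p2 = f(y1), f(y2)
                if total >= p1 + p2:
                    k = rng.random()
                    m1.x, m1.pe, m1.ke = y1, p1, (total - p1 - p2) * k
                    m2.x, m2.pe, m2.ke = y2, p2, (total - p1 - p2) * (1 - k)
                m1.record_hit()
                m2.record_hit()
        for m in pop:  # track the best structure seen in the population so far
            if m.pe < best_pe:
                best_x, best_pe = list(m.x), m.pe

    energy_end = sum(m.pe + m.ke for m in pop) + buffer
    return best_x, best_pe, energy_start, energy_end
```

For example, `cro_minimize(lambda x: sum(v * v for v in x), dim=3, lo=-5.0, hi=5.0)` searches the 3-D sphere function. Because every reaction only moves energy between PE, KE and the central buffer, the returned start and end energies agree up to floating-point rounding, which is a convenient sanity check for any CRO variant.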
Keywords :
Bayes methods; DNA; Markov processes; Monte Carlo methods; biology computing; computational complexity; estimation theory; genetics; optimisation; BASARD; CRO algorithm; DNA sequences; Markov chain Monte Carlo algorithms; NP-hard; SARIP; STR; adjacent repeat identification; attractive method; chemical reaction optimization; computational complexity; genetic profile; hundred times shorter computational time; metaheuristic approach; mode searching; near optimal solution; posterior estimate; real data; short adjacent repeats identification problem; short tandem repeats; solution space; synthetic data; Chemicals; DNA; Optimization; Polynomials; Silicon; Tin; Vectors; Chemical Reaction Optimization; Short adjacent repeats; maximum a posteriori;
Conference_Titel :
2012 IEEE Congress on Evolutionary Computation (CEC)
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4673-1510-4
Electronic_ISBN :
978-1-4673-1508-1
DOI :
10.1109/CEC.2012.6256614