DocumentCode :
2081204
Title :
Mining mutation chains in biological sequences
Author :
Sheng, Chang ; Hsu, Wynne ; Mong Li Lee ; Tong, Joo Chuan ; Ng, See-Kiong
Author_Institution :
Nat. Univ. of Singapore, Singapore, Singapore
fYear :
2010
fDate :
1-6 March 2010
Firstpage :
473
Lastpage :
484
Abstract :
The increasing number of infectious disease outbreaks has led to a need for new research to better understand the origins, epidemiological features and pathogenicity of diseases caused by fast-mutating, fast-spreading viruses. Traditional sequence analysis methods do not take into account the spatio-temporal dynamics of rapidly evolving and spreading viral species. They also focus on identifying single-point mutations. In this paper, we propose a novel approach that incorporates space-time relationships for studying changes in protein sequences from fast-mutating viruses. We aim to detect both single-point mutations and k-mutations in the viral sequences. We define the problem of mutation chain pattern mining and design algorithms to discover valid mutation chains. We devise compact data structures to facilitate the mining process, as well as pruning strategies to increase the scalability of the algorithms. Experiments on both synthetic datasets and a real-world influenza A virus dataset show that our algorithms are scalable and effective in discovering mutations that occur geographically over time.
Keywords :
biology computing; data mining; data structures; biological sequences; chain pattern mining; epidemiological features; mining mutation chains; single point mutations; space time relationships; spatio temporal dynamics; Amino acids; Diseases; Genetic mutations; Humans; Immune system; Influenza; Pathogens; Proteins; Vaccines; Viruses (medical)
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering (ICDE), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-5445-7
Electronic_ISBN :
978-1-4244-5444-0
Type :
conf
DOI :
10.1109/ICDE.2010.5447869
Filename :
5447869