Title :
Mining Polymorphic SSRs from Individual Genome Sequences
Author :
Yu-Lun Lu ; Chien-Ming Chen ; Tun-Wen Pai ; Hao-Teng Chang
Author_Institution :
Dept. of Comput. Sci. & Eng., Nat. Taiwan Ocean Univ., Keelung, Taiwan
Abstract :
Simple sequence repeats (SSRs) are abundant in genome sequences and have become popular biomarkers for genetic studies. Several SSRs have been shown to be essential for gene regulation, and abnormal repeat patterns in these critical SSRs may cause lethal diseases. Next-generation sequencing (NGS) technologies provide efficient approaches for detecting SSR polymorphism. However, previous approaches to identifying SSR markers relied on inefficient, manually curated processes. This work proposes an automatic and efficient system for detecting polymorphic SSRs at genomic scale without manual curation or examination. The workflow accepts multiple NGS datasets and starts with assembly by either de novo or reference-mapping approaches. Consensus sequences are then obtained from the assembled contigs, and the coordinates of each individual contig are calibrated by alignment to the selected reference sequences. Next, the SSR mining mechanism retrieves all potential polymorphic SSRs wherever polymorphism arises from insertions or deletions. Sequence data from the 1000 Genomes Project trios were used as the test datasets, and the CODIS SSR markers and 9 well-known disease-related SSR motifs served as the verification targets. The results show that the proposed method can identify known polymorphic SSRs as well as novel SSR markers, provided there are no sequencing or mapping errors within the consensus sequences. By using NGS data to identify SSR polymorphism, the proposed method accelerates related research and facilitates novel SSR biomarker selection and regulatory element discovery.
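The mining step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`find_ssrs`, `count_repeats_at`, `polymorphic_ssrs`), the regex-based detection of perfect repeats, the flank-anchored locus matching, and the default thresholds are all assumptions chosen for illustration. It takes a reference sequence and an individual consensus sequence, and reports loci whose repeat count differs, as would happen after an insertion or deletion of whole repeat units.

```python
import re


def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq.

    Hypothetical helper: the lazy quantifier makes the regex prefer the
    shortest repeating unit at each start position.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    results = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        repeat_count = len(match.group(0)) // len(motif)
        results.append((match.start(), motif, repeat_count))
    return results


def count_repeats_at(seq, pos, motif):
    """Count consecutive copies of motif in seq starting at pos."""
    count = 0
    while seq.startswith(motif, pos):
        count += 1
        pos += len(motif)
    return count


def polymorphic_ssrs(reference, individual, flank=5, min_repeats=3):
    """Report SSR loci whose repeat count differs between two sequences.

    Assumption: a locus is re-located in the individual sequence by the
    `flank` bases immediately to the left of the repeat in the reference,
    standing in for the coordinate calibration described in the abstract.
    """
    hits = []
    for start, motif, ref_count in find_ssrs(reference, min_repeats=min_repeats):
        if start < flank:
            continue  # not enough left flank to anchor the locus
        left_flank = reference[start - flank:start]
        anchor = individual.find(left_flank + motif)
        if anchor == -1:
            continue  # locus not found in the individual sequence
        ind_count = count_repeats_at(individual, anchor + flank, motif)
        if ind_count != ref_count:
            hits.append((start, motif, ref_count, ind_count))
    return hits


# Toy sequences: the individual carries two extra CAG units.
reference = "GATTACGT" + "CAG" * 5 + "TTGACCAT"
individual = "GATTACGT" + "CAG" * 7 + "TTGACCAT"
print(polymorphic_ssrs(reference, individual))
```

Each reported tuple holds the reference start position, the motif, and the repeat counts in the reference and the individual. A real pipeline would also need to handle imperfect repeats, reverse-complement motifs, and sequencing or mapping errors, which this sketch ignores.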
Keywords :
calibration; data mining; diseases; genetics; genomics; medical computing; polymorphism; SSR markers; SSR polymorphism detection; biomarkers; calibrated coordinates; consensus sequences; gene regulation; genetic; genome sequences; genomic scales; lethal diseases; manual curated processes; mining polymorphic SSR mechanism; multiple NGS sequencing datasets; next generation sequencing technologies; reference mapping approaches; selected reference sequences; simple sequence repeats; Assembly; Bioinformatics; Diseases; Genomics; Sequential analysis; Testing; 1000 genomes project; CODIS; Simple Sequence Repeat; genetic disease; genetic marker; next generation sequencing (NGS);
Conference_Title :
Complex, Intelligent, and Software Intensive Systems (CISIS), 2013 Seventh International Conference on
Conference_Location :
Taichung
Print_ISBN :
978-0-7695-4992-7
DOI :
10.1109/CISIS.2013.103