DocumentCode :
3147633
Title :
Error Correction and Clustering Algorithms for Next Generation Sequencing
Author :
Yang, Xiao
Author_Institution :
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
fYear :
2011
fDate :
16-20 May 2011
Firstpage :
2101
Lastpage :
2104
Abstract :
Next generation sequencing (NGS) has revolutionized genomic data generation by enabling high-throughput parallel sequencing, making large-scale genomic data analysis a crucial task. To improve NGS data quality, we developed an efficient algorithm that uses a flexible read decomposition method to improve the accuracy of error correction. We further proposed a statistical framework to differentiate infrequently observed subreads from sequencing errors in the presence of genomic repeats. To enable the analysis of microbial organism composition in environmental samples, we developed a parallel solution for metagenomic sequence clustering that integrates sketching, quasi-clique enumeration, and MapReduce techniques.
Keywords :
bioinformatics; cloud computing; data analysis; error correction; genomics; pattern clustering; sequences; MapReduce techniques; NGS data quality; error correction; flexible read decomposition method; genomic data analysis; genomic data generation; high-throughput parallel sequencing; metagenomic sequence clustering; microbial organism composition; next generation sequencing; quasiclique enumeration; Bioinformatics; Clustering algorithms; Error correction; Genomics; High definition video; Memory management; Tiles;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location :
Shanghai
ISSN :
1530-2075
Print_ISBN :
978-1-61284-425-1
Electronic_ISBN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2011.387
Filename :
6009098