Title :
Next-generation sequencing data processing: Analysis of unmapped reads and extremely high mapped peaks
Author :
Jiao Chen ; ZuoLei Dai ; Changchang Cao ; Qianqian Zhang ; Hongde Liu ; Xiao Sun
Author_Institution :
Sch. of Biol. Sci. & Med. Eng., Southeast Univ., Nanjing, China
Abstract :
Next-generation sequencing (NGS) and its applications are widely used to study gene regulation and epigenetic mechanisms because of their decreasing cost and high throughput. Here we used MNase-seq to determine nucleosome positions in human erythroleukemia K562 cells by directly sequencing nucleosome ends with the SOLiD high-throughput sequencing technique. However, during the read mapping and data pre-analysis steps, only 40% of the sequenced reads could be mapped to the reference genome hg19, and the profiles of mapped reads on the reference genome contained some extremely high peaks (EHPs). Mathematical models were developed to analyze the unmapped reads, and approximately 25.3% of the unmapped reads were attributed to genome variants, base-calling errors, and gaps in the reference genome. We also investigated the EHPs and proposed methods for handling them in downstream data analysis.
Keywords :
bioinformatics; cellular biophysics; data analysis; enzymes; genetics; genomics; MNase-sequencing technology; NGS data processing; SOLiD high-throughput sequencing technique; epigenetic mechanism; extremely high peak analysis; gene regulation; human erythroleukemia k562 cell; mathematical model; next-generation sequencing; nucleosome position; reference genome hg19; unmapped read analysis; EHPs; MNase-seq; SOLiD; sequence mapping
Conference_Title :
Biomedical Engineering and Informatics (BMEI), 2012 5th International Conference on
Conference_Location :
Chongqing
Print_ISBN :
978-1-4673-1183-0
DOI :
10.1109/BMEI.2012.6512933