Title :
Chromatin signature analysis and prediction of genome-wide novel promoters using finite mixture model
Author :
Taslim, Cenny ; Lin, Shili ; Huang, Kun ; Huang, Tim
Author_Institution :
Dept. of Stat., Ohio State Univ., Columbus, OH, USA
Abstract :
Regulation of gene expression has been shown to involve not only the binding of transcription factors at target gene promoters but also the characteristics of the histones around which DNA is wrapped. Some histone modifications, for example di-methylation of histone H3 at lysine 4 (H3K4me2), have been shown to be associated with gene activation. However, no clear pattern has yet been shown to predict human promoters. This paper proposes a novel quantitative approach to characterize the chromatin signatures and patterns of promoters, which are then used to predict novel (alternative) promoters. Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) data against RNA Polymerase II (Pol II) and H3K4me2 are used to identify common patterns of promoter regions. These patterns are then used to search the entire genome for similar patterns in order to find novel promoters. The common patterns of promoter regions are modeled using a mixture model involving double-exponential and uniform distributions. Regions that correlate highly with the common patterns are identified as putative novel promoters. We used the proposed algorithm together with RNA-seq data to identify novel promoters in the MCF7 cell line. We found 4,392 high-confidence regions that display the identified promoter patterns (referred to as putative novel promoters). Of these, 875 regions (20%) overlap with RNA transcripts. Around 70% of the putative novel promoters overlap with RNA transcripts, ESTs and/or non-coding RNAs, suggesting that they may be promoters that are currently undiscovered.
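The abstract names the model family (double-exponential plus uniform) and the correlation-based scan, but gives no fitting details. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a two-component mixture of one Laplace (double-exponential) and one uniform distribution, fitted by EM to read positions relative to a reference point, and it scores a candidate window by Pearson correlation against the fitted profile. All function names, parameters, and starting values are hypothetical.

```python
import math


def laplace_pdf(x, mu, b):
    """Density of a double-exponential (Laplace) distribution."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def weighted_median(xs, ws):
    """Weighted median: the weighted MLE of the Laplace location."""
    pairs = sorted(zip(xs, ws))
    half = sum(ws) / 2.0
    acc = 0.0
    for x, wgt in pairs:
        acc += wgt
        if acc >= half:
            return x
    return pairs[-1][0]


def fit_laplace_uniform(xs, lo, hi, n_iter=100):
    """EM for w * Laplace(mu, b) + (1 - w) * Uniform(lo, hi).

    Assumption: a single Laplace and a single uniform component;
    the source does not state how many components were used.
    """
    u = 1.0 / (hi - lo)                 # constant uniform density
    mu = sorted(xs)[len(xs) // 2]       # start at the sample median
    b = (hi - lo) / 10.0                # arbitrary starting scale
    w = 0.5                             # arbitrary starting weight
    for _ in range(n_iter):
        # E-step: responsibility of the Laplace component for each point
        resp = []
        for x in xs:
            p_l = w * laplace_pdf(x, mu, b)
            p_u = (1.0 - w) * u
            resp.append(p_l / (p_l + p_u))
        total = sum(resp)
        if total == 0.0:
            break
        # M-step: update weight, location, and scale
        w = total / len(xs)
        mu = weighted_median(xs, resp)
        b = sum(r * abs(x - mu) for x, r in zip(xs, resp)) / total
        b = max(b, 1e-6)                # guard against a degenerate scale
    return w, mu, b


def mixture_density(x, w, mu, b, lo, hi):
    """Fitted mixture density evaluated at x."""
    return w * laplace_pdf(x, mu, b) + (1.0 - w) / (hi - lo)


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)
```

In such a sketch, the fitted density evaluated at bin centers would serve as the template, and `pearson(template, window_profile)` would be computed for each candidate window; the correlation threshold for calling a region a putative promoter is not given in the abstract and would have to be chosen separately.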
Keywords :
DNA; RNA; biochemistry; cellular biophysics; genetics; genomics; molecular biophysics; physiological models; H3K4me2; MCF7 cell line; RNA Polymerase II; RNA transcripts; chromatin immunoprecipitation methods; chromatin signature analysis; di-methylated histone H3; double-exponential distributions; finite mixture model; gene activation; gene expression regulation; genome-wide novel promoter prediction; histone modification; lysine 4; noncoding RNA; transcription factor; uniform distributions; Bioinformatics; Correlation; Gene expression; Genomics; Humans; Polymers
Conference_Title :
Genomic Signal Processing and Statistics (GENSIPS), 2011 IEEE International Workshop on
Conference_Location :
San Antonio, TX
Print_ISBN :
978-1-4673-0491-7
Electronic_ISBN :
2150-3001
DOI :
10.1109/GENSiPS.2011.6169429