DocumentCode :
653958
Title :
Consensus Sigma-70 Promoter Prediction Using Hadoop
Author :
Hogan, James M. ; Kelly, Wayne A. ; Newell, Felicity S.
fYear :
2013
fDate :
22-25 Oct. 2013
Firstpage :
35
Lastpage :
44
Abstract :
MapReduce frameworks such as Hadoop are well suited to handling large data sets whose elements can be processed separately and independently, with canonical applications in information retrieval and sales record analysis. Rapid advances in sequencing technology have ensured an explosion in the availability of genomic data, with a consequent rise in the importance of large-scale comparative genomics, often involving operations and data relationships that deviate from the classical MapReduce structure. This work examines the application of Hadoop to patterns of this nature, using as our focus a well-established workflow for identifying promoters - binding sites for regulatory proteins - across multiple gene regions and organisms, coupled with the unifying step of assembling these results into a consensus sequence. Our approach demonstrates the utility of Hadoop for such problems, showing how the tyranny of the "dominant decomposition" can be at least partially overcome. It also demonstrates how load balance and the granularity of parallelism can be optimized by pre-processing that splits and reorganizes input files, allowing a wide range of related problems to be brought under the same computational umbrella.
Keywords :
biology computing; data handling; genomics; parallel programming; proteins; public domain software; Hadoop; MapReduce frameworks; binding sites; consensus Sigma-70 promoter prediction; data relationships; dominant decomposition; genomic data; information retrieval; large data set handling; large scale comparative genomics; multiple gene regions; multiple organisms; parallelism granularity; promoter identification; regulatory proteins; sales record analysis; sequencing technology; Bioinformatics; Context; DNA; Java; Organisms; MapReduce; Promoter Prediction
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2013 IEEE 9th International Conference on eScience (eScience)
Conference_Location :
Beijing
Type :
conf
DOI :
10.1109/eScience.2013.42
Filename :
6683889