DocumentCode
3089508
Title
EST Clustering in Large Dataset with MapReduce
Author
Wang, Chunyu ; Guo, Maozu ; Liu, Yang
Author_Institution
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
fYear
2010
fDate
17-19 Sept. 2010
Firstpage
968
Lastpage
971
Abstract
Analysis of EST data usually starts with EST clustering, the process of grouping fragments according to the longer consensus sequence from which they originate. Similarity between ESTs generally means that parts of the sequences match each other in some way. Accurate clustering takes time quadratic in both the average EST length and the number of ESTs, and the number of ESTs in public EST databases is increasing exponentially. With the help of cloud computing, we provide a k-mer based MapReduce algorithm for EST clustering of large datasets on commodity computers, and implement the algorithm in the mrClust package. The results show that it is scalable and efficient for large EST datasets.
Keywords
Internet; database management systems; pattern clustering; EST clustering; cloud computing; k-mer based MapReduce algorithm; public EST database; Bioinformatics; Cloud computing; Clouds; Clustering algorithms; Databases; Genomics; Bioinformatics; Cloud Computing; Clustering; EST; MapReduce;
fLanguage
English
Publisher
IEEE
Conference_Titel
Pervasive Computing Signal Processing and Applications (PCSPA), 2010 First International Conference on
Conference_Location
Harbin
Print_ISBN
978-1-4244-8043-2
Electronic_ISBN
978-0-7695-4180-8
Type
conf
DOI
10.1109/PCSPA.2010.239
Filename
5635947