EST Clustering in Large Dataset with MapReduce

Author

Wang, Chunyu ; Guo, Maozu ; Liu, Yang

Author_Institution

Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China

fYear

2010

fDate

17-19 Sept. 2010

Firstpage

968

Lastpage

971

Abstract

Analysis of EST data usually starts with EST clustering, the process of grouping fragments according to the longer consensus sequence from which they originate. Similarity between ESTs means that parts of the sequences match each other in some way. Accurate clustering takes time quadratic in both the average EST length and the number of ESTs, and the number of ESTs in public EST databases is increasing exponentially. Using cloud computing, we provide a k-mer based MapReduce algorithm for EST clustering of large datasets on commodity computers, and implement the algorithm in the mrClust package. The results show that it is scalable and efficient for large EST datasets.
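
The abstract does not give the details of the algorithm, so the following is only a minimal in-memory sketch of how a k-mer based MapReduce clustering could look, not the mrClust implementation. The function names, the parameters `k` and `min_shared`, and the union-find merging step are all assumptions made for illustration; reverse complements and real distributed execution are ignored.

```python
from collections import defaultdict
from itertools import combinations


def map_kmers(est_id, seq, k):
    """Map step (hypothetical): emit (k-mer, EST id) for each distinct k-mer."""
    for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
        yield kmer, est_id


def shuffle(pairs):
    """Group emitted values by key, as the MapReduce framework would."""
    groups = defaultdict(set)
    for key, value in pairs:
        groups[key].add(value)
    return groups


def reduce_pairs(est_ids):
    """Reduce step (hypothetical): emit every pair of ESTs sharing one k-mer."""
    yield from combinations(sorted(est_ids), 2)


def cluster(ests, k=8, min_shared=2):
    """Cluster ESTs that share at least `min_shared` distinct k-mers."""
    emitted = (p for est_id, seq in ests.items()
               for p in map_kmers(est_id, seq, k))

    # Count how many distinct k-mers each EST pair has in common.
    shared = defaultdict(int)
    for est_ids in shuffle(emitted).values():
        for pair in reduce_pairs(est_ids):
            shared[pair] += 1

    # Merge similar ESTs transitively with a small union-find structure.
    parent = {est_id: est_id for est_id in ests}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in shared.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for est_id in ests:
        clusters[find(est_id)].append(est_id)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Toy sequences, invented for this example only.
    toy = {"e1": "ACGTACGTGG", "e2": "GTACGTGGCC", "e3": "TTTTTTTTTT"}
    print(cluster(toy, k=4, min_shared=2))
```

In this sketch the map step avoids comparing every EST against every other one directly: only ESTs that land under the same k-mer key are ever paired, which is one plausible way such an approach could sidestep the quadratic all-against-all comparison mentioned in the abstract.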

Keywords

Internet; database management systems; pattern clustering; EST clustering; cloud computing; k-mer based MapReduce algorithm; public EST database; Bioinformatics; Clouds; Clustering algorithms; Databases; Genomics; Clustering; EST; MapReduce

fLanguage

English

Publisher

IEEE

Conference_Titel

Pervasive Computing Signal Processing and Applications (PCSPA), 2010 First International Conference on

Conference_Location

Harbin

Print_ISBN

978-1-4244-8043-2

Electronic_ISBN

978-0-7695-4180-8

Type

conf

DOI

10.1109/PCSPA.2010.239

Filename

5635947

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3089508