• DocumentCode
    3089508
  • Title

    EST Clustering in Large Dataset with MapReduce

  • Author

    Wang, Chunyu ; Guo, Maozu ; Liu, Yang

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
  • fYear
    2010
  • fDate
    17-19 Sept. 2010
  • Firstpage
    968
  • Lastpage
    971
  • Abstract
    Analysis of EST data usually starts with EST clustering, the process of grouping fragments according to the long consensus sequence from which they originate. Similarity between ESTs generally means that parts of the sequences match each other in some way. Accurate clustering takes time quadratic in the average EST length and in the number of ESTs, and the number of ESTs in public EST databases is increasing exponentially. With the help of cloud computing, we provide a k-mer based MapReduce algorithm for EST clustering of large datasets on commodity computers, and implement the algorithm in the mrClust package. The results show that it is scalable and efficient for large EST datasets.
  • Keywords
    Internet; database management systems; pattern clustering; EST clustering; cloud computing; k-mer based MapReduce algorithm; public EST database; Bioinformatics; Clouds; Clustering algorithms; Databases; Genomics; Clustering; EST; MapReduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pervasive Computing Signal Processing and Applications (PCSPA), 2010 First International Conference on
  • Conference_Location
    Harbin
  • Print_ISBN
    978-1-4244-8043-2
  • Electronic_ISBN
    978-0-7695-4180-8
  • Type

    conf

  • DOI
    10.1109/PCSPA.2010.239
  • Filename
    5635947