• DocumentCode
    589962
  • Title

    Driving big data with big compute

  • Author

    Byun, Chansup ; Arcand, William ; Bestor, David ; Bergeron, Bill ; Hubbell, Matthew ; Kepner, Jeremy ; McCabe, A. ; Michaleas, Peter ; Mullen, Jon ; O'Gwynn, David ; Prout, Andrew ; Reuther, A. ; Rosa, Alberto ; Yee, Charles

  • Author_Institution
    MIT Lincoln Lab., Lexington, MA, USA
  • fYear
    2012
  • fDate
    10-12 Sept. 2012
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Big Data (as embodied by Hadoop clusters) and Big Compute (as embodied by MPI clusters) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community, and MPI clusters provide high parallel efficiency for compute-intensive workloads. Bringing the big data and big compute communities together is an active area of research. The LLGrid team has developed and deployed a number of technologies that aim to provide the best of both worlds. LLGrid MapReduce allows the map/reduce parallel programming model to be used quickly and efficiently in any language on any compute cluster. D4M (Dynamic Distributed Dimensional Data Model) provides a high-level distributed arrays interface to the Apache Accumulo database. The accessibility of these technologies is assessed by measuring the effort required to use these tools, which is typically a few lines of code. The performance is assessed by measuring the insert rate into the Accumulo database. Using these tools, a database insert rate of 4M inserts/second has been achieved on an 8-node cluster.
  • Keywords
    Java; application program interfaces; data handling; data models; database management systems; message passing; parallel programming; Apache Accumulo database; D4M; Hadoop clusters; Java community; LLGrid MapReduce; MPI clusters; big compute; big data; distributed computing; dynamic distributed dimensional data model; high level distributed arrays interface; map-reduce parallel programming model; node cluster; Arrays; Databases; History; Java; MATLAB; Mathematical model; Servers; LLGridMapReduce; concurrent query; d4m; hdfs; parallel ingestion; parallel matlab; scheduler
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Extreme Computing (HPEC), 2012 IEEE Conference on
  • Conference_Location
    Waltham, MA
  • Print_ISBN
    978-1-4673-1577-7
  • Type

    conf

  • DOI
    10.1109/HPEC.2012.6408678
  • Filename
    6408678