• DocumentCode
    650636
  • Title

    An Evaluation of Cassandra for Hadoop

  • Author

    Dede, E. ; Sendir, B. ; Kuzlu, P. ; Hartog, J. ; Govindaraju, M.

  • Author_Institution
    Grid & Cloud Comput. Res. Lab., SUNY Binghamton, Binghamton, NY, USA
  • fYear
    2013
  • fDate
    June 28 2013-July 3 2013
  • Firstpage
    494
  • Lastpage
    501
  • Abstract
    In the last decade, the growth of social media, unconventional web technologies, and mobile applications has encouraged the development of a new breed of database models. NoSQL data stores target unstructured data, which is dynamic by nature and a key focus area for "Big Data" research. New-generation data can prove costly and impractical to administer with SQL databases because of its lack of structure and its high scalability and elasticity needs. NoSQL data stores such as MongoDB and Cassandra provide a desirable platform for fast and efficient data queries, which has increased their importance in areas such as cloud applications, e-commerce, social media, bioinformatics, and materials science. In an effort to combine the querying capabilities of conventional database systems with the processing power of the MapReduce model, this paper presents a thorough evaluation of the Cassandra NoSQL database when used in conjunction with the Hadoop MapReduce engine. We characterize the performance for a wide range of representative use cases, and then compare, contrast, and evaluate the results so that application developers can make informed decisions based upon data size, cluster size, replication factor, and partitioning strategy to meet their performance needs.
  • Keywords
    SQL; distributed databases; pattern clustering; public domain software; relational databases; Big Data research; Cassandra evaluation; Hadoop MapReduce engine; MapReduce model; MongoDB; NoSQL database; Web technologies; cluster size; data querying; data size; database models; mobile applications; partitioning strategy; performance needs; replication factor; representative use cases; social media; Benchmark testing; Data models; Distributed databases; Peer-to-peer computing; Servers; Writing; Cassandra; Distributed Computing; Hadoop; MapReduce; NoSQL;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
  • Conference_Location
    Santa Clara, CA
  • Print_ISBN
    978-0-7695-5028-2
  • Type

    conf

  • DOI
    10.1109/CLOUD.2013.31
  • Filename
    6676732