• DocumentCode
    659488
  • Title
    Terabyte-scale image similarity search: Experience and best practice
  • Author
    Moise, Diana ; Shestakov, Denis ; Gudmundsson, Gylfi ; Amsaleg, Laurent
  • Author_Institution
    INRIA Rennes, Rennes, France
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    674
  • Lastpage
    682
  • Abstract
    While the past decade has witnessed an unprecedented growth of data generated and collected all over the world, existing data management approaches lack the ability to address the challenges of Big Data. One of the most promising tools for Big Data processing is the MapReduce paradigm. Although it has its limitations, the MapReduce programming model has laid the foundations for answering some of the Big Data challenges. In this paper, we focus on Hadoop, the open-source implementation of the MapReduce paradigm. Using a Hadoop-based application, image similarity search, as a case study, we present our experiences with the Hadoop framework when processing terabytes of data. The scale of the data and the application workload allowed us to test the limits of Hadoop and the efficiency of the tools it provides. We present a wide collection of experiments and the practical lessons we have drawn from our experience with the Hadoop environment. Our findings can be shared as best practices and recommendations for Big Data researchers and practitioners.
  • Keywords
    Big Data; multimedia systems; parallel processing; public domain software; Big Data processing; Hadoop; MapReduce paradigm open-source implementation; terabyte-scale image similarity search; terabytes data processing; Best practices; Data handling; Data storage systems; Indexing; Information management; Multimedia communication
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691637
  • Filename
    6691637