• DocumentCode
    632882
  • Title

    Using Hadoop MapReduce in a multicluster environment

  • Author

    Tomasic, Ivan ; Rashkovska, Aleksandra ; Depolli, M.

  • Author_Institution
    Dept. of Commun. Syst., Jozef Stefan Inst., Ljubljana, Slovenia
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    345
  • Lastpage
    350
  • Abstract
    Hadoop MapReduce has become one of the most popular tools for data processing. Hadoop is normally installed on a cluster of computers. When the cluster becomes undersized, it can be scaled by adding new computers and storage devices, but it can also be extended with real or virtual resources from another computer cluster. We present an application of the MapReduce paradigm on a Hadoop installation extended across two clusters connected over the Internet. We measured the execution times of Map and Reduce tasks in a multicluster environment and compared them to the corresponding times obtained when only computers from a single cluster were used. The results show that MapReduce performance may decrease depending on the concrete data analysis application, the ratio of the number of local to remote computers, and the connection bandwidth to the remote computers. Additionally, the investigation suggests an upgrade to Apache Hadoop MapReduce that would make it better suited to the multicluster environment.
  • Keywords
    Internet; data analysis; public domain software; Apache Hadoop MapReduce paradigm; Hadoop installation; commonly each distributed computing; concrete data analysis application; data processing; multicluster environment; Bandwidth; Biological system modeling; Computers; Concrete; Cooling; Data analysis; Data models
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information & Communication Technology Electronics & Microelectronics (MIPRO), 2013 36th International Convention on
  • Conference_Location
    Opatija
  • Print_ISBN
    978-953-233-076-2
  • Type

    conf

  • Filename
    6596280