• DocumentCode
    1918904
  • Title

    Abstract: Hadoop's Adolescence; A Comparative Workloads Analysis from Three Research Clusters

  • Author

    Ren, Kai ; Gibson, Garth ; Kwon, YongChul ; Balazinska, Magdalena ; Howe, Bill

  • fYear
    2012
  • fDate
    10-16 Nov. 2012
  • Firstpage
    1452
  • Lastpage
    1452
  • Abstract
    We analyze Hadoop workloads from three different research clusters from an application-level perspective, with two goals: (1) explore new issues in application patterns and user behavior and (2) understand key performance challenges related to IO and load balance. Our analysis suggests that Hadoop usage is still in its adolescence. We see underuse of Hadoop features, extensions, and tools as well as significant opportunities for optimization. We see significant diversity in application styles, including some "interactive" workloads, motivating new tools in the ecosystem. We find that some conventional approaches to improving performance are not especially effective and suggest some alternatives. Overall, we find significant opportunity for simplifying the use and optimization of Hadoop, and make recommendations for future research.
  • Keywords
    Hadoop; Log Analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:
  • Conference_Location
    Salt Lake City, UT
  • Print_ISBN
    978-1-4673-6218-4
  • Type

    conf

  • DOI
    10.1109/SC.Companion.2012.253
  • Filename
    6496036