  • DocumentCode
    3078007
  • Title
    Towards Provenance-Based Anomaly Detection in MapReduce
  • Author
    Liao, Cong ; Squicciarini, Anna
  • Author_Institution
    Coll. of Inf. Sci. & Technol., Pennsylvania State Univ., University Park, PA, USA
  • fYear
    2015
  • fDate
    4-7 May 2015
  • Firstpage
    647
  • Lastpage
    656
  • Abstract
    MapReduce enables parallel and distributed processing of vast amounts of data on a cluster of machines. However, this computing paradigm is subject to threats posed by malicious and cheating nodes or by compromised user-submitted code that could tamper with data and computation, since users retain little control once the computation is carried out in a distributed fashion. In this paper, we focus on the analysis and detection of anomalies during MapReduce computation. Accordingly, we develop a computational provenance system that captures provenance data related to MapReduce computation within the MapReduce framework in Hadoop. In particular, we identify a set of invariants over aggregated provenance information, which are later analyzed to uncover anomalies indicating possible tampering with data and computation. We conduct a series of experiments to show the efficiency and effectiveness of our proposed provenance system.
  • Keywords
    data analysis; parallel processing; security of data; Hadoop; MapReduce computation; computational provenance system; data tampering; provenance-based anomaly detection; Access control; Cloud computing; Containers; Distributed databases; Monitoring; Yarn; MapReduce; computation integrity; logging; provenance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
  • Conference_Location
    Shenzhen
  • Type
    conf
  • DOI
    10.1109/CCGrid.2015.16
  • Filename
    7152530