• DocumentCode
    3462243
  • Title

    Memoization of Materialization Points

  • Author

    Hoger, Marek ; Kao, Odej

  • Author_Institution
    Tech. Universitat Berlin, Berlin, Germany
  • fYear
    2013
  • fDate
    3-5 Dec. 2013
  • Firstpage
    1247
  • Lastpage
    1254
  • Abstract
    Data streaming frameworks, built to run on large numbers of processing nodes in order to analyze big data, are fault-prone. The large number of nodes and network components that could fail is not the only source of errors: when developing data analysis jobs, errors or wrong assumptions about the input data may only be detected during production processing. This usually leads to a re-execution of the entire job and a re-computation over all input data. That can be a tremendous waste of computing time if most of the job's tasks are not affected by the changes and therefore process and produce exactly the same data again. This paper describes an approach that uses materialized intermediate data from previous job executions to reduce the number of tasks that have to be re-executed when a job is updated. Saving intermediate data to disk is a common technique for achieving fault tolerance in data streaming systems. These intermediate results can be used for memoization to avoid needless re-execution of tasks. We show that memoization can markedly decrease the runtime of an updated job.
  • Keywords
    Big Data; fault tolerant computing; Big Data; data saving; data streaming framework; data streaming systems; disk; fault tolerance; fault-prone; job executions; job tasks; materialization points; materialized intermediate data; memoization; network components; processing nodes; task reduction; task reexecution; updated job runtime; Engines; Fault tolerance; Fault tolerant systems; Indexes; Optical character recognition software; Runtime; Terrestrial atmosphere; fault tolerance; materialization; memoization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on
  • Conference_Location
    Sydney, NSW
  • Type

    conf

  • DOI
    10.1109/CSE.2013.186
  • Filename
    6755368