• DocumentCode
    130894
  • Title

    Fault tolerant data flow using curator — Storm

  • Author

    Sainik, Lavanya ; Khajuria, Dheeraj

  • Author_Institution
    Centre of Excellence Mediation & Device, Ericsson India Global Services Pvt. Ltd., Gurgaon, India
  • fYear
    2014
  • fDate
    27-29 June 2014
  • Firstpage
    472
  • Lastpage
    475
  • Abstract
    Driven by evolving 3GPP (3rd Generation Partnership Project) standards and the advent of Big Data technology, industries such as telecommunications, warehousing and storage, finance and many others must deal with data of huge volume, velocity and variety, and need to be compliant with this evolving technology. There is a huge demand to process both real-time and stored data. In this paper we analyze Storm, an open source real-time distributed processing engine, and suggest an improvement to its fault tolerance mechanism so that it can be used flawlessly for any data processing use case. Vanilla Storm provides guaranteed message processing, but it promises only an “at least once” level of processing. A perfectly fault-tolerant system requires an “exactly once” level of processing, and to achieve this Storm provides another framework, Trident, which is built on top of it. Trident provides a transactional spout whose transactional metadata <transaction id, data> is stored in ZooKeeper, which provides distributed coordination, so that data can be replayed across nodes and hardware in case of any failure, timeout or retry. Trident coordinates this transactional information in ZooKeeper through the Apache Curator framework. However, while the current Trident framework makes a commit at the level of an individual activity (aggregator/reducer) easy to obtain, it has no direct implementation of a transaction commit at the level of a single chain. This paper describes an approach in which, by modifying the existing transactional Trident, a chain-level commit can be obtained using Curator recipes.
  • Keywords
    Big Data; data flow computing; fault tolerant computing; meta data; public domain software; 3GPP; 3rd generation partnership project; Big Data technology; Vanilla storm; apache curator framework; data processing; distributed coordination; fault tolerance mechanism; fault tolerant data flow; guaranteed message processing; hardware data; node data; open source framework; real time distributed processing engine; transaction id; transactional information; transactional metadata information; transactional spout; trident framework; zookeeper; Distributed databases; Fasteners; Fault tolerance; Fault tolerant systems; Radiation detectors; Real-time systems; Storms; Big data; Fault tolerance; PathChildrenCache; Real time data; Storm; apache curator; batch input; transaction management; transactional spout; zookeeper;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering and Service Science (ICSESS), 2014 5th IEEE International Conference on
  • Conference_Location
    Beijing
  • ISSN
    2327-0586
  • Print_ISBN
    978-1-4799-3278-8
  • Type

    conf

  • DOI
    10.1109/ICSESS.2014.6933608
  • Filename
    6933608