• DocumentCode
    1781825
  • Title

    Distributed real-time sentiment analysis for big data social streams

  • Author

    Rahnama, Amir Hossein Akhavan

  • Author_Institution
    Dept. of Math. Inf. Technol., Univ. of Jyvaskyla, Jyvaskyla, Finland
  • fYear
    2014
  • fDate
    3-5 Nov. 2014
  • Firstpage
    789
  • Lastpage
    794
  • Abstract
    The big data trend has forced data-centric systems to handle continuous, fast data streams. In recent years, real-time analytics on stream data has emerged as a new research field that aims to answer queries about “what-is-happening-now” with negligible delay. The real challenge in real-time stream processing is that it is impossible to store every instance of the data, so online analytical algorithms are used instead. To perform real-time analytics, the data must be pre-processed so that only a short summary of the stream is kept in main memory. In addition, because instances arrive at high speed, the average processing time per instance must be low enough that no incoming instance is lost before it is captured. Lastly, the learner needs to achieve high analytical accuracy. Sentinel is a distributed system written in Java that aims to solve this challenge by distributing both the processing and the learning. Sentinel is built on top of Apache Storm, a distributed computing platform. Sentinel's learner, the Vertical Hoeffding Tree, is a parallel decision-tree learning algorithm based on the VFDT (very fast decision tree) that enables parallel classification in distributed environments. Sentinel also uses SpaceSaving to maintain a summary of the data stream, which it stores in a synopsis data structure. An application of Sentinel to the Twitter Public Stream API is shown and the results are discussed.
  • Keywords
    Big Data; Java; data analysis; decision trees; parallel algorithms; pattern classification; social networking (online); Apache Storm; Sentinel; SpaceSaving; Twitter public stream API; VFDT; big data social streams; data stream summary; distributed computing platform; distributed real-time sentiment analysis; distributed system; parallel classification; parallel decision tree-learning algorithm; synopsis data structure; vertical Hoeffding tree; very fast decision tree; Algorithm design and analysis; Computational modeling; Data mining; Parallel processing; Real-time systems; Sentiment analysis; Twitter; distributed data mining systems; machine learning; real-time analytics; social media mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Control, Decision and Information Technologies (CoDIT), 2014 International Conference on
  • Conference_Location
    Metz
  • Type

    conf

  • DOI
    10.1109/CoDIT.2014.6996998
  • Filename
    6996998
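
The abstract above says Sentinel keeps a stream summary with SpaceSaving. As a rough illustration of that general technique only, here is a minimal Python sketch of a SpaceSaving counter. It is not Sentinel's code (Sentinel is written in Java), and the class name `SpaceSaving`, the methods `offer` and `top`, and the `capacity` parameter are hypothetical names chosen for this example. The sketch assumes the textbook form of the algorithm: keep at most `capacity` counters, and when an unmonitored item arrives while the table is full, evict the item with the smallest count and let the newcomer inherit that count plus one.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Approximate frequent-item counter (hypothetical sketch, not Sentinel's code)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Estimated count for each monitored item.
        self.counts: Dict[Hashable, int] = {}
        # Upper bound on how much each estimate may overcount.
        self.errors: Dict[Hashable, int] = {}

    def offer(self, item: Hashable) -> None:
        """Process one stream element."""
        if item in self.counts:
            # Already monitored: just increment.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Free slot: start monitoring with an exact count.
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Table full: evict the item with the smallest estimate and
            # give the newcomer that estimate plus one.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top(self, k: int) -> List[Tuple[Hashable, int]]:
        """Return the k items with the largest estimated counts."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


if __name__ == "__main__":
    summary = SpaceSaving(capacity=3)
    for token in "aaabbcd":
        summary.offer(token)
    print(summary.top(2))
```

The linear `min` scan keeps the sketch short; a production version would typically use a structure that finds the minimum counter in constant time. Memory stays bounded by `capacity` regardless of stream length, which is the property the abstract relies on when it says only a short summary of the stream is kept in main memory.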