• DocumentCode
    650660
  • Title
    Scaling Archived Social Media Data Analysis Using a Hadoop Cloud
  • Author
    Conejero, Javier ; Burnap, Pete ; Rana, Omer ; Morgan, J.
  • Author_Institution
    Dept. of Comput. Syst., Univ. of Castilla-La Mancha, Albacete, Spain
  • fYear
    2013
  • fDate
    June 28 2013-July 3 2013
  • Firstpage
    685
  • Lastpage
    692
  • Abstract
    Over recent years, there has been an emerging interest in supporting social media analysis for marketing, opinion analysis and understanding community cohesion. Social media data conforms to many of the categorisations attributed to "big-data" -- i.e. volume, velocity and variety. Generally, analysis needs to be undertaken over large volumes of data in an efficient and timely manner. A variety of computational infrastructures have been reported to achieve this. We present the COSMOS platform supporting sentiment and tension analysis on Twitter data, and demonstrate how this platform can be scaled using the OpenNebula Cloud environment with Map/Reduce-based analysis using Hadoop. In particular, we describe the types of system configurations that would be most useful from a performance perspective -- i.e. how virtual machines in the infrastructure should be distributed to reduce variability in the analysis performance. We demonstrate the approach using a data set consisting of several million Twitter messages, analysed over two types of Cloud infrastructure.
  • Keywords
    cloud computing; social networking (online); virtual machines; COSMOS platform; Hadoop cloud; Map/Reduce-based analysis; OpenNebula cloud environment; Twitter data; archived social media data analysis; big-data; computational infrastructures; sentiment analysis; system configurations; tension analysis; virtual machines; Cloud computing; Data analysis; Educational institutions; Media; Real-time systems; Twitter; Virtualization; COSMOS; Hadoop; OpenNebula Cloud; Twitter data analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
  • Conference_Location
    Santa Clara, CA
  • Print_ISBN
    978-0-7695-5028-2
  • Type
    conf
  • DOI
    10.1109/CLOUD.2013.120
  • Filename
    6676757