• DocumentCode
    124147
  • Title
    Mining Twitter Data with Resource Constraints
  • Author
    Valkanas, George ; Katakis, Ioannis ; Gunopulos, Dimitrios ; Stefanidis, Antony
  • Author_Institution
    Univ. of Athens, Athens, Greece
  • Volume
    1
  • fYear
    2014
  • fDate
    11-14 Aug. 2014
  • Firstpage
    157
  • Lastpage
    164
  • Abstract
    Social media analysis constitutes a scientific field that is rapidly gaining ground due to its numerous research challenges and practical applications, as well as the unprecedented availability of data in real time. Several of these applications, such as journalism, crisis management, and advertising, have significant social and economic impact. However, two issues regarding these applications have to be confronted. The first one is the financial cost. Despite the abundance of information, it typically comes at a premium price, and only a fraction is provided free of charge. For example, Twitter, a predominant social media online service, grants researchers and practitioners free access to only a small proportion (1%) of its publicly available stream. The second issue is the computational cost. Even when the full stream is available, off-the-shelf approaches are unable to operate in such settings due to the real-time computational demands. Consequently, real-world applications as well as research efforts that exploit such information are limited to utilizing only a subset of the available data. In this paper, we are interested in evaluating the extent to which analytical processes are affected by the aforementioned limitation. In particular, we apply a plethora of analysis processes on two subsets of Twitter public data, obtained through the service's sampling APIs. The first one is the default 1% sample, whereas the second is the Garden hose sample that our research group has access to, returning 10% of all public data. We extensively evaluate their relative performance in numerous scenarios.
  • Keywords
    application program interfaces; data mining; pricing; social networking (online); Garden hose sample; Twitter public data; computational cost; financial cost; mining Twitter data; premium price; resource constraints; service sampling API; social media analysis; social media online service; Correlation; Crisis management; Event detection; Media; Real-time systems; Sentiment analysis; Twitter;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
  • Conference_Location
    Warsaw
  • Type
    conf
  • DOI
    10.1109/WI-IAT.2014.29
  • Filename
    6927538