• DocumentCode
    610352
  • Title
    C-Cube: Elastic continuous clustering in the cloud
  • Author
    Zhenjie Zhang ; Hu Shu ; Zhihong Chong ; Hua Lu ; Yin Yang
  • Author_Institution
    Adv. Digital Sci. Center, Illinois at Singapore Pte. Ltd., Singapore, Singapore
  • fYear
    2013
  • fDate
    8-12 April 2013
  • Firstpage
    577
  • Lastpage
    588
  • Abstract
    Continuous clustering analysis over a data stream reports clustering results incrementally as updates arrive. Such analysis has a wide spectrum of applications, including traffic monitoring and topic discovery on microblogs. A common characteristic of streaming applications is that the amount of workload fluctuates, often in an unpredictable manner. On the other hand, most existing solutions for continuous clustering assume either a central server, or a distributed setting with a fixed number of dedicated servers. In other words, they are not ELASTIC, meaning that they cannot dynamically adapt the amount of computational resources to the fluctuating workload. Consequently, they incur considerable waste of resources, as the servers are under-utilized when the amount of workload is low. This paper proposes C-Cube, the first elastic approach to continuous streaming clustering. Similar to popular cloud-based paradigms such as MapReduce, C-Cube routes each new record to a processing unit, e.g., a virtual machine, based on its hash value. Each processing unit performs the required computations, and sends its results to a lightweight aggregator. This design enables dynamically adding and removing processing units, as well as replacing faulty ones and re-running their tasks. In addition to elasticity, C-Cube is also effective (in that it provides quality guarantees on the clustering results), efficient (it minimizes the computational workload at all times), and generally applicable to a large class of clustering criteria. We implemented C-Cube in a real system based on Twitter Storm, and evaluated it using real and synthetic datasets. Extensive experimental results confirm our performance claims.
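The routing-and-aggregation design described in the abstract (hash-route each record to a processing unit, merge partial results in a lightweight aggregator) can be illustrated with a minimal sketch. This is not C-Cube's algorithm: the CRC32-modulo routing, the per-grid-cell counting that stands in for each unit's clustering work, and the names `route`, `ProcessingUnit`, `build_units`, `aggregate`, `replay`, `cell_size` and `min_count` are all hypothetical assumptions chosen for illustration. The sketch only shows why mergeable partial results let units be added, removed, or replaced by replaying their records without changing the aggregated output.

```python
import zlib
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]

def route(record_id: str, num_units: int) -> int:
    """Pick a processing unit from the record's hash value (CRC32 mod N is an assumption)."""
    return zlib.crc32(record_id.encode("utf-8")) % num_units

def cell_of(p: Point, cell_size: float) -> Cell:
    """Hypothetical local summary key: the grid cell that contains the point."""
    return (int(p[0] // cell_size), int(p[1] // cell_size))

class ProcessingUnit:
    """Keeps per-cell counts for its records (a stand-in for real local clustering work)."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.counts: Counter = Counter()

    def process(self, p: Point) -> None:
        self.counts[cell_of(p, self.cell_size)] += 1

def build_units(records: List[Tuple[str, Point]], num_units: int,
                cell_size: float = 1.0) -> List[ProcessingUnit]:
    """Route every record to exactly one unit and let that unit process it."""
    units = [ProcessingUnit(cell_size) for _ in range(num_units)]
    for rid, p in records:
        units[route(rid, num_units)].process(p)
    return units

def aggregate(units: List[ProcessingUnit], min_count: int) -> Dict[Cell, int]:
    """Lightweight aggregator: add up partial counts and keep cells with >= min_count points."""
    total: Counter = Counter()
    for u in units:
        total.update(u.counts)  # Counter.update adds counts rather than overwriting them
    return {cell: c for cell, c in total.items() if c >= min_count}

def replay(records: List[Tuple[str, Point]], num_units: int, unit_index: int,
           cell_size: float = 1.0) -> ProcessingUnit:
    """Rebuild one unit (e.g. after a failure) by re-running only the records routed to it."""
    fresh = ProcessingUnit(cell_size)
    for rid, p in records:
        if route(rid, num_units) == unit_index:
            fresh.process(p)
    return fresh

def run(records: List[Tuple[str, Point]], num_units: int,
        cell_size: float = 1.0, min_count: int = 2) -> Dict[Cell, int]:
    return aggregate(build_units(records, num_units, cell_size), min_count)
```

Because routing is deterministic and the merge step is order-independent, the aggregated result is the same for any number of units, e.g. `run(records, 3) == run(records, 5)`; a real elastic system would also have to migrate or replay state when the unit count changes, which this sketch does not attempt.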
  • Keywords
    cloud computing; file organisation; pattern clustering; social networking (online); virtual machines; C-Cube; MapReduce; Twitter Storm; continuous clustering analysis; continuous streaming clustering; data stream; dedicated servers; elastic continuous clustering; hash value; lightweight aggregator; microblogs; virtual machine; Algorithm design and analysis; Approximation algorithms; Approximation methods; Clustering algorithms; Elasticity; Mathematical model; Measurement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2013 IEEE 29th International Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4673-4909-3
  • Electronic_ISBN
    1063-6382
  • Type
    conf
  • DOI
    10.1109/ICDE.2013.6544857
  • Filename
    6544857