Title : 
Progressive Data Stream Mining and Transaction Classification for Workload-Aware Incremental Database Repartitioning
         
        
            Author : 
Joarder Mohammad Mustafa Kamal;Manzur Murshed;Mohamed Medhat Gaber
         
        
            Author_Institution : 
Fac. of Inf. Technol., Monash Univ., Monash, VIC, Australia
         
        
        
        
        
            Abstract : 
Minimising the impact of distributed transactions (DTs) in a shared-nothing distributed database is extremely challenging for transactional workloads. Because workloads are dynamic and data volume grows rapidly, the underlying database requires incremental repartitioning to maintain an acceptable level of DTs and data load balance with minimal physical data migration. In a workload-aware repartitioning scheme, the transactional workload is modelled as a graph or hypergraph, and k-way min-cut clustering with minimum edge cuts is then performed; mapping the resulting workload clusters onto logical database partitions can reduce the impact of DTs significantly. However, without exploring the inherent workload characteristics, the overall processing and computing times for large-scale workload networks increase in polynomial order. In this paper, a workload-aware incremental database repartitioning technique is proposed, which effectively exploits proactive transaction classification and workload stream mining techniques. Workload batches are modelled as graphs, hypergraphs, and compressed hypergraphs, and are then repartitioned to produce a fresh tuple-to-partition data migration plan for every incremental cycle. Experimental studies in a simulated TPC-C environment demonstrate that the proposed model can be effectively adopted to manage rapid data growth and dynamic workloads, thus progressively reducing the overall processing time required to operate over the workload networks.
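
The quantities named in the abstract (DT count, edge cut, and physical data migrations) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not perform min-cut partitioning: the toy transactions, the two placements, and the helper names `build_coaccess_graph`, `count_distributed`, `edge_cut`, and `migrated_tuples` are all assumptions made for illustration. It only shows how a workload batch might be turned into a weighted co-access graph and how a candidate tuple-to-partition mapping could be scored.

```python
from collections import Counter
from itertools import combinations


def build_coaccess_graph(transactions):
    """Weighted graph: edge (u, v) counts the transactions touching both tuples."""
    edges = Counter()
    for tuples in transactions:
        for u, v in combinations(sorted(set(tuples)), 2):
            edges[(u, v)] += 1
    return edges


def count_distributed(transactions, placement):
    """A transaction is distributed if its tuples span more than one partition."""
    return sum(
        1 for tuples in transactions
        if len({placement[t] for t in tuples}) > 1
    )


def edge_cut(edges, placement):
    """Total weight of edges whose endpoints sit in different partitions."""
    return sum(w for (u, v), w in edges.items() if placement[u] != placement[v])


def migrated_tuples(old, new):
    """Tuples that would have to move physically between partitions."""
    return sorted(t for t in old if old[t] != new[t])


# Hypothetical workload batch: each transaction lists the tuples it accesses.
transactions = [("a", "b"), ("a", "b", "c"), ("c", "d"), ("d", "e"), ("a", "b")]
edges = build_coaccess_graph(transactions)

# Two hypothetical tuple-to-partition mappings over k = 2 partitions.
before = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0}
after = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

print(count_distributed(transactions, before), edge_cut(edges, before))  # 5 6
print(count_distributed(transactions, after), edge_cut(edges, after))    # 1 1
print(migrated_tuples(before, after))                                    # ['b', 'e']
```

In this toy batch, moving two tuples lowers the edge cut from 6 to 1 and the number of DTs from 5 to 1. A real repartitioner would search for such a mapping with a k-way min-cut partitioner rather than take it as given.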
         
        
            Keywords : 
"Servers","Distributed databases","Data mining","Big data","Polynomials","Electronic mail"
         
        
        
            Conference_Title : 
Big Data Computing (BDC), 2014 IEEE/ACM International Symposium on