Author/Authors :
Cuzzocrea, Alfredo (iDEA Lab, Department of Computer Science - University of Calabria); Mumolo, Enzo (Department of Engineering - University of Trieste); Belmerabet, Islam (iDEA Lab - University of Calabria); Hafsaoui, Abderraouf (iDEA Lab - University of Calabria)
Abstract :
We propose Cloud-based machine learning tools for enhanced Big Data applications, where the main idea is to predict the "next" workload occurring against the target Cloud infrastructure via an innovative ensemble-based approach that combines the effectiveness of different well-known classifiers in order to enhance the overall accuracy of the final classification, which is highly relevant today in the specific context of Big Data. The so-called workload categorization problem plays a critical role in improving the efficiency and reliability of Cloud-based Big Data applications. Implementation-wise, our method deploys the Cloud entities that participate in the distributed classification approach on top of virtual machines, which represent classical "commodity" settings for Cloud-based Big Data applications. Given a number of known reference workloads and an unknown workload, in this paper we deal with the problem of finding the reference workload that is most similar to the unknown one. The depicted scenario turns out to be useful in a plethora of modern information system applications. We call this problem coarse-grained workload classification because, instead of characterizing the unknown workload in terms of finer behaviors, such as CPU-, memory-, disk-, or network-intensive patterns, we classify the whole unknown workload as one of the (possible) reference workloads. Reference workloads represent a category of workloads that are relevant in a given application environment. In particular, we focus our attention on the classification problem described above in the special case represented by virtualized environments. Today, Virtual Machines (VMs) have become very popular because they offer important advantages to modern computing environments such as Cloud Computing or server farms. In virtualization frameworks, workload classification is very useful for accounting, security, or user profiling.
Hence, our research is well suited to such environments, and it turns out to be very useful in a special context like Cloud Computing, which is emerging now. In this respect, our approach consists of running several machine learning-based classifiers of different workload models, and then deriving the best classifier produced by Dempster-Shafer fusion, in order to improve the accuracy of the final classification. Experimental assessment and analysis clearly confirm the benefits derived from our classification framework. The running programs which produce the unknown workloads to be classified are treated in a similar way. A fundamental aspect of this paper concerns the successful use of data fusion in workload classification. Different types of metrics are in fact fused together using the Dempster-Shafer theory of evidence combination, giving a classification accuracy of slightly less than 80%. The acquisition of data from the running process, the pre-processing algorithms, and the workload classification are described in detail. Various classical classification algorithms have been used to classify the workloads, and the results are compared.
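The evidence combination mentioned above can be illustrated with a minimal sketch of Dempster's rule of combination. It is not the authors' implementation: the workload labels (`web`, `db`, `batch`), the two metric sources, and all mass values are hypothetical and chosen only for illustration. Each source assigns belief mass to sets of candidate reference workloads, with mass on the full frame standing for "don't know".

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of labels to a mass; masses sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Contradictory evidence: accumulate the conflict mass.
            conflict += wa * wb
    norm = 1.0 - conflict
    if norm < 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize so the combined masses sum to 1 again.
    return {k: v / norm for k, v in combined.items()}


def best_singleton(m):
    """Return the single label with the highest combined mass."""
    singles = {next(iter(k)): v for k, v in m.items() if len(k) == 1}
    return max(singles, key=singles.get)


# Hypothetical frame of reference workloads (assumption, not from the paper).
FRAME = frozenset({"web", "db", "batch"})

# Hypothetical evidence from a classifier trained on CPU metrics.
m_cpu = {frozenset({"web"}): 0.6, frozenset({"db"}): 0.3, FRAME: 0.1}
# Hypothetical evidence from a classifier trained on memory metrics.
m_mem = {frozenset({"web"}): 0.5, frozenset({"batch"}): 0.2, FRAME: 0.3}

fused = combine(m_cpu, m_mem)
decision = best_singleton(fused)
```

Because the rule is associative and commutative, more than two metric sources can be fused by applying `combine` repeatedly, for example with `functools.reduce`.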
Keywords :
Virtual machines , Workload , Dempster-Shafer theory , Classification