Title :
A MapReduce based k-NN joins probabilistic classifier
Author :
Georgios Chatzigeorgakidis;Sophia Karagiorgou;Spiros Athanasiou;Spiros Skiadopoulos
Author_Institution :
University of Peloponnese, Department of Informatics and Telecommunications, Tripolis, Greece
Abstract :
The water management field has attracted great interest because of its potential to affect long-term well-being, the economy and security. At the same time, it poses specific research challenges that have not yet been met, owing to the lack of fine-grained data. Knowledge extraction and decision making for efficient management in the energy field have attracted a lot of interest in Big Data research. However, the water domain is strikingly absent, with minimal focused work on data exploitation and useful information extraction. The goal of this work is to discover persistent and meaningful knowledge from water consumption data and to provide efficient and scalable big data management and analysis services. We propose a novel methodology which exploits machine learning techniques and introduces a robust probabilistic classifier able to operate on data of arbitrary dimensionality and of huge volume. It also provides added-value services and new operation models for the water management domain, encouraging sustainable behavioural changes in consumers, which can further raise social awareness. It does so through a new k-Nearest Neighbour based algorithm, developed in a parallel and distributed environment, which operates over Big Data and discovers useful knowledge about consumption classes and other water-related attitudinal properties. A detailed experimental evaluation assesses the effectiveness and efficiency of the algorithm in terms of prediction precision, along with the analytics it provides. The results show that the method is promising and provides accurate and interesting results that allow us to identify useful characteristics, not only for households but also for water utilities.
Keywords :
"Big data","Water resources","Probabilistic logic","Data mining","Forecasting","Distributed databases","Programming"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7363844