DocumentCode
3717226
Title
A MapReduce based k-NN joins probabilistic classifier
Author
Georgios Chatzigeorgakidis;Sophia Karagiorgou;Spiros Athanasiou;Spiros Skiadopoulos
Author_Institution
University of Peloponnese, Department of Informatics and Telecommunications, Tripolis, Greece
fYear
2015
Firstpage
952
Lastpage
957
Abstract
The water management field has attracted great interest, given its potential to affect long-term well-being, the economy, and the security of society. At the same time, it poses specific research challenges that have not yet been met, owing to the lack of fine-grained data. Knowledge extraction and decision making for efficient management in the energy field have attracted considerable interest in Big Data research. The water domain, however, is strikingly absent, with minimal focused work on data exploitation and the extraction of useful information. The goal of this work is to discover persistent and meaningful knowledge from water consumption data and to provide efficient and scalable Big Data management and analysis services. We propose a novel methodology that exploits machine learning techniques and introduces a robust probabilistic classifier able to operate on data of arbitrary dimensionality and of huge volume. It also provides value-added services and new operation models for the water management domain, inducing sustainable behavioural changes in consumers, which can further raise social awareness. It does so through a new k-Nearest Neighbour based algorithm, developed in a parallel and distributed environment, which operates over Big Data and discovers useful knowledge about consumption classes and other water-related attitudinal properties. A detailed experimental evaluation assesses the effectiveness and efficiency of the algorithm in terms of prediction precision, along with the analytics it provides. The results show that the method is promising and yields accurate and interesting results that allow us to identify useful characteristics, not only for households but also for water utilities.
Keywords
"Big data","Water resources","Probabilistic logic","Data mining","Forecasting","Distributed databases","Programming"
Publisher
IEEE
Conference_Titel
Big Data (Big Data), 2015 IEEE International Conference on
Type
conf
DOI
10.1109/BigData.2015.7363844
Filename
7363844