Regional Information Center for Science and Technology - Distributed fuzzy rough prototype selection for Big Data regression

DocumentCode :

3664018

Title :

Distributed fuzzy rough prototype selection for Big Data regression

Author :

Sarah Vluymans;Hasan Asfoor;Yvan Saeys;Chris Cornelis;Matthew Tolentino;Ankur Teredesai;Martine De Cock

Author_Institution :

Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium

Year :

2015

Firstpage :

Lastpage :

Abstract :

The size and complexity of Big Data require advances in machine learning algorithms to adequately learn from such data. While distributed shared-nothing architectures (Hadoop/Spark) are becoming increasingly popular for developing such new algorithms, adapting existing machine learning algorithms to them is quite challenging. In this paper, we propose a solution for Big Data regression, where the aim is to learn a regression model over large high-dimensional datasets. First, a new distributed implementation of the weighted kNN regression method is presented, followed by a novel distributed prototype selection method based on fuzzy rough set theory. Experiments demonstrate that our implementations of the proposed distributed algorithms in Apache Spark handle the size and complexity of modern real-world datasets well. We furthermore show that applying our prototype selection method improves the regression accuracy.
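
The abstract does not specify how the weighted kNN regression is computed or distributed, so the following is only a minimal sketch of the general idea, not the authors' Spark implementation. It assumes Euclidean distance, inverse-distance weights, and a simple pattern in which each data partition proposes its own k nearest candidates before a global merge. The names `local_top_k`, `weighted_knn_predict`, and `eps` are hypothetical, and plain Python lists stand in for distributed partitions.

```python
import heapq
import math


def local_top_k(partition, query, k):
    """Return up to k (distance, target) pairs nearest to query in one partition.

    Each element of `partition` is a (features, target) pair.
    Euclidean distance is an assumption; the abstract names no metric.
    """
    scored = ((math.dist(x, query), y) for x, y in partition)
    return heapq.nsmallest(k, scored)


def weighted_knn_predict(partitions, query, k, eps=1e-12):
    """Predict a target for query from the k globally nearest neighbours."""
    # "Map" step: every partition independently proposes its k best candidates.
    candidates = [pair for part in partitions
                  for pair in local_top_k(part, query, k)]
    # "Reduce" step: the global k nearest must be among those candidates.
    nearest = heapq.nsmallest(k, candidates)
    # Inverse-distance weighting (assumed scheme); eps avoids division by zero.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Toy data split across two partitions, with target equal to the feature.
partitions = [
    [((0.0,), 0.0), ((1.0,), 1.0)],
    [((2.0,), 2.0), ((10.0,), 10.0)],
]
prediction = weighted_knn_predict(partitions, (0.5,), k=2)
```

Because every partition returns at most k candidates, the merge step handles at most k times the number of partitions, regardless of how large the dataset is. In a shared-nothing setting this keeps the data exchanged between workers small.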

Keywords :

"Approximation methods","Prototypes","Training","Big data","Sparks","Set theory","Scalability"

Publisher :

IEEE

Conference_Title :

2015 Annual Conference of the North American Fuzzy Information Processing Society (NAFIPS) held jointly with 2015 5th World Conference on Soft Computing (WConSC)

Type :

conf

DOI :

10.1109/NAFIPS-WConSC.2015.7284158

Filename :

7284158

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3664018