Title :
Scalable and parallel machine learning algorithms for statistical data mining - Practice & experience
Author :
Riedel, M. ; Goetz, M. ; Richerzhagen, M. ; Glock, P. ; Bodenstein, C. ; Memon, A.S. ; Memon, M.S.
Author_Institution :
Juelich Supercomputing Centre, Forschungszentrum Juelich, Juelich, Germany
Abstract :
Many scientific datasets (e.g. in the earth and medical sciences) grow in volume or in the number of their dimensions due to the ever-increasing quality of measurement devices. This contribution will specifically focus on how these datasets can take advantage of new 'big data' technologies and frameworks that are often based on parallelization methods. Lessons learned with medical and earth science data applications that require parallel clustering and classification techniques such as support vector machines (SVMs) and density-based spatial clustering of applications with noise (DBSCAN) are a substantial part of the contribution. In addition, selected experiences of related 'big data' approaches and concrete mining techniques (e.g. dimensionality reduction, feature selection, and extraction methods) will be addressed too. In order to overcome identified challenges, we outline an architecture framework design that we implement with openly available tools in order to enable scalable and parallel machine learning applications in distributed systems.
Keywords :
Big Data; data mining; learning (artificial intelligence); parallel processing; pattern classification; pattern clustering; statistical analysis; support vector machines; Big Data technology; DBSCAN; SVMs; architecture framework design; classification techniques; concrete mining techniques; density-based spatial clustering of applications with noise; distributed systems; measurement device quality; parallel clustering; parallel machine learning algorithms; scalable machine learning algorithms; statistical data mining; support vector machines; Algorithm design and analysis; Clustering algorithms; Concrete; Data mining; Machine learning algorithms; Standards; Support vector machines;
Conference_Title :
Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2015 38th International Convention on
Conference_Location :
Opatija
DOI :
10.1109/MIPRO.2015.7160265