DocumentCode :
3605585
Title :
A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining
Author :
Saadaoui, Foued ; Bertrand, Pierre R. ; Boudet, Gil ; Rouffiac, Karine ; Dutheil, Frederic ; Chamoux, Alain
Author_Institution :
Coll. of Adm. & Finance, Saudi Electron. Univ., Riyadh, Saudi Arabia
Volume :
14
Issue :
7
fYear :
2015
Firstpage :
707
Lastpage :
715
Abstract :
Clustering is a set of statistical learning techniques aimed at partitioning data into heterogeneous groups of homogeneous observations, called clusters. Clustering has been applied successfully in several fields, such as medicine, biology, finance, and economics. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted on an occupational medicine problem, with the purpose of analyzing patterns in a population of 813 individuals. To reduce the dimensionality of the data set, we base our approach on Principal Component Analysis (PCA), the statistical tool most commonly used in factorial analysis. However, real-world problems, especially in medicine, often involve heterogeneous measurements that mix qualitative and quantitative variables, whereas PCA processes only quantitative ones. Moreover, qualitative data originate as unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies that handle quantitative and qualitative data simultaneously. The principle of this approach is to project the qualitative variables onto the subspaces spanned by the quantitative ones. An optimal model is then assigned to the resulting PCA-regressed subspaces.
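The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' algorithm: it assumes that "projection" means an ordinary least-squares regression of the binary-coded qualitative variables on the retained principal-component scores, and that the subsequent model is a spherical Gaussian mixture fitted by EM (the keywords mention the EM algorithm and finite mixture modelling, but the abstract does not specify the model). The function names, the synthetic data, and the choice of two components are all hypothetical.

```python
import numpy as np

def pca_scores(X, k):
    """Return the first k principal-component scores of the standardized columns of X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized data: the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T

def project_qualitative(Q, scores):
    """Least-squares projection of binary-coded columns Q onto span(1, scores)."""
    A = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(A, Q, rcond=None)
    return A @ coef

def em_spherical_gmm(Y, n_clusters, n_iter=50, seed=0):
    """Plain EM for a spherical Gaussian mixture; returns hard cluster labels."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    means = Y[rng.choice(n, n_clusters, replace=False)]
    var = np.full(n_clusters, Y.var() + 1e-6)
    weights = np.full(n_clusters, 1.0 / n_clusters)
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        sq = ((Y[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_p = np.log(weights) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * sq / var
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and per-component variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ Y) / nk[:, None]
        sq = ((Y[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (resp * sq).sum(axis=0) / (nk * d) + 1e-6
    return resp.argmax(axis=1)

# Synthetic stand-in for mixed data (NOT the 813-individual data set).
rng = np.random.default_rng(42)
group = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 4)) + 4.0 * group[:, None]                     # quantitative
Q = (rng.random((200, 2)) < 0.2 + 0.6 * group[:, None]).astype(float)    # binary-coded

scores = pca_scores(X, k=2)
Q_hat = project_qualitative(Q, scores)   # qualitative variables regressed on the PCs
# Q_hat is a linear function of the scores, so stacking it reweights the
# PCA subspace rather than adding new directions.
features = np.column_stack([scores, Q_hat])
labels = em_spherical_gmm(features, n_clusters=2)
```

In this sketch the residual `Q - Q_hat` is orthogonal to the retained scores by construction, which is what makes `Q_hat` a projection onto the subspace they span. The number of retained components and the number of mixture components are fixed by hand here; in practice they would be chosen by some model-selection criterion.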
Keywords :
biomedical engineering; data mining; data reduction; medical computing; occupational health; pattern clustering; principal component analysis; PCA regressed subspaces; data set dimensionality reduction; dimensionally reduced clustering methodology; heterogeneous data mining; heterogeneous partitions; heterogeneous type qualitative-quantitative measurements; homogenous data grouping; multifactorial data analysis; occupational medicine problem; principal component analysis; qualitative data; quantitative data; statistical learning; Covariance matrices; Data analysis; Data mining; Data models; Eigenvalues and eigenfunctions; Occupational medicine; Principal component analysis; Data-mining; EM algorithm; PCA; finite mixture modelling; heterogenous data; occupational medicine;
fLanguage :
English
Journal_Title :
NanoBioscience, IEEE Transactions on
Publisher :
IEEE
ISSN :
1536-1241
Type :
jour
DOI :
10.1109/TNB.2015.2477407
Filename :
7247734