Title :
Nonlinear principal component analysis of noisy data
Author :
Hsieh, William W.
Author_Institution :
University of British Columbia, Vancouver
Abstract :
With very noisy data, overfitting is a serious problem in pattern recognition. In nonlinear regression, plentiful data eliminates overfitting, but in nonlinear principal component analysis (NLPCA) overfitting persists even when data are plentiful. Simply minimizing the mean square error is therefore not a sufficient criterion for NLPCA to find good solutions in noisy data. A new index is proposed which measures the disparity between the nonlinear principal components u and ū for a data point x and its nearest neighbour x̄. This index, 1 − CS (where CS is the Spearman rank correlation between u and ū), tends to increase with overfitted solutions, thereby providing a diagnostic tool for determining how much regularization (i.e. weight penalty) should be used in the NLPCA objective function to prevent overfitting. Tests are performed using autoassociative neural networks for NLPCA on synthetic and real climate data.
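The diagnostic described in the abstract can be sketched in a few lines: for each data point find its nearest neighbour, take the nonlinear principal component (NLPC) value of that neighbour, and compute one minus the Spearman rank correlation between the two NLPC series. The sketch below is a minimal NumPy implementation under stated assumptions (brute-force nearest neighbours, no tie handling in the ranks, hypothetical function names); it is not the authors' code.

```python
import numpy as np

def spearman(a, b):
    # Spearman rank correlation = Pearson correlation of the ranks.
    # Double argsort yields ranks for distinct values (ties not averaged,
    # which is adequate for continuous noisy data in this sketch).
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

def overfit_index(x, u):
    """Disparity index 1 - CS between the NLPCs of each data point and
    those of its nearest neighbour (a sketch of the proposed diagnostic).

    x : (n, d) array of data points
    u : (n,) array of nonlinear principal component values for x
    """
    # Brute-force pairwise squared distances; mask out self-matches.
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nn = d2.argmin(axis=1)   # nearest neighbour of each point
    u_bar = u[nn]            # NLPC of the nearest neighbour
    # Near 0 for smooth solutions; grows when neighbouring points are
    # mapped to very different NLPC values, a symptom of overfitting.
    return 1.0 - spearman(u, u_bar)
```

A smooth (non-overfitted) solution maps nearby points to nearby NLPC values, so u and ū are almost perfectly rank-correlated and the index stays near zero; a zigzagging overfitted curve breaks that agreement and drives the index up, signalling that a larger weight penalty is needed.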
Keywords :
correlation methods; mean square error methods; neurocontrollers; nonlinear control systems; pattern recognition; principal component analysis; regression analysis; Spearman rank correlation; autoassociative neural networks; mean square error minimization; noisy data; nonlinear principal component analysis; nonlinear regression; objective function; pattern recognition; Clouds; Geometry; Kernel; Mean square error methods; Neural networks; Pattern recognition; Performance evaluation; Principal component analysis; Scattering; Testing;
Conference_Titel :
2006 International Joint Conference on Neural Networks (IJCNN '06)
Conference_Location :
Vancouver, BC
Print_ISBN :
0-7803-9490-9
DOI :
10.1109/IJCNN.2006.247086