Conference record number:
3976
Paper title:
Common Factor Analysis
Authors:
Peter Wentzell (Wentzell@dal.ca), Dalhousie University, Halifax, NS, B3H 4J3, Canada; Mohsen Kompany-Zareh, Dalhousie University, Halifax, NS, B3H 4J3, Canada; Fazal Mabood, University of Nizwa, Sultanate of Oman; Cannon Giglio, Dalhousie University, Halifax, NS, B3H 4J3, Canada
Keywords:
Factor Analysis, Maximum Likelihood, Principal Axis Factoring, Noise, Heteroscedastic, Multivariate Normal
Conference title:
6th Iranian Biennial National Seminar of Chemometrics
Abstract:
Generally defined, factor analysis (FA) is any method that decomposes a data matrix (or, in the more general case, a data tensor) into a bilinear (or multilinear) model of lower dimensionality. Principal components analysis (PCA) is the most popular FA technique in chemistry [1]. Another popular FA technique is maximum likelihood common factor analysis (MLCFA or CFA) [2,3], which is available in the MATLAB Statistics Toolbox as factoran.m. Although both are maximum likelihood based, MLCFA differs from MLPCA [4]. Principal axis factoring (PAF) is another common factor analysis technique [5,6]; it is similar to CFA in that it decomposes the data into specific factors in addition to common factors. This report compares CFA, PAF, and PCA with respect to their resulting profiles and subspaces.
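The distinction between PCA and the common factor model can be sketched in a few lines. The sketch below uses Python with scikit-learn (a stand-in for the MATLAB factoran.m mentioned above, not the authors' actual workflow): scikit-learn's FactorAnalysis fits a maximum likelihood common factor model with a separate specific (unique) variance per variable, whereas PCA implicitly assumes iid errors. The simulated data here are illustrative, not the data sets studied in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)

# Illustrative bilinear data: 3 common factors, 50 samples x 10 variables,
# plus per-variable (column-heteroscedastic) noise.
n, p, k = 50, 10, 3
scores = rng.normal(size=(n, k))
loadings = rng.normal(size=(k, p))
noise_sd = rng.uniform(0.1, 0.5, size=p)  # unequal specific variances
X = scores @ loadings + rng.normal(size=(n, p)) * noise_sd

# PCA: orthogonal loadings, no separate specific-variance term.
pca = PCA(n_components=k).fit(X)

# ML common factor analysis: estimates common-factor loadings plus one
# specific (unique) variance per variable.
cfa = FactorAnalysis(n_components=k).fit(X)

print(pca.components_.shape)      # loadings, one row per component
print(cfa.components_.shape)      # common-factor loadings
print(cfa.noise_variance_.shape)  # estimated specific variances, one per variable
```

The key difference visible here is that the common factor model returns an explicit specific-variance estimate per variable, which is what lets it accommodate column-heteroscedastic noise.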
Simulated data sets, including multivariate normal and non-multivariate normal (chromatographic and spectral-kinetic) data, were considered, in addition to an experimental data set comprising fatty acid profiles of different groups of fish samples. Different types of noise, including independent and identically distributed (iid), column-heteroscedastic, and general heteroscedastic noise, were added to the simulated data at different levels.
In the presence of iid noise, the results from CFA, PAF, and PCA are almost the same, and the angle between the calculated and true profile subspaces is larger when using non-multivariate normal data. In the presence of heteroscedastic noise, the subspaces of the CFA and PAF profiles are closer to that of the true profiles than the PCA subspace is; CFA and PAF then yield different profiles and reconstruct the covariance matrix of the data better than PCA, which assumes iid errors. In the case of multivariate normal (MN) data, a likelihood- and chi-squared-based statistical test can be applied to determine the optimal number of factors in the model. A major advantage of PAF over CFA is that PAF can be applied to “fat” data, in which the number of samples (rows) is lower than the number of variables (columns). MLPCA yields the best reconstruction of the data covariance and the smallest angle between the subspaces of the estimated and true profiles; however, it requires the noise structure to be completely known.
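The angle between estimated and true profile subspaces used throughout the comparison can be computed as the principal angles between the two column spaces. A minimal Python sketch with SciPy (an assumed implementation choice, not the paper's code; the profiles here are synthetic placeholders for CFA/PAF/PCA loadings):

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(1)

# True loading profiles (columns span the true factor subspace) and an
# estimate perturbed by a small amount of noise, standing in for the
# loadings recovered by CFA, PAF, or PCA.
true_profiles = rng.normal(size=(10, 3))
estimated = true_profiles + 0.05 * rng.normal(size=(10, 3))

# Principal angles (in radians, descending order) between the two column
# spaces; the largest angle summarizes how far the estimated subspace is
# from the true one.
angles = subspace_angles(true_profiles, estimated)
max_angle_deg = np.degrees(angles.max())
print(f"largest principal angle: {max_angle_deg:.2f} degrees")
```

A small perturbation of the profiles yields small principal angles, while a poorly recovered subspace drives the largest angle toward 90 degrees.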