Abstract:
Sparse non-negative matrix factorization (SNMF) is a recently developed technique for
finding parts-based linear representations of non-negative data. The present contribution
concerns the implementation of a sparsity constraint in multivariate curve resolution-alternating
least squares (MCR-ALS) techniques for the analysis of GC-MS/LC-MS data.
GC-MS and LC-MS data are sparse in the mass dimension, so SNMF techniques
are well suited to analyzing such two-way chromatographic data.
In this work, L1 and L2 regularization have been implemented in each
iteration of the MCR-ALS algorithm in order to force the algorithm to return sparser
spectral profiles. Multivariate elastic net regression (ENR), the least absolute shrinkage and
selection operator (Lasso) and minimum absolute deviation regression (MADR) were
used instead of ordinary least squares in the MCR methods. A comprehensive
comparison has been made between MCR-ALS, ENR-MCR-ALS, Lasso-MCR-ALS
and MADR-MCR-ALS algorithms for deconvolution of the simulated two-component
GC-MS data. The comparison was made by calculating the
sum of squared errors (SSE) over 5000 repetitions of each algorithm, using
random spectral/concentration profiles as initial estimates. The results revealed that
regularizing the L1-norm of the spectral profiles is more effective than constraining
their L2-norm. Imposing the L1 constraint on the spectral profiles prevents
overfitting in the ALS algorithm, which increases the probability of finding the
“true solution” after the deconvolution procedure. Moreover, the effect of this “sparsity
constraint” on the area of feasible solutions in MCR methods has been explored. The
results of this work revealed that implementing the L1 constraint reduces the extent of
rotational ambiguity. Finally, a graphical user interface (GUI) has been developed for
easy application of this constraint in the ALS algorithm. This GUI can be used for
analysis of two-component GC-MS/LC-MS data with a high degree of overlap in the
mass/concentration profiles.
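
The idea of replacing the ordinary least squares step for the spectral profiles with an L1-penalized regression inside each ALS iteration can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not specify the Lasso solver, the penalty value, or the simulated data, so the proximal-gradient solver, the function names (`simulate_gcms`, `nn_lasso`, `lasso_mcr_als`), the penalty `lam`, and the Gaussian elution profiles below are all assumptions chosen for illustration. Non-negativity is also assumed for both profiles, in which case the L1-norm of a profile is simply the sum of its entries.

```python
import numpy as np

def simulate_gcms(n_times=60, n_mz=40, seed=0):
    """Hypothetical two-component data: overlapping Gaussian elution
    profiles (C) and sparse mass spectra (S), plus a little noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_times)
    C = np.column_stack([np.exp(-0.5 * ((t - 25) / 5.0) ** 2),
                         np.exp(-0.5 * ((t - 32) / 5.0) ** 2)])
    S = np.zeros((2, n_mz))
    for k in range(2):
        idx = rng.choice(n_mz, size=6, replace=False)  # six non-zero m/z channels
        S[k, idx] = rng.uniform(0.2, 1.0, size=6)
    D = C @ S + 0.001 * rng.standard_normal((n_times, n_mz))
    return D, C, S

def nn_lasso(A, B, lam, n_iter=200):
    """Minimise 0.5*||A X - B||_F^2 + lam*sum(X) subject to X >= 0
    by proximal gradient descent (an assumed solver, started from zero)."""
    L = np.linalg.norm(A, 2) ** 2 + 1e-12          # Lipschitz constant of the gradient
    X = np.zeros((A.shape[1], B.shape[1]))
    for _ in range(n_iter):
        grad = A.T @ (A @ X - B)
        X = np.maximum(X - (grad + lam) / L, 0.0)  # non-negative soft threshold
    return X

def lasso_mcr_als(D, n_comp=2, lam=0.01, n_outer=50, seed=0):
    """ALS loop for D ≈ C S with an L1 penalty on the spectra S only."""
    rng = np.random.default_rng(seed)
    C = rng.uniform(size=(D.shape[0], n_comp))     # random initial concentration profiles
    for _ in range(n_outer):
        S = nn_lasso(C, D, lam)                    # sparse spectral step
        norms = np.linalg.norm(S, axis=1, keepdims=True)
        S = S / np.where(norms > 0, norms, 1.0)    # fix the scale of each spectrum
        C = nn_lasso(S.T, D.T, 0.0).T              # lam = 0: plain non-negative least squares
    sse = float(np.sum((D - C @ S) ** 2))          # sum of squared errors
    return C, S, sse

D, C_true, S_true = simulate_gcms()
C_hat, S_hat, sse = lasso_mcr_als(D)
print("SSE:", sse, "| zero entries in S:", int(np.sum(S_hat == 0)))
```

Repeating `lasso_mcr_als` with different `seed` values and collecting the SSE mirrors, on a small scale, the repeated random-start comparison described above; setting `lam=0.0` in the spectral step gives an unpenalized non-negative ALS baseline for comparison.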