Regional Information Center for Science and Technology - Multivariate strategies for classification based on NIR-spectra

Abstract:

The goal of the present study is two-fold. First, we want to emphasize the power of Near Infrared Reflectance (NIR) spectroscopy for discriminating between mayonnaise samples containing different vegetable oils. Second, we want to use our data to compare the performance of different classification procedures. The NIR spectra have 351 variables, corresponding to equally spaced wavelengths in the 1100–2500 nm range. Feature extraction both by automatic wavelength selection and by projection onto principal components (PCs) is discussed. The discriminant methods considered are linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and regression with categorical {0,1}-responses. A dataset containing 162 spectra of mayonnaise samples based on six different vegetable oils is analyzed. By LDA with authentic cross-validation (PC-models re-estimated for each cross-validation segment), only one sample was misclassified. Classification by allocating a sample according to the largest fitted value of a linear regression (Discriminant-Partial least squares (DPLS) or Discriminant-Principal components regression (DPCR)) is shown to be sub-optimal compared to LDA of the corresponding PLS- or PCR-scores. QDA significantly outperforms LDA for projections of the data onto subspaces of moderate size (scores of 7–9 PCs). Two automatic variable-selection procedures choose 16 and 26 wavelengths (variables), respectively, from the spectra. Based on the selected wavelengths, LDA gives considerably better classification than the regression approach.
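The "authentic" cross-validation mentioned above, in which the PC-model is re-estimated for each segment, can be sketched roughly as follows. This is a minimal NumPy illustration and not the authors' implementation: the synthetic spectra, the equal class sizes (27 per class), the number of segments, the equal-prior LDA rule and all function names are assumptions made for the example.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the leading principal axes (loadings)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes of the centred training data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def lda_fit(T, y):
    """Class means and inverse pooled within-class covariance of scores T."""
    classes = np.unique(y)
    means = np.array([T[y == c].mean(axis=0) for c in classes])
    resid = T - means[np.searchsorted(classes, y)]
    pooled = resid.T @ resid / (len(y) - len(classes))
    return classes, means, np.linalg.inv(pooled)

def lda_predict(T, classes, means, pooled_inv):
    """Allocate each row to the nearest class mean in Mahalanobis distance
    (equal priors assumed)."""
    diff = T[:, None, :] - means[None, :, :]
    d2 = np.einsum("nkp,pq,nkq->nk", diff, pooled_inv, diff)
    return classes[d2.argmin(axis=1)]

def authentic_cv_errors(X, y, n_components, n_segments=6, seed=0):
    """Count misclassifications when PCA *and* LDA are refitted per segment."""
    rng = np.random.default_rng(seed)
    segment = rng.permutation(len(y)) % n_segments  # balanced random segments
    errors = 0
    for s in range(n_segments):
        train, test = segment != s, segment == s
        # The PC-model sees only the training part of this segment split.
        mean, P = pca_fit(X[train], n_components)
        model = lda_fit((X[train] - mean) @ P, y[train])
        pred = lda_predict((X[test] - mean) @ P, *model)
        errors += int((pred != y[test]).sum())
    return errors

def make_synthetic_spectra(n_classes=6, n_per_class=27, n_vars=351, seed=1):
    """Hypothetical stand-in data: one Gaussian 'band' per class plus noise."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n_vars)
    X, y = [], []
    for c, mu in enumerate(np.linspace(0.2, 0.8, n_classes)):
        band = np.exp(-0.5 * ((grid - mu) / 0.05) ** 2)
        X.append(band + 0.05 * rng.standard_normal((n_per_class, n_vars)))
        y.append(np.full(n_per_class, c))
    return np.vstack(X), np.concatenate(y)

if __name__ == "__main__":
    X, y = make_synthetic_spectra()
    print("misclassified:", authentic_cv_errors(X, y, n_components=5))
```

The point of the design is that the held-out spectra never influence the loadings: projecting all 162 spectra once and cross-validating only the LDA step would tend to give an optimistic error estimate.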
By reporting the performance of several feature extraction techniques in tandem with three of the most common classification methods, we hope that the reader will notice two relevant aspects: (1) By using DPLS and DPCR (classification by 'dummy' regressions), one is exposed to a significant risk of obtaining sub-optimal classification results; (2) The automatic wavelength selections may give valuable information about what is actually causing a successful discrimination. Such knowledge can, for instance, be used to select the most suitable filters for online applications of NIR. Besides demonstrating different classification strategies, our study clearly shows that classification methods with NIR spectra can be used to discriminate between mayonnaise samples of different oil types and fatty acid composition.
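To illustrate aspect (1), the sketch below contrasts allocation by the largest fitted value of a 'dummy' regression with LDA on the same scores. It uses a deliberately extreme synthetic case, three classes lined up along a single score, where the middle class is known to be "masked" by the regression rule. The abstract does not say that masking is what drives the results in the study, so this is only one plausible mechanism; the data, the single-score setting and the function names are assumptions. Ordinary least squares on given scores stands in for the regression step (with PCA scores this corresponds to DPCR).

```python
import numpy as np

def dummy_regression_allocate(T_train, y_train, T_test):
    """Regress a {0,1} indicator matrix on the scores (with intercept) and
    allocate each test sample to the class with the largest fitted value."""
    classes = np.unique(y_train)
    Y = (y_train[:, None] == classes[None, :]).astype(float)
    A_train = np.column_stack([np.ones(len(T_train)), T_train])
    B, *_ = np.linalg.lstsq(A_train, Y, rcond=None)
    A_test = np.column_stack([np.ones(len(T_test)), T_test])
    return classes[(A_test @ B).argmax(axis=1)]

def lda_allocate(T_train, y_train, T_test):
    """LDA with pooled covariance and equal priors on the same scores."""
    classes = np.unique(y_train)
    means = np.array([T_train[y_train == c].mean(axis=0) for c in classes])
    resid = T_train - means[np.searchsorted(classes, y_train)]
    pooled_inv = np.linalg.inv(resid.T @ resid / (len(y_train) - len(classes)))
    diff = T_test[:, None, :] - means[None, :, :]
    d2 = np.einsum("nkp,pq,nkq->nk", diff, pooled_inv, diff)
    return classes[d2.argmin(axis=1)]

def make_collinear_classes(n_per_class=50, seed=0):
    """Hypothetical scores: three classes centred at -3, 0 and 3 on one axis."""
    rng = np.random.default_rng(seed)
    centres = np.array([-3.0, 0.0, 3.0])
    T = np.concatenate(
        [c + 0.5 * rng.standard_normal(n_per_class) for c in centres]
    )
    y = np.repeat(np.arange(3), n_per_class)
    return T[:, None], y

if __name__ == "__main__":
    T_train, y_train = make_collinear_classes(seed=0)
    T_test, y_test = make_collinear_classes(seed=1)
    reg_err = int((dummy_regression_allocate(T_train, y_train, T_test) != y_test).sum())
    lda_err = int((lda_allocate(T_train, y_train, T_test) != y_test).sum())
    print("dummy regression errors:", reg_err, "| LDA errors:", lda_err)
```

In this setup the fitted line for the middle class is nearly flat, so one of the two outer classes has the largest fitted value almost everywhere, while LDA separates all three classes from the same single score.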