Title of article :
Pattern recognition of gas chromatography mass spectrometry of human volatiles in sweat to distinguish the sex of subjects and determine potential discriminatory marker peaks
Author/Authors :
Dixon, Sarah J. and Xu, Yun and Brereton, Richard G. and Soini, Helena A. and Novotny, Milos V. and Oberzaucher, Elisabeth and Grammer, Karl and Penn, Dustin J.
Issue Information :
Semiannual; serial issue number not given; 2007
Abstract :
Pattern recognition studies are performed on gas chromatography mass spectrometry (GC-MS) data from extracts of human sweat collected from 182 subjects, each sampled 5 times (once a fortnight over 5 fortnights), to determine whether samples can be classified as arising from males or from females. All methods were applied to peak tables of square root normalised GC-MS peak areas. Potential markers were identified using both a univariate method (the t-statistic) and a multivariate method (Partial Least Squares Discriminant Analysis, PLS-DA), applied to each fortnight separately, and the peaks that ranked highly in every fortnight were selected. Classification was performed using PLS-DA, with the model for each fortnight selected over 100 repetitions, each dividing the data randomly into training and test sets, and with the bootstrap used to find the number of significant components for each of the 100 models. Contingency tables of the number of misclassified samples are drawn up for three error criteria: autoprediction, bootstrap and test set. The decision threshold that determines to which group a sample is assigned can be adjusted, and Receiver Operator Characteristic (ROC) curves were used to visualise the effect of changing this threshold. It is shown that when the entire set of 910 measurements is used, autoprediction and test set error rates agree more closely than when only the 182 measurements of a single fortnight are used, suggesting that sample size plays a key role. A general strategy for studying large metabolomics datasets is proposed.
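The abstract names several standard building blocks. The sketch below illustrates three of them in plain Python under stated assumptions: "square root normalised" is taken to mean square-rooting each peak area and then scaling each sample to unit sum; the t-statistic is the Welch (unequal-variance) form; and the scores passed to the threshold step stand in for PLS-DA predicted values, with a sample assigned to class 1 when its score is at or above the threshold. All function names and the toy numbers are hypothetical, and PLS-DA itself, the random training/test splits and the bootstrap are not reproduced here.

```python
import math
from statistics import mean, variance

# Hypothetical helpers illustrating steps named in the abstract; not the authors' code.

def sqrt_normalise(peak_areas):
    """Square-root each peak area, then scale the sample to unit sum (assumed scheme)."""
    roots = [math.sqrt(a) for a in peak_areas]
    total = sum(roots)
    return [r / total for r in roots]

def t_statistic(a, b):
    """Welch-style two-sample t-statistic for one peak (assumed variant)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_peaks(X, labels):
    """Peak indices ordered by |t| between class 1 and class 0, largest first."""
    n_peaks = len(X[0])
    score = []
    for j in range(n_peaks):
        a = [row[j] for row, y in zip(X, labels) if y == 1]
        b = [row[j] for row, y in zip(X, labels) if y == 0]
        score.append(abs(t_statistic(a, b)))
    return sorted(range(n_peaks), key=lambda j: score[j], reverse=True)

def contingency(scores, labels, threshold):
    """(TP, FN, FP, TN) when samples scoring >= threshold are assigned to class 1."""
    tp = fn = fp = tn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def roc_points(scores, labels):
    """(false positive rate, true positive rate) using each distinct score as a threshold."""
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp, fn, fp, tn = contingency(scores, labels, thr)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

# Toy usage with made-up numbers (not data from the study).
raw = [[4.0, 9.0, 1.0], [16.0, 9.0, 1.0], [1.0, 4.0, 9.0], [1.0, 1.0, 16.0]]
labels = [1, 1, 0, 0]
X = [sqrt_normalise(r) for r in raw]
print(rank_peaks(X, labels))  # [2, 0, 1]
print(contingency([0.9, 0.6, 0.4, 0.2], [1, 0, 1, 0], 0.5))  # (1, 1, 1, 1)
```

Sweeping the threshold with `roc_points` traces the kind of ROC curve the abstract describes: each distinct score becomes a candidate threshold, and each threshold yields one contingency table and hence one (false positive rate, true positive rate) point.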
Keywords :
Partial least squares discriminant analysis, Receiver Operator Characteristic curves, Gas chromatography mass spectrometry, Model validation, Sweat
Journal title :
Chemometrics and Intelligent Laboratory Systems