Title of article :
Predicting liquid chromatographic retention times of peptides from the Drosophila melanogaster proteome by machine learning approaches (Original Research Article)
Author/Authors :
Feifei Tian, Li Yang, Fenglin Lv, Peng Zhou
Issue Information :
Journal, serial issue, year 2009
Pages :
7
From page :
10
To page :
16
Abstract :
Three machine learning algorithms were used to model the quantitative structure–retention relationship (QSRR) in order to predict and explain the retention behavior of proteome-wide peptides in reversed-phase liquid chromatography. The three algorithms were least-squares support vector machine (LSSVM), random forest (RF) and Gaussian process (GP). Peptides were parameterized using the CODESSA approach, which yielded 145 descriptors per peptide. These descriptors encode diverse structural information, including constitutional, topological, geometrical and physicochemical properties. On this basis, QSRR models were developed with the nonlinear LSSVM, RF and GP methods and with a sophisticated linear method, partial least-squares regression (PLS). The stability and predictive power of the constructed models were confirmed by a series of systematic validations: internal cross-validation, external testing and Monte Carlo cross-validation. The results show that regression models built with the nonlinear approaches (LSSVM, RF and GP) predict better than linear PLS models. The retention times used in this work were measured on different columns and therefore carry a relatively large uncertainty (reproducibility within 7%). Given this, the optimal statistics, obtained from GP modeling, are satisfactory: the coefficients of determination (R²) are 0.894 for the training set and 0.866 for the test set.
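GP modeling gave the best statistics in the abstract above. The following is a minimal sketch of how a Gaussian process QSRR regressor and a Monte Carlo cross-validation loop could be wired together. It is not the authors' implementation, and several parts are assumptions made for illustration:

- An RBF (squared-exponential) kernel with fixed, untuned hyperparameters.
- A hypothetical `make_synthetic_data` helper that stands in for the 145 CODESSA descriptors with three random features.
- A made-up retention-time function.

The PLS, LSSVM and RF models are not shown.

```python
import math
import random


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential (RBF) covariance between two descriptor vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


class GPRegressor:
    """GP regression posterior mean with fixed (not optimized) hyperparameters."""

    def __init__(self, length_scale=1.0, signal_var=1.0, noise_var=0.01):
        self.length_scale = length_scale
        self.signal_var = signal_var
        self.noise_var = noise_var

    def _k(self, a, b):
        return rbf_kernel(a, b, self.length_scale, self.signal_var)

    def fit(self, X, y):
        self.X = [list(row) for row in X]
        self.y_mean = sum(y) / len(y)  # center targets; the GP prior has zero mean
        K = [[self._k(a, b) + (self.noise_var if i == j else 0.0)
              for j, b in enumerate(self.X)] for i, a in enumerate(self.X)]
        # alpha = (K + noise_var * I)^-1 (y - y_mean)
        self.alpha = solve(K, [v - self.y_mean for v in y])
        return self

    def predict(self, X):
        # Posterior mean: y_mean + sum_i alpha_i * k(x, x_i)
        return [self.y_mean + sum(a * self._k(x, xi) for a, xi in zip(self.alpha, self.X))
                for x in X]


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def monte_carlo_cv(X, y, n_repeats=20, test_fraction=0.25, seed=0, **gp_params):
    """Repeated random train/test splits; returns one test-set R^2 per repeat."""
    rng = random.Random(seed)
    n = len(y)
    n_test = max(2, int(round(n * test_fraction)))
    scores = []
    for _ in range(n_repeats):
        test_idx = sorted(rng.sample(range(n), n_test))
        test_set = set(test_idx)
        train_idx = [i for i in range(n) if i not in test_set]
        model = GPRegressor(**gp_params).fit([X[i] for i in train_idx],
                                             [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in test_idx])
        scores.append(r_squared([y[i] for i in test_idx], preds))
    return scores


def make_synthetic_data(n=40, seed=1):
    """Hypothetical stand-in for peptide descriptors and retention times."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    y = [10 + 5 * x0 + 3 * math.sin(3 * x1) + 2 * x2 ** 2 + rng.gauss(0, 0.2)
         for x0, x1, x2 in X]
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_data()
    scores = monte_carlo_cv(X, y, n_repeats=20, noise_var=0.04)
    print(f"mean test R^2 over {len(scores)} splits: {sum(scores) / len(scores):.3f}")
```

Monte Carlo cross-validation is shown here as repeated random splits, and the spread of the returned R² values indicates model stability. In practice the kernel hyperparameters would normally be fitted rather than fixed, for example by maximizing the marginal likelihood.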
Keywords :
Least-squares support vector machine , Random Forest , Gaussian process , Liquid chromatography , Quantitative structure–retention relationship , Peptide
Journal title :
Analytica Chimica Acta
Serial Year :
2009
Record number :
1037339