Author/Authors :
Lin, Xiaohui; Yang, Fufang; Zhou, Lina; Yin, Peiyuan; Kong, Hongwei; Xing, Wenbin; Lu, Xin; Jia, Lewen; Wang, Quancai; Xu, Guowang
Abstract :
Filtering discriminative metabolites from high-dimensional metabolome data is an important task in metabolomics studies. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique and has shown promising applications in the analysis of metabolome data. However, SVM-RFE measures the weights of the features according to the support vectors, so noise and non-informative variables in high-dimensional data may affect the hyperplane of the SVM learning model. We therefore propose a mutual information (MI)-SVM-RFE method, which first filters out noise and non-informative variables by means of artificial variables and MI, and then conducts SVM-RFE to select the most discriminative features. A serum metabolomics data set from patients with chronic hepatitis B, cirrhosis and hepatocellular carcinoma, analyzed by liquid chromatography–mass spectrometry (LC–MS), was used to validate the method. MI-SVM-RFE distinguished among the three liver diseases with an accuracy of 74.33 ± 2.98%, higher than the 72.00 ± 4.15% obtained with the original SVM-RFE. Thirty-four ion features were selected to distinguish among the controls and the three liver diseases, and 17 of them were identified.
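The filtering stage described in the abstract might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the artificial variables are generated or how the cut-off is chosen, so the sketch assumes that each artificial contrast variable is a random permutation of a real feature, that features are discretized into three equal-width bins, and that the cut-off is the mean, over several rounds, of the largest MI reached by any contrast variable. The names `discretize`, `mutual_information` and `mi_filter` are hypothetical. The surviving features would then be passed to SVM-RFE, which is not shown here.

```python
import math
import random
from collections import Counter


def discretize(values, bins=3):
    """Equal-width binning of a continuous feature (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mi_filter(columns, labels, n_rounds=20, seed=0):
    """Keep features whose MI with the labels exceeds a contrast cut-off.

    Assumption: contrast variables are permuted copies of the real
    features, and the cut-off is the mean of the per-round maximum MI.
    """
    rng = random.Random(seed)
    binned = [discretize(col) for col in columns]
    scores = [mutual_information(b, labels) for b in binned]

    round_maxima = []
    for _ in range(n_rounds):
        best = 0.0
        for b in binned:
            contrast = b[:]          # artificial contrast variable
            rng.shuffle(contrast)    # permuting breaks the link to the labels
            best = max(best, mutual_information(contrast, labels))
        round_maxima.append(best)

    threshold = sum(round_maxima) / len(round_maxima)
    kept = [j for j, s in enumerate(scores) if s > threshold]
    return kept, scores, threshold


# Toy data: three classes of 20 samples; feature 0 tracks the class,
# the remaining five features are pure noise.
labels = [0] * 20 + [1] * 20 + [2] * 20
noise_rng = random.Random(1)
columns = [[10.0 * y for y in labels]]
columns += [[noise_rng.random() for _ in labels] for _ in range(5)]

kept, scores, threshold = mi_filter(columns, labels)
print("kept feature indices:", kept)
```

Permuting a feature preserves its marginal distribution while destroying any association with the class labels, so the MI values of the contrast variables give an empirical estimate of what "no information" looks like for this sample size.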
Keywords :
Mutual information, Artificial contrast variables, Liver diseases, Metabolomics, SVM-RFE