DocumentCode
3408924
Title
Selection of patient samples and genes for outcome prediction
Author
Liu, Huiqing ; Li, Jinyan ; Wong, Limsoon
Author_Institution
Inst. for Infocomm Res., Singapore, Singapore
fYear
2004
fDate
16-19 Aug. 2004
Firstpage
382
Lastpage
392
Abstract
Gene expression profiles with clinical outcome data enable monitoring of disease progression and prediction of patient survival at the molecular level. We present a new computational method for outcome prediction. Our idea is to use an informative subset of the original training samples. This subset consists only of short-term survivors, who died within a short period, and long-term survivors, who were still alive after a long follow-up time. These extreme training samples provide a clear basis for identifying genes whose expression is related to survival. To find relevant genes, we combine two feature selection methods, an entropy measure and the Wilcoxon rank sum test, so that a set of sharply discriminating features is identified. The selected training samples and genes are then integrated by a support vector machine to build a prediction model, by which each validation sample is assigned a survival/relapse risk score for drawing Kaplan-Meier survival curves. We apply this method to two data sets: diffuse large-B-cell lymphoma (DLBCL) and primary lung adenocarcinoma. In both cases, patients in the high-risk and low-risk groups stratified by our risk scores are clearly distinguishable. We also compare our risk scores to clinical factors, such as the International Prognostic Index score for DLBCL and tumor stage for lung adenocarcinoma. Our results indicate that gene expression profiles combined with carefully chosen learning algorithms can predict patient survival for certain diseases.
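The first two steps described in the abstract (keeping only extreme training samples, then scoring a gene with the Wilcoxon rank sum test) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the tuple layout of a sample, and the cutoff parameters are hypothetical, the abstract does not state the cutoff values, and the entropy measure and the support vector machine step are omitted. The rank sum statistic uses the normal approximation without a tie correction in the variance.

```python
import math

def select_extreme_samples(samples, short_cutoff, long_cutoff):
    """Split training samples into short-term and long-term survivors.

    Each sample is (sample_id, follow_up_time, died), with `died` a bool.
    Assumption: short-term means died before `short_cutoff`; long-term
    means followed for longer than `long_cutoff`. Everything in between
    is dropped from training.
    """
    short_term = [sid for sid, t, died in samples if died and t < short_cutoff]
    long_term = [sid for sid, t, died in samples if t > long_cutoff]
    return short_term, long_term

def rank_sum_z(x, y):
    """Wilcoxon rank sum statistic of group x versus y, as a z-score."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    # Sum of ranks belonging to group x.
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd

# Hypothetical toy data: follow-up time in years.
samples = [("a", 0.5, True), ("b", 0.8, False), ("c", 9.0, False), ("d", 3.0, True)]
short_ids, long_ids = select_extreme_samples(samples, short_cutoff=1.0, long_cutoff=8.0)

# Expression of one hypothetical gene in each group.
z = rank_sum_z([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A gene with a large absolute z-score separates the two survivor groups well under this test; in the method described above, such genes would then be passed, together with the selected samples, to the classifier.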
Keywords
cancer; cellular biophysics; entropy; genetics; learning (artificial intelligence); lung; medical computing; molecular biophysics; patient monitoring; physiological models; support vector machines; tumours; International Prognostic Index score; Kaplan-Meier survival curves; Wilcoxon rank sum test; clinical outcome prediction; diffuse large-B-cell lymphoma; disease progression monitoring; entropy; feature selection methods; gene expression profiles; genes selection; learning algorithms; patient sample selection; patient survival; prediction model; primary lung adenocarcinoma; support vector machine; survival/relapse risk score; tumor stage information; Diseases; Entropy; Gene expression; Information analysis; Lungs; Patient monitoring; Predictive models; Risk analysis; Support vector machines; Testing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
Print_ISBN
0-7695-2194-0
Type
conf
DOI
10.1109/CSB.2004.1332451
Filename
1332451