Title :
Sparse omics-network regularization to increase interpretability and performance of linear classification models
Author :
Michael Anděl;Filip Masri;Jiří Kléma;Zdeněk Krejčík;Monika Beličková
Author_Institution :
Department of Computer Science, Czech Technical University in Prague, Technická 2, Czech Republic
Abstract :
Current high-throughput technologies have produced a surge of omics data in which thousands of features are measured in parallel. Phenotype-specific markers are learned from these data to better understand disease mechanisms and to build predictive models. However, learning is prone to overfitting because the sample size is small and the feature space is high-dimensional. Consequently, the resulting models are often inaccurate, and the complex nature of omics processes makes them difficult to interpret. In this paper, we propose a methodology for learning simple yet biologically meaningful linear classification models. A linear support vector machine is trained, and its learning is regularized by prior knowledge. The regularization parameters let an expert readily adjust the interpretation of the models and their conformity with recent domain research while maintaining their accuracy. We performed robust experiments that show the empirical validity of our methodology. In a study of myelodysplastic syndrome, we demonstrate the performance and interpretation of disease classification models. These models are consistent with recent progress in myelodysplastic syndrome research.
Keywords :
"Support vector machines","Proteins"
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on
DOI :
10.1109/BIBM.2015.7359754