Title of article :
A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data
Author/Authors :
Bommert, Andrea Department of Statistics - TU Dortmund University - Dortmund, Germany , Rahnenführer, Jörg Department of Statistics - TU Dortmund University - Dortmund, Germany , Lang, Michel Department of Statistics - TU Dortmund University - Dortmund, Germany
Abstract :
Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, a model needs more than
high predictive accuracy: it should also use only a few features, and the selection
of these features should be stable. This is because, in bioinformatics, the models are used not only for prediction but also for drawing
biological conclusions, which makes the interpretability and reliability of the model crucial. We suggest using three target criteria
when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection,
and the number of chosen features. As it is unclear which measure is best for evaluating the stability, we first compare a variety of
stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. We also find that
how a measure assesses stability depends most on whether it contains a correction for chance or for large numbers of chosen
features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without
losing much predictive accuracy.
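
The three criteria named in the abstract can be illustrated with a minimal sketch. It assumes that the Pearson-based stability is the mean pairwise Pearson correlation between the 0/1 indicator vectors of the feature sets selected on different resampling iterations, and that the Pareto front keeps every model that no other model beats on accuracy, stability, and number of features at once. The function names, the dictionary keys (`acc`, `stab`, `n`), and the example values are hypothetical and are not taken from the article.

```python
from itertools import combinations
import math


def pearson(u, v):
    """Pearson correlation of two equal-length 0/1 indicator vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        # Undefined when a set is empty or contains every feature.
        return float("nan")
    return cov / (su * sv)


def stability(feature_sets, p):
    """Mean pairwise Pearson correlation over all pairs of selected sets.

    feature_sets: list of sets of feature indices in range(p), one per
    resampling iteration (assumed setup, at least two sets).
    """
    vecs = [[1 if j in s else 0 for j in range(p)] for s in feature_sets]
    pairs = list(combinations(vecs, 2))
    return sum(pearson(u, v) for u, v in pairs) / len(pairs)


def pareto_front(models):
    """Return the models not dominated by any other model.

    Accuracy and stability are maximised, the number of features is minimised.
    """
    def dominates(a, b):
        no_worse = (a["acc"] >= b["acc"] and a["stab"] >= b["stab"]
                    and a["n"] <= b["n"])
        better = (a["acc"] > b["acc"] or a["stab"] > b["stab"]
                  or a["n"] < b["n"])
        return no_worse and better

    return [m for m in models if not any(dominates(o, m) for o in models)]


# Hypothetical example values, for illustration only.
same = stability([{0, 1}, {0, 1}], p=4)      # identical selections
disjoint = stability([{0, 1}, {2, 3}], p=4)  # complementary selections

candidates = [
    {"name": "A", "acc": 0.90, "stab": 0.5, "n": 50},
    {"name": "B", "acc": 0.88, "stab": 0.9, "n": 10},
    {"name": "C", "acc": 0.85, "stab": 0.8, "n": 20},
]
front_names = [m["name"] for m in pareto_front(candidates)]
```

In this toy example, identical selections give a stability of 1 and complementary selections give -1, and model C drops out of the front because B is at least as good on all three criteria. Model A stays on the front despite its lower stability because it has the highest accuracy, which is the kind of trade-off a Pareto analysis makes visible.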
Keywords :
Multicriteria, High-Dimensional, Models
Journal title :
Computational and Mathematical Methods in Medicine