Title :
A greedy algorithm for selecting models in ensembles
Author :
Turinsky, Andrei L. ; Grossman, Robert L.
Author_Institution :
Calgary Univ., Alta., Canada
Abstract :
We are interested in ensembles of models built over k data sets. Common approaches either combine the models by vote averaging or build a meta-model on the outputs of the local models. In this paper, we consider the model assignment approach, in which a meta-model selects one of the local statistical models for scoring. We introduce an algorithm called greedy data labeling (GDL) that improves the initial data partition by reallocating some data, so that when each model is built on its local data subset, the resulting hierarchical system has minimal error. We present evidence that model assignment may, in certain situations, be more natural than traditional ensemble learning and that, when enhanced by GDL, it often outperforms traditional ensembles.
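The abstract does not spell out the steps of GDL, so the following is only a minimal sketch of the general idea, under stated assumptions: each local model is a one-variable least-squares line, a data point is greedily moved to the subset whose current model scores it with the lowest squared error, and the meta-model is a 1-nearest-neighbour lookup on the input. The names fit_line, fit_all, greedy_relabel, assign_model and total_error, and the toy data, are hypothetical and are not taken from the paper.

```python
from statistics import mean

def fit_line(points):
    """Least-squares line y = a*x + b; constant fallback for degenerate subsets."""
    if not points:
        return (0.0, 0.0)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return (0.0, my)
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return (a, my - a * mx)

def predict(model, x):
    a, b = model
    return a * x + b

def fit_all(data, labels, k):
    """Build one local model per subset of the current partition."""
    return [fit_line([p for p, l in zip(data, labels) if l == j]) for j in range(k)]

def total_error(data, labels, models):
    """Sum of squared errors when each point is scored by its assigned model."""
    return sum((predict(models[l], x) - y) ** 2 for (x, y), l in zip(data, labels))

def greedy_relabel(data, labels, k, max_rounds=20):
    """Assumed relabeling loop: move each point to the model that fits it best,
    refit the local models, and stop when no point moves."""
    labels = list(labels)
    for _ in range(max_rounds):
        models = fit_all(data, labels, k)
        changed = False
        for i, (x, y) in enumerate(data):
            errs = [(predict(m, x) - y) ** 2 for m in models]
            best = min(range(k), key=errs.__getitem__)
            if errs[best] < errs[labels[i]]:  # strict: ties keep the current label
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels, fit_all(data, labels, k)

def assign_model(data, labels, x):
    """Assumed meta-model: the nearest training point decides which local model scores x."""
    i = min(range(len(data)), key=lambda j: abs(data[j][0] - x))
    return labels[i]

# Toy V-shaped data with a deliberately poor initial partition (alternating labels).
data = [(float(x), float(abs(x))) for x in (-4, -3, -2, -1, 1, 2, 3, 4)]
init_labels = [i % 2 for i in range(len(data))]
init_models = fit_all(data, init_labels, 2)
labels, models = greedy_relabel(data, init_labels, 2)

print("error before:", total_error(data, init_labels, init_models))
print("error after: ", total_error(data, labels, models))
print("model chosen for x = -4.2:", assign_model(data, labels, -4.2))
```

In this sketch, neither the reassignment step nor the refit can increase the total squared error, so the loop settles on a partition that is at least as good as the initial one. Like any greedy procedure, it may stop at a local rather than a global optimum.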
Keywords :
data mining; greedy algorithms; statistical analysis; ensemble learning; greedy algorithm; greedy data labeling; model assignment approach; statistical model; vote averaging; Boosting; Cardiac disease; Data mining; Greedy algorithms; Hierarchical systems; Labeling; Partitioning algorithms; Predictive models; Sampling methods; Voting
Conference_Title :
Data Mining, 2004. ICDM '04. Fourth IEEE International Conference on
Print_ISBN :
0-7695-2142-8
DOI :
10.1109/ICDM.2004.10009