DocumentCode :
282
Title :
A Graph-Based Consensus Maximization Approach for Combining Multiple Supervised and Unsupervised Models
Author :
Gao, Jing ; Liang, Feng ; Fan, Wei ; Sun, Yizhou ; Han, Jiawei
Author_Institution :
SUNY - Univ. at Buffalo, Buffalo, NY, USA
Volume :
25
Issue :
1
fYear :
2013
fDate :
Jan. 2013
Firstpage :
15
Lastpage :
28
Abstract :
Ensemble learning has emerged as a powerful method for combining multiple models. Well-known methods, such as bagging, boosting, and model averaging, have been shown to improve accuracy and robustness over single models. However, due to the high costs of manual labeling, it is hard to obtain sufficient and reliable labeled data for effective training. Meanwhile, lots of unlabeled data exist in these sources, and we can readily obtain multiple unsupervised models. Although unsupervised models do not directly generate a class label prediction for each object, they provide useful constraints on the joint predictions for a set of related objects. Therefore, incorporating these unsupervised models into the ensemble of supervised models can lead to better prediction performance. In this paper, we study ensemble learning with outputs from multiple supervised and unsupervised models, a topic where little work has been done. We propose to consolidate a classification solution by maximizing the consensus among both supervised predictions and unsupervised constraints. We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors the smoothness of the predictions over the graph, but penalizes the deviations from the initial labeling provided by the supervised models. We solve this problem through iterative propagation of probability estimates among neighboring nodes and prove the optimality of the solution. The proposed method can be interpreted as conducting a constrained embedding in a transformed space, or a ranking on the graph. Experimental results on different applications with heterogeneous data sources demonstrate the benefits of the proposed method over existing alternatives. (More information, data, and code are available at http://www.cse.buffalo.edu/~jing/integrate.htm.)
Keywords :
graph theory; iterative methods; optimisation; pattern classification; probability; unsupervised learning; bipartite graph; ensemble learning; graph-based consensus maximization approach; heterogeneous data source; iterative propagation; manual labeling; optimization problem; pattern classification; probability estimation; ranking; supervised model; supervised prediction; unsupervised constraint; unsupervised model; Bipartite graph; Data models; Labeling; Motion pictures; Optimization; Predictive models; Social network services; Knowledge integration; clustering ensemble; ensemble learning; semi-supervised learning;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
ieee
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2011.206
Filename :
6035707
Link To Document :
بازگشت