DocumentCode :
77175
Title :
Random Forest Construction With Robust Semisupervised Node Splitting
Author :
Xiao Liu ; Mingli Song ; Dacheng Tao ; Zicheng Liu ; Luming Zhang ; Chun Chen ; Jiajun Bu
Author_Institution :
Coll. of Comput. Sci., Zhejiang Univ., Hangzhou, China
Volume :
24
Issue :
1
fYear :
2015
fDate :
Jan. 2015
Firstpage :
471
Lastpage :
483
Abstract :
Random forest (RF) is an important classifier with applications in many machine learning tasks, but its performance relies heavily on the amount of labeled training data. In this paper, we investigate the construction of RFs from a small amount of labeled data and find that the performance bottleneck lies in the node splitting procedure: existing solutions fail to properly partition the feature space when training data are insufficient. To achieve robust node splitting with insufficient data, we present semisupervised splitting, which overcomes this limitation by splitting nodes under the guidance of both the labeled data and abundant unlabeled data. In particular, an accurate quality measure for node splitting is obtained through kernel-based density estimation, for which a multiclass version of the asymptotic mean integrated squared error (AMISE) criterion is proposed to adaptively select the optimal kernel bandwidth. To avoid the curse of dimensionality, we project the data points from the original high-dimensional feature space onto a low-dimensional subspace before estimation. A unified optimization framework is proposed to select a coupled pair of subspace and separating hyperplane such that the smoothness of the subspace and the quality of the splitting are guaranteed simultaneously. Compared with conventional margin maximization-based semisupervised methods, our algorithm effectively avoids overfitting caused by bad initialization and local maxima. We demonstrate the effectiveness of the proposed algorithm by comparing it with state-of-the-art supervised and semisupervised algorithms on publicly available data sets for typical computer vision applications, such as object categorization, face recognition, and image segmentation.
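The abstract describes scoring candidate node splits with a kernel density estimate computed on a low-dimensional projection of labeled and unlabeled points. The sketch below illustrates that general idea in one dimension and is not the authors' method. Several assumptions are loudly flagged: the projection direction `w` is fixed by hand rather than jointly optimized with the subspace; the bandwidth comes from the normal-reference rule h = (4/(3n))^(1/5)·σ, which is AMISE-optimal only when the data are Gaussian, in place of the paper's multiclass AMISE criterion; and the split score `gain - lam * density` (labeled information gain minus a penalty for cutting through dense regions of all points) is a hypothetical stand-in for the paper's quality measure. All function names and the toy data are illustrative.

```python
import math
from collections import Counter


def normal_reference_bandwidth(points):
    """Gaussian-kernel bandwidth that minimizes AMISE *if* the data were Gaussian:
    h = (4 / (3 n))^(1/5) * sigma (roughly 1.06 * sigma * n^(-1/5)).
    Assumption: stands in for the paper's multiclass AMISE criterion."""
    n = len(points)
    mean = sum(points) / n
    sigma = math.sqrt(sum((z - mean) ** 2 for z in points) / (n - 1))
    return (4.0 / (3.0 * n)) ** 0.2 * sigma


def gaussian_kde(points, bandwidth):
    """Return t -> (1 / (n h)) * sum_i phi((t - z_i) / h) with a standard normal kernel phi."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return lambda t: norm * sum(math.exp(-0.5 * ((t - z) / bandwidth) ** 2) for z in points)


def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(labels, left, right):
    n = len(labels)
    return entropy(labels) - len(left) / n * entropy(left) - len(right) / n * entropy(right)


def project(x, w):
    """Project a feature vector onto the (hypothetical, fixed) 1-D direction w."""
    return sum(a * b for a, b in zip(x, w))


def best_threshold(labeled, unlabeled, w, lam=1.0):
    """Hypothetical semisupervised split search on the projection z = <x, w>.
    labeled: list of (features, label); unlabeled: list of features.
    Returns (score, threshold) maximizing gain(labeled) - lam * density(all points)."""
    z_lab = [(project(x, w), y) for x, y in labeled]
    z_all = [z for z, _ in z_lab] + [project(x, w) for x in unlabeled]
    density = gaussian_kde(z_all, normal_reference_bandwidth(z_all))
    ys = [y for _, y in z_lab]
    zs = sorted(set(z_all))
    best = None
    for a, b in zip(zs, zs[1:]):  # candidate thresholds: midpoints of sorted projections
        t = 0.5 * (a + b)
        left = [y for z, y in z_lab if z <= t]
        right = [y for z, y in z_lab if z > t]
        if not left or not right:  # a split must separate at least one labeled point
            continue
        score = information_gain(ys, left, right) - lam * density(t)
        if best is None or score > best[0]:
            best = (score, t)
    return best


# Toy usage: four labeled points and six unlabeled points in two clusters.
labeled = [((0.0, 0.0), "a"), ((0.2, 1.0), "a"), ((3.0, 0.5), "b"), ((3.2, -0.5), "b")]
unlabeled = [(0.1, 0.3), (0.4, -0.2), (-0.3, 0.1), (2.9, 0.0), (3.3, 0.2), (3.5, -0.1)]
score, threshold = best_threshold(labeled, unlabeled, w=(1.0, 0.0))
print(f"threshold={threshold:.2f} score={score:.3f}")
```

Among thresholds that separate the labeled classes equally well, the density penalty favors the one lying in the sparsest region of all points, which is the intuition behind letting unlabeled data guide the split; `lam` trades off the two terms and would need tuning.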
Keywords :
computer vision; image classification; learning (artificial intelligence); optimisation; RF; asymptotic mean integrated squared error criterion; data points; high-dimensional feature space; hyperplane separation; kernel-based density estimation; labeled training data; local maxima; low-dimensional subspace; machine learning tasks; margin maximization-based semisupervised methods; quality measure; random forest construction; robust semisupervised node splitting; supervised algorithms; unified optimization framework; Bandwidth; Covariance matrices; Estimation; Kernel; Training; Training data; Semi-supervised splitting; node splitting; random forest; semi-supervised learning; subspace learning;
fLanguage :
English
Journal_Title :
Image Processing, IEEE Transactions on
Publisher :
ieee
ISSN :
1057-7149
Type :
jour
DOI :
10.1109/TIP.2014.2378017
Filename :
6975199