Title :
Building diverse and optimized ensembles of gradient boosted trees for high-dimensional data
Author :
Abdunabi, Tarek ; Basir, Otman
Author_Institution :
Electr. & Comput. Eng. Dept., Univ. of Waterloo, Waterloo, ON, Canada
Abstract :
Gradient Boosting Machines (GBMs) are powerful ensemble learning techniques that have been successfully applied to many low-dimensional applications. In GBMs, the learning algorithm sequentially fits new models to provide a more accurate prediction of the response variable. Despite their high accuracy, GBMs suffer from major drawbacks such as high memory consumption. Moreover, because the learning algorithm is inherently sequential, it is difficult to parallelize by design. Building optimized GBMs for high-dimensional applications therefore requires powerful computational resources. In this paper, using a real, high-dimensional dataset (1776 predictors), we demonstrate that by applying different feature selection/reduction techniques, the computational cost of building and tuning tree-based GBMs can be substantially reduced with only a slight drop in prediction accuracy. To cope with the data-intensive computations involved in building and tuning the ensembles, we utilize the Amazon Elastic Compute Cloud (EC2) web service.
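The workflow the abstract describes, reducing the feature space before fitting a tree-based GBM, can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' code: the synthetic dataset, the univariate `SelectKBest` filter, and the parameter values (200 features, k=30) are illustrative assumptions, not values from the paper (which uses 1776 predictors).

```python
# Hedged sketch: feature selection before building a tree-based GBM,
# as described in the abstract. Not the authors' implementation.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a high-dimensional dataset (the paper uses
# 1776 predictors; 200 keeps this sketch fast).
X, y = make_classification(n_samples=500, n_features=200,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Shrink the feature space before the expensive ensemble build;
# k=30 is an illustrative choice, not a value from the paper.
selector = SelectKBest(f_classif, k=30).fit(X_tr, y_tr)
X_tr_sel = selector.transform(X_tr)
X_te_sel = selector.transform(X_te)

# Fit the GBM on the reduced feature set; each boosting stage now
# scans 30 candidate features instead of 200.
gbm = GradientBoostingClassifier(n_estimators=100, random_state=0)
gbm.fit(X_tr_sel, y_tr)
score = gbm.score(X_te_sel, y_te)
```

Because each boosting stage evaluates split candidates over every feature, shrinking the feature set cuts both training time and memory roughly in proportion, which is the trade-off the paper quantifies.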
Keywords :
Web services; cloud computing; data reduction; feature selection; gradient methods; learning (artificial intelligence); trees (mathematics); Amazon Elastic Compute Cloud; EC2 Web service; GBM; data reduction; data-intensive computation; ensemble learning technique; feature selection; gradient boosted tree; gradient boosting machine; high-dimensional data; learning algorithm; Accuracy; Biology; Operating systems; Radio frequency; Sensitivity; Tuning; Cloud computing; Ensemble learning; High-dimensional data; Predictive modeling
Conference_Title :
Cloud Computing and Intelligence Systems (CCIS), 2014 IEEE 3rd International Conference on
Print_ISBN :
978-1-4799-4720-1
DOI :
10.1109/CCIS.2014.7175758