Title :
Accelerating a Random Forest Classifier: Multi-Core, GP-GPU, or FPGA?
Author :
Van Essen, Brian ; Macaraeg, Chris ; Gokhale, Maya ; Prenger, Ryan
Author_Institution :
Lawrence Livermore Nat. Lab., Livermore, CA, USA
Date :
29 April - 1 May 2012
Abstract :
Random forest classification is a well-known machine learning technique that generates classifiers in the form of an ensemble ("forest") of decision trees. The classification of an input sample is determined by the majority vote of the ensemble. Traditional random forest classifiers can be highly effective, but classification using a random forest is memory bound and not typically suitable for acceleration using FPGAs or GP-GPUs due to the need to traverse large, possibly irregular decision trees. Recent work at Lawrence Livermore National Laboratory has developed several variants of random forest classifiers, including the Compact Random Forest (CRF), that can generate decision trees more suitable for acceleration than traditional decision trees. Our paper compares and contrasts the effectiveness of FPGAs, GP-GPUs, and multi-core CPUs for accelerating classification using models generated by compact random forest machine learning classifiers. Taking advantage of training algorithms that can produce compact random forests composed of many small trees rather than fewer, deeper trees, we are able to regularize the forest so that the classification of any sample takes a deterministic amount of time. This optimization then allows us to execute the classifier in a pipelined or single-instruction multiple-thread (SIMT) fashion. We show that FPGAs provide the highest-performance solution but require a multi-chip/multi-board system to execute even modest-sized forests. GP-GPUs offer a more flexible solution with reasonably high performance that scales with forest size. Finally, multi-threading via OpenMP on a shared-memory system was the simplest solution and provided near-linear performance that scaled with core count, but it was still significantly slower than the GP-GPU and FPGA.
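The following is a minimal, illustrative C sketch (not the authors' implementation) of the idea described above: each compact tree is a complete binary tree of fixed depth stored as a flat array, so classifying a sample always performs exactly the same number of comparisons, and the per-sample loop maps naturally onto pipelined, SIMT, or OpenMP execution. All names, tree sizes, and the two-class majority vote are assumptions made for this example.

/*
 * Sketch of deterministic-time inference with a "compact" random forest.
 * Every tree is a complete binary tree of fixed DEPTH stored as a flat
 * array, so each sample visits exactly DEPTH internal nodes per tree.
 * Sizes and structure here are illustrative assumptions, not the paper's.
 */
#include <stdio.h>
#include <stdlib.h>

#define NUM_TREES    64            /* many small trees ...             */
#define DEPTH         6            /* ... rather than fewer, deep ones */
#define NUM_NODES   ((1 << DEPTH) - 1)   /* internal nodes per tree    */
#define NUM_LEAVES   (1 << DEPTH)        /* leaf labels per tree       */
#define NUM_FEATURES 16

typedef struct {
    int   feature[NUM_NODES];      /* feature tested at each internal node */
    float threshold[NUM_NODES];    /* split threshold at each internal node */
    int   leaf_class[NUM_LEAVES];  /* class label (0 or 1) at each leaf     */
} CompactTree;

/* Walk one complete tree: exactly DEPTH comparisons, no data-dependent exit. */
static int classify_tree(const CompactTree *t, const float *x)
{
    int node = 0;                            /* root of the flat layout      */
    for (int level = 0; level < DEPTH; ++level) {
        int right = x[t->feature[node]] > t->threshold[node];
        node = 2 * node + 1 + right;         /* left child 2n+1, right 2n+2  */
    }
    return t->leaf_class[node - NUM_NODES];  /* map final node to leaf index */
}

/* Majority vote over the ensemble for one sample (binary classification). */
static int classify_forest(const CompactTree *forest, const float *x)
{
    int votes = 0;
    for (int t = 0; t < NUM_TREES; ++t)
        votes += classify_tree(&forest[t], x);
    return votes * 2 > NUM_TREES;            /* class 1 on a strict majority */
}

int main(void)
{
    enum { NUM_SAMPLES = 1024 };
    CompactTree *forest = calloc(NUM_TREES, sizeof *forest);
    float (*samples)[NUM_FEATURES] = calloc(NUM_SAMPLES, sizeof *samples);
    int *labels = calloc(NUM_SAMPLES, sizeof *labels);

    /* Random forest parameters and samples, just to exercise the loops. */
    for (int t = 0; t < NUM_TREES; ++t) {
        for (int n = 0; n < NUM_NODES; ++n) {
            forest[t].feature[n]   = rand() % NUM_FEATURES;
            forest[t].threshold[n] = (float)rand() / RAND_MAX;
        }
        for (int l = 0; l < NUM_LEAVES; ++l)
            forest[t].leaf_class[l] = rand() % 2;
    }
    for (int s = 0; s < NUM_SAMPLES; ++s)
        for (int f = 0; f < NUM_FEATURES; ++f)
            samples[s][f] = (float)rand() / RAND_MAX;

    /* Multi-core variant: one OpenMP thread handles a batch of samples. */
    #pragma omp parallel for
    for (int s = 0; s < NUM_SAMPLES; ++s)
        labels[s] = classify_forest(forest, samples[s]);

    printf("sample 0 -> class %d\n", labels[0]);
    free(forest); free(samples); free(labels);
    return 0;
}

Because the traversal has a fixed trip count and a fixed memory footprint per tree, the same loop body can be unrolled into an FPGA pipeline or executed by GPU threads in lockstep without divergence, which is the property the paper exploits.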
Keywords :
decision trees; electronic engineering computing; field programmable gate arrays; learning (artificial intelligence); microprocessor chips; multi-threading; pattern classification; shared memory systems; CRF; FPGA; GP-GPU; SIMT; compact random forest; decision tree; ensemble; forest size; linear performance; machine learning; majority classification; memory bound; multiboard system; multichip system; multicore CPU; multithreading; OpenMP; random forest classifier; shared memory system; single-instruction multiple thread; training algorithm; Acceleration; Decision trees; Field programmable gate arrays; Hardware; Pipelines; Training; Vegetation; FPGA; GP-GPU; Machine learning; OpenMP;
Conference_Title :
2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
Conference_Location :
Toronto, ON
Print_ISBN :
978-1-4673-1605-7
DOI :
10.1109/FCCM.2012.47