Title :
VLIW-SIMD processor based scalable architecture for parallel classifier node computing
Author :
Puppala, Venkata Ganapathi
Author_Institution :
HSG ASIC Group, QLogic India Pvt Ltd., Pune, India
Abstract :
This paper presents a scalable VLIW-SIMD processor based architecture, with a Data Flow Control (DC) Engine and a Classifier Evaluation (CE) Engine, that computes classifier nodes in parallel to accelerate object detection in embedded applications. The target is the popular algorithm developed by Viola and Jones [1], which uses Haar-like features. The architecture has a 4-slot very long instruction word (VLIW) processor core and internal memories that hold the integral image data and the classifier data. Each VLIW instruction packet has two load/store instruction slots and two 4-way SIMD instruction slots. Generic SIMD instructions are added to the instruction set to compute various classifier parameters in parallel. With the proposed instructions, nodes in two levels of the classifier tree are computed in parallel. Fixed-point arithmetic is used to achieve faster clock rates with less area. The performance of the architecture is tested using the training data from OpenCV to detect frontal faces in a set of images. A single instance of the proposed architecture detects faces in CIF resolution images at 8.33 fps when running at a 500 MHz clock frequency, a 1.6X performance gain over the OpenCV software version running on a 2 GHz Pentium 4 processor. Two instances of the classifier evaluation engine give a performance of 15.47 fps. The proposed engine is designed in Verilog HDL and synthesized using Synopsys Design Compiler with 28 nm TSMC target libraries. The clock period is set to 2 ns and the timing constraints are met.
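The integral image and Haar-like features named in the abstract can be illustrated with a minimal software sketch. This is not the paper's hardware design or its SIMD instruction set; the function names (`integral_image`, `rect_sum`, `two_rect_feature`) are hypothetical and chosen only for illustration. It uses integer arithmetic only, in the spirit of the fixed-point approach the abstract mentions.

```python
def integral_image(img):
    """Build a summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of all pixels above and to the left of (x, y).
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half (w even)."""
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return left - right


# Small 4x4 example image (illustrative values, not data from the paper).
img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
ii = integral_image(img)
feature = two_rect_feature(ii, 0, 0, 4, 4)
```

Each rectangle sum costs four table lookups regardless of rectangle size, which is what makes it plausible to evaluate several classifier nodes side by side once the integral image is held in local memory.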
Keywords :
Haar transforms; data flow computing; face recognition; fixed point arithmetic; image classification; image resolution; instruction sets; multiprocessing systems; object detection; parallel architectures; parallel machines; program compilers; trees (mathematics); 4-slot very long instruction word processor core; 4-way SIMD instruction slot; CIF resolution image; Haar-like feature; OpenCV software version; Synopsys Design Compiler; TSMC target library; VLIW instruction packet; VLIW-SIMD processor based scalable architecture; Verilog HDL; architecture performance testing; classifier data; classifier evaluation engine; classifier parameter; classifier tree; clock period; clock rate; data flow control engine; embedded application; fixed point arithmetic; frontal face detection; instruction set; integral image data; internal memory; load-store instruction slot; object detection algorithm; parallel classifier node computing; performance gain; scalable architecture; timing constraints; Computer architecture; Engines; Object detection; Random access memory; Registers; VLIW; Vectors; OpenCV; SIMD instructions; VLIW; computer vision; face detection; object detection;
Conference_Title :
Advance Computing Conference (IACC), 2013 IEEE 3rd International
Conference_Location :
Ghaziabad
Print_ISBN :
978-1-4673-4527-9
DOI :
10.1109/IAdCC.2013.6514448