Abstract:
Current applications of GPU processors to parallel computing tasks show excellent speed-ups compared to CPU processors. However, no existing framework enables automatic distribution of data and processing across multiple GPUs, modularity of kernel design, and efficient co-use of CPU and GPU processors. All of these elements are necessary to let users easily perform 'Big Data' analysis and create their own modules for the processing functionality they need. We propose a framework for in-memory 'Big Text Data' analytics that provides mechanisms for automatic data segmentation, distribution, execution, and result retrieval across multiple cards (CPU, GPU, and FPGA) and machines, together with a modular design for the easy addition of new GPU kernels. We describe in detail the architecture and components of the framework, including multi-card data distribution and execution, data structures for efficient memory access, algorithms for parallel GPU computation, and result retrieval. Several of the framework's kernels are evaluated on Big Data against multi-core CPUs to demonstrate the performance and feasibility of the framework for 'Big Data' analytics, providing an alternative, cheaper HPC solution.
Keywords:
GPU; framework; in-memory; matching; sorting; text analysis; text processing