Title :
Towards Optimal Task Distribution on Computer Clusters with Intel MIC Coprocessors
Author :
Chenggang Lai;Miaoqing Huang;Genlang Chen
Author_Institution :
Dept. of Comput. Sci. &
Abstract :
Computer clusters with coprocessors/accelerators are typically used to parallelize applications and reduce computation time. Given N parallel tasks and M processing cores, the typical strategy is to distribute the N tasks statically among the M cores so that each core receives N/M tasks. However, for many sophisticated applications, the processing times of the N tasks vary: some tasks take longer than others. Under static distribution, cores with light tasks must wait for cores with heavy tasks, which leaves the load imbalanced and the overall processing time of the application longer than necessary. In this work we apply dynamic task distribution. All unfinished tasks form a task pool. Once a core finishes a task, it requests a new task from the pool. In this way, all cores are kept busy throughout the computation. We apply two additional optimization techniques to further improve the performance of applications on clusters with Intel MIC coprocessors. First, we design hybrid implementations that distribute tasks to both the CPUs and the MICs. Second, we apply a multiple-level parallelism technique to exploit concurrency both among the N tasks and within each task. We use sparse coding as a case study to demonstrate the advantages of our approach. Sparse coding is a class of unsupervised methods for learning sets of over-complete bases to represent data efficiently. Its aim is to find a set of basis vectors such that an input vector can be represented as a linear combination of these basis vectors. The results show that dynamic task distribution improves performance by 25% compared with static distribution. Further, the hybrid-mode implementation involving both the host CPUs and the MICs outperforms the basic offload-mode implementation by 40%.
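The contrast between static distribution and a task pool can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function names, the contiguous-block static split, and the task times are all hypothetical, and the model ignores the scheduling and communication overhead that a real cluster would incur.

```python
import heapq

def static_makespan(task_times, num_cores):
    """Static distribution: each core gets a contiguous block of about N/M tasks."""
    if not task_times:
        return 0
    n = len(task_times)
    block = -(-n // num_cores)  # ceil(N / M)
    # Total work assigned to each core; the slowest core sets the finish time.
    loads = [sum(task_times[i:i + block]) for i in range(0, n, block)]
    return max(loads)

def dynamic_makespan(task_times, num_cores):
    """Dynamic distribution: whichever core becomes idle first pulls the next task."""
    # Min-heap of the times at which each core becomes free; all start idle.
    free_at = [0.0] * num_cores
    heapq.heapify(free_at)
    for t in task_times:  # the task pool, served in order
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + t)
    return max(free_at)

# Hypothetical workload: four heavy tasks followed by eight light ones.
tasks = [8, 8, 8, 8, 1, 1, 1, 1, 1, 1, 1, 1]
print(static_makespan(tasks, 4))   # 24
print(dynamic_makespan(tasks, 4))  # 10.0
```

With this made-up workload the static split puts three heavy tasks on one core, while the pool spreads them out. The size of the gap depends entirely on how task times vary and says nothing about the 25% figure reported in the abstract.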
Keywords :
"Microwave integrated circuits","Parallel processing","Encoding","Coprocessors","Dictionaries","Instruction sets","Computers"
Conference_Title :
High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), 2015 IEEE 17th International Conference on
DOI :
10.1109/HPCC-CSS-ICESS.2015.40