Title :
Performance Bottlenecks in Manycore Systems: A Case Study on Large Scale Feature Matching within Image Collections
Author :
Xiaoxin Tang ; Mills, Steven ; Eyers, David ; Kai-Cheung Leung ; Zhiyi Huang ; Minyi Guo
Author_Institution :
Dept. of Comput. Sci., Shanghai Jiao Tong Univ., Shanghai, China
Abstract :
In memory-intensive algorithms, the problem size is often too large to fit into the CPU cache, which can cause an excessive number of cache misses. This bottleneck can easily make seemingly embarrassingly parallel algorithms, such as feature matching, unscalable on manycore systems. To address this bottleneck, this paper proposes a general Divide-and-Merge methodology, which divides the feature space into several small sub-spaces so that the demand each sub-space places on shared resources can be satisfied without causing bottlenecks. Experimental results show that the Divide-and-Merge methodology reduces L3 cache misses and the time spent on memory-allocation-related system calls, resulting in a 211% performance improvement on an AMD 64-core CPU machine, and 57% and 16% performance improvements on AMD and Intel 16-core machines, respectively. Performance results on a modern GPU also show that a well-tuned algorithm with a time complexity of O(F^2) can outperform a state-of-the-art O(F^1.5) algorithm by 188% on our real-world dataset, which again highlights the large performance impact of the memory system.
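The divide-and-merge idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the paper partitions the feature space or merges results, so everything below is an assumption made for illustration: the function names (`match_block`, `divide_and_merge_match`), the `block_size` parameter, the squared Euclidean distance, and the brute-force nearest-neighbour search. The sketch splits the reference features into fixed-size blocks that could each fit in cache, matches every query against one block at a time, and merges the per-block results by keeping the closest match.

```python
# Illustrative sketch only; names and partitioning scheme are assumptions,
# not the method published in the paper.
from typing import List, Sequence, Tuple

Feature = Sequence[float]
Match = Tuple[int, float]  # (index of best reference feature, squared distance)


def sq_dist(a: Feature, b: Feature) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_block(queries: List[Feature], block: List[Feature], offset: int) -> List[Match]:
    """Brute-force nearest neighbour of each query within one sub-space (block)."""
    results = []
    for q in queries:
        best_idx, best_d = -1, float("inf")
        for j, f in enumerate(block):
            d = sq_dist(q, f)
            if d < best_d:
                best_idx, best_d = offset + j, d
        results.append((best_idx, best_d))
    return results


def divide_and_merge_match(
    queries: List[Feature], features: List[Feature], block_size: int
) -> List[Match]:
    """Divide the reference features into blocks, match per block, merge the results."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    best: List[Match] = [(-1, float("inf"))] * len(queries)
    for start in range(0, len(features), block_size):
        # Divide: work on one small block of reference features at a time.
        block = features[start:start + block_size]
        partial = match_block(queries, block, start)
        # Merge: keep whichever candidate is closer for each query.
        best = [p if p[1] < b[1] else b for b, p in zip(best, partial)]
    return best
```

Under these assumptions, the merged result is the same as a single brute-force pass over all features; only the order in which memory is touched changes. In a real implementation the block size would be tuned to the cache capacity of the target machine.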
Keywords :
feature extraction; image matching; multiprocessing systems; performance evaluation; AMD 64-core CPU machine; Intel 16-core machines; cache misses; divide-and-merge methodology; embarrassingly-parallel algorithms; feature matching; image collections; manycore systems; memory allocation-related system calls; memory system; memory-intensive algorithms; performance bottlenecks; performance improvements; subspace; time complexity; well-tuned algorithm; Approximation algorithms; Graphics processing units; Indexes; Libraries; Parallel processing; Random access memory; Time complexity; Divide-and-Merge; Feature matching; Memory wall; Parallel computing;
Conference_Title :
High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC_EUC), 2013 IEEE 10th International Conference on
Conference_Location :
Zhangjiajie
DOI :
10.1109/HPCC.and.EUC.2013.140