Title : 
Designing scalable many-core parallel algorithms for min graphs using CUDA
         
        
        
            Author_Institution : 
Lamar Univ., Beaumont, TX, USA
         
        
        
        
        
        
            Abstract : 
Removing redundant edges on a large graph is a fundamental problem in many practical applications such as verification of real-time systems and network routing. In this paper, we present the designs of scalable and efficient parallel algorithms for multiple many-core GPU devices using CUDA. Our algorithms expose substantial fine-grained parallelism while maintaining minimal global communication. By using the global scope of the GPU´s global memory, coalescing the global memory reads and writes, and avoiding on-chip shared memory bank conflicts, we are able to achieve a large performance benefit with a speed-up of 2,500x on a desktop computer in comparison with a single core CPU program. We report our experiments on large graphs with up to 29 K vertices using multiple GPU devices.
         
        
            Keywords : 
computer graphic equipment; coprocessors; graph theory; multiprocessing systems; parallel algorithms; parallel architectures; CUDA; GPU global memory; compute unified device architecture; fine-grained parallelism; min graphs; minimal global communication; multiple many-core GPU devices; redundant edges; scalable many-core parallel algorithm; Algorithm design and analysis; Concurrent computing; Global communication; Parallel algorithms; Parallel processing; Performance gain; Read-write memory; Real time systems; Routing; Yarn; Multi-core Computing; Programming Techniques;
         
        
        
        
            Conference_Titel : 
Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on
         
        
            Conference_Location : 
Atlanta, GA
         
        
            Print_ISBN : 
978-1-4244-6533-0
         
        
        
            DOI : 
10.1109/IPDPSW.2010.5470712