• DocumentCode
    186371
  • Title

    Graph processing on GPUs: Where are the bottlenecks?

  • Author

    Qiumin Xu ; Hyeran Jeon ; Murali Annavaram

  • Author_Institution
    Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
  • fYear
    2014
  • fDate
    26-28 Oct. 2014
  • Firstpage
    140
  • Lastpage
    149
  • Abstract
    Large graph processing is now a critical component of many data analytics applications. Its uses range from social networking Web sites that provide context-aware services based on user connectivity data to medical informatics systems that diagnose a disease from a given set of symptoms. Graph processing has several inherently parallel computation steps interspersed with synchronization needs. Graphics processing units (GPUs) are being proposed as a power-efficient choice for exploiting this inherent parallelism. There have been several efforts to efficiently map graph applications to GPUs. However, few characterization studies provide an in-depth understanding of the interaction between GPGPU hardware components and the graph applications mapped to execute on GPUs. In this study, we compiled 12 graph applications and collected performance and utilization statistics for the core components of the GPU while running the applications on both a cycle-accurate simulator and a real GPU card. We present detailed application execution characteristics on GPUs. We then discuss and suggest several approaches to optimizing GPU hardware to improve graph application performance.
  • Keywords
    data analysis; graph theory; graphics processing units; mathematics computing; parallel processing; synchronisation; GPGPU hardware components; GPU core components; GPU hardware optimization; application execution characteristics; cycle accurate simulator; data analytics; graph application mapping; graph application performance enhancement; large-graph processing; parallel computation; parallelism; performance statistics; power-efficiency; real GPU card; synchronization; utilization statistics; Computational modeling; Graphics processing units; Hardware; Instruction sets; Kernel; Pipelines; Synchronization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Workload Characterization (IISWC), 2014 IEEE International Symposium on
  • Conference_Location
    Raleigh, NC
  • Print_ISBN
    978-1-4799-6452-9
  • Type

    conf

  • DOI
    10.1109/IISWC.2014.6983053
  • Filename
    6983053