• DocumentCode
    119521
  • Title

    iCHAT: Inter-cache Hardware-Assistant Data Transfer for Heterogeneous Chip Multiprocessors

  • Author

    Junli Gu; Bradford M. Beckmann; Ting Cao; Yu Hu

  • fYear
    2014
  • fDate
    6-8 Aug. 2014
  • Firstpage
    242
  • Lastpage
    251
  • Abstract
    Modern heterogeneous multiprocessors integrate the CPU and GPU to boost computational performance. Data sharing and communication between the CPU and GPU have been critical to the final speedup. Tighter CPU-GPU integration makes it possible to share and move data more efficiently and so leverage the computational power that a GPU can provide. Initially, DMA or PCIe devices were used to transfer data between the CPU and GPU, with low efficiency and little flexibility. Recently, a single address space and coherent cache hierarchies have been adopted in heterogeneous architectures to share data more efficiently. This poses a new challenge: understanding the communication overheads in this new context and improving communication efficiency for these architectures. This paper proposes a novel approach called iCHAT (inter-Cache Hardware-Assistant data Transfer) to manage data transfer between the CPU cache and the GPU cache efficiently. iCHAT detects communication patterns, eagerly evicts data from the owner's caches, and prepares for the requestor's demand. We implement the iCHAT design in a simulator based on gem5 and an AMD in-house GPU simulator. Experimental results show that communication-related eviction traffic is reduced by 40% on average and total directory traffic by 8% on average. We also perform a bounding experiment that quantitatively evaluates inter CPU-GPU transfers and requests for communication data; it indicates that iCHAT could achieve an average speedup of 1.4x on the Rodinia benchmark suite and 1.2x on AMD SDK applications.
  • Keywords
    cache storage; graphics processing units; multiprocessing systems; AMD in-house GPU simulator; CPU; GPU; address space; cache hierarchy; central processing unit; communication efficiency; communication related eviction traffic; computational performance; data sharing; gem5; graphics processing unit; heterogeneous chip multiprocessors; iCHAT technique; inter-cache hardware-assistant data transfer; Benchmark testing; Coherence; Computer architecture; Data transfer; Detectors; Graphics processing units; Hardware;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Networking, Architecture, and Storage (NAS), 2014 9th IEEE International Conference on
  • Conference_Location
    Tianjin
  • Type

    conf

  • DOI
    10.1109/NAS.2014.43
  • Filename
    6923186