• DocumentCode
    79469
  • Title

    A Practical Data Classification Framework for Scalable and High Performance Chip-Multiprocessors

  • Author

    Li, Yong ; Melhem, Rami ; Jones, Alex K.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Pittsburgh, Pittsburgh, PA, USA
  • Volume
    63
  • Issue
    12
  • fYear
    2014
  • fDate
    Dec. 2014
  • Firstpage
    2905
  • Lastpage
    2918
  • Abstract
    State-of-the-art chip multiprocessor (CMP) proposals emphasize general optimizations designed to deliver computing power for many types of applications. This approach potentially misses significant performance improvements that leverage application-specific characteristics such as data access behavior. In this paper, we demonstrate how scalable and high-performance parallel systems can be built by classifying data accesses into different categories and treating them differently. We develop a novel compiler-based approach to speculatively detect a data classification termed practically private, which we demonstrate is ubiquitous in a wide range of parallel applications. Leveraging this classification provides efficient solutions to mitigate data access latency and coherence overhead in today's many-core architectures. While the proposed data classification scheme can be applied to many micro-architectural constructs including the TLB, coherence directory, and interconnect, we demonstrate its potential through an efficient cache coherence design. Specifically, we show that the compiler-assisted mechanism reduces coherence traffic by an average of 46% and achieves up to 12%, 8%, and 5% performance improvement over shared, private, and state-of-the-art NUCA-based caching, respectively, depending on the scenario.
  • Keywords
    cache storage; parallel architectures; pattern classification; performance evaluation; program compilers; ubiquitous computing; NUCA-based caching; TLB; application-specific characteristics; cache coherence design; chip multiprocessor; coherence directory; coherence overhead mitigation; coherence traffic; compiler-assisted mechanism; data access behavior; data access latency mitigation; data classification scheme; interconnect; many-core architectures; microarchitectural constructs; parallel applications; performance improvement; practically private; scalable high-performance parallel systems; Benchmark testing; Coherence; Dynamic scheduling; Instruction sets; Optimization; Resource management; Runtime; OpenMP; Practically private; cache coherence; compilers; data classification; multi-threaded parallel; pipelined parallel;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2013.161
  • Filename
    6577381