Author_Institution :
Center for Supercomput. Res. & Dev., Illinois Univ., Urbana, IL, USA
Abstract :
Many future applications for scalable shared-memory multiprocessors are likely to have large working sets that overflow secondary or tertiary caches. Two possible solutions to this problem are to add a very large cache, called a remote cache, that caches remote data (NUMA-RC), or to organize the machine as a cache-only memory architecture (COMA). This paper tries to determine which solution is best. To compare the performance of the two organizations for the same amount of total memory, we introduce a model of data sharing. The model uses three data-sharing patterns: replication, read-mostly migration, and read-write migration. Replication data is accessed in read-mostly mode by several processors, while migration data is accessed largely by one processor at a time. For large working sets, the weight of the migration data largely determines whether COMA outperforms NUMA-RC. Ideally, COMA only needs to fit the replication data in its extra memory; the migration data will simply be swapped between attraction memories. The remote cache of NUMA-RC, instead, must hold both the replication and the migration data. However, simulations of seven Splash2 applications show that COMA does not outperform NUMA-RC, for two reasons. First, the extra memory has higher associativity in NUMA-RC than in COMA and can therefore be utilized better by the working set. Second, COMA memory accesses are more expensive. Of course, our results are affected by the applications used, which have been optimized for a cache-coherent NUMA machine. Overall, since NUMA-RC is also cheaper, it is more cost-effective for these applications.
Keywords :
cache storage; memory architecture; performance evaluation; shared memory systems; COMA; NUMA; NUMA-RC; Splash2 applications; cache-only memory architecture; data sharing; read-mostly migration; read-write migration; remote cache; remote conflict miss reduction; replication; scalable shared-memory multiprocessors