UMH
ACM Transactions on Architecture and Code Optimization, 2016, Vol. 13(4), pp. 1–25
Top 10% of 2016 papers
Amir Kavyan Ziabari, Yifan Sun, Yenai Ma, Dana Schaa, José Luis Abellán, Rafael Ubal, John Kim, Ajay Joshi, David Kaeli
Abstract
In this article, we describe how to ease memory management between a Central Processing Unit (CPU) and one or more discrete Graphics Processing Units (GPUs) by architecting a novel hardware-based Unified Memory Hierarchy (UMH). With UMH, a GPU accesses CPU memory only if it does not find the required data in the directories associated with its high-bandwidth memory, or if the NMOESI coherence protocol restricts access to that data. Using UMH with NMOESI improves the performance of a CPU-multiGPU system by at least 1.92× compared with alternative software-based approaches. It also allows the CPU to access GPU-modified data at least 13× faster.
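The access rule stated in the abstract can be illustrated with a minimal sketch. This is not the authors' hardware design: the class `GpuMemoryDirectory`, the function `resolve_access`, and the boolean "access permitted" flag are hypothetical names introduced here, and the flag stands in for whatever condition NMOESI actually enforces, which the abstract does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class GpuMemoryDirectory:
    """Hypothetical stand-in for a directory tied to a GPU's high-bandwidth memory.

    Maps an address to a flag: True if the coherence protocol currently
    permits the GPU to use its local copy, False if access is restricted.
    """
    entries: dict = field(default_factory=dict)


def resolve_access(directory: GpuMemoryDirectory, address: int) -> str:
    """Decide where a GPU request is served, following the abstract's rule."""
    if address not in directory.entries:
        # Directory miss: the data is not in the GPU's high-bandwidth memory.
        return "cpu_memory"
    if not directory.entries[address]:
        # Directory hit, but the coherence protocol restricts local access.
        return "cpu_memory"
    # Directory hit and access permitted: serve from local memory.
    return "gpu_hbm"


# Assumed example state: one usable line, one restricted line.
directory = GpuMemoryDirectory({0x100: True, 0x200: False})
results = [resolve_access(directory, a) for a in (0x100, 0x200, 0x300)]
```

In this sketch only the first address is served from the GPU's own memory; the restricted and missing addresses both fall through to CPU memory, which mirrors the two conditions named in the abstract.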