Exploring Data Migration for Future Deep-Memory Many-Core Systems

Abstract

Upcoming high-performance computing (HPC) platforms will have more complex memory hierarchies, with high-bandwidth on-package memory and, in the future, non-volatile memory. How to use such deep memory hierarchies effectively remains an open research question. In this paper we evaluate the performance implications of a scheme based on a software-managed scratchpad, in which coarse-grained memory-copy operations migrate application data structures between levels of the memory hierarchy. We expect that such a scheme can, under specific circumstances, outperform a hardware-managed cache while requiring far less effort than a scheme managed entirely by application programmers. Because suitable hardware is not yet generally available, we propose and benchmark several existing hardware configurations that can serve as approximations, including non-uniform memory access (NUMA) systems and memory on accelerators. We then evaluate data migration mechanisms currently available on Linux systems, such as move_pages and memcpy. We also design a best-case-scenario HPC benchmark to explore how data migration can improve the memory locality and parallelism of applications. We find that NUMA systems can be a reasonable approximation platform, especially when auxiliary load mechanisms are employed. Memory migration mechanisms inside the Linux kernel turn out to lag significantly behind a plain user-space memory copy, even after we level the playing field as much as possible. Our dedicated application benchmark demonstrates a significant performance benefit from memory migration, approaching the measured difference in memory bandwidth, provided that the ratio of worker threads to migration threads is chosen well.
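To make the two Linux migration mechanisms named above concrete, the following is a minimal sketch of each, not the benchmark code used in the paper. The function names migrate_by_copy and migrate_by_move_pages are hypothetical, plain malloc stands in for an allocator that would place the destination in the faster memory, and the target node number is an assumption that depends on the machine's NUMA topology.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1) /* same value as in <linux/mempolicy.h> */
#endif

/* User-space migration: copy the data into a newly allocated buffer.
 * Assumption: a real setup would allocate dst in the target memory level;
 * malloc is only a stand-in. The caller frees the returned buffer. */
void *migrate_by_copy(const void *src, size_t bytes)
{
    void *dst = malloc(bytes);
    if (dst != NULL)
        memcpy(dst, src, bytes);
    return dst;
}

/* Kernel migration: ask Linux to move the pages backing buf to target_node
 * through the move_pages system call (pid 0 means the calling process).
 * Returns 0 if all pages were moved, a positive count of pages that were
 * not moved, or -1 with errno set on failure. */
long migrate_by_move_pages(void *buf, size_t bytes, int target_node)
{
    if (bytes == 0)
        return 0;
#ifdef SYS_move_pages
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* Round the start down to a page boundary and count the pages covered. */
    uintptr_t first = (uintptr_t)buf & ~((uintptr_t)page - 1);
    uintptr_t end = (uintptr_t)buf + bytes;
    size_t count = (size_t)((end - first + (uintptr_t)page - 1) / (uintptr_t)page);

    void **pages = malloc(count * sizeof *pages);
    int *nodes = malloc(count * sizeof *nodes);
    int *status = malloc(count * sizeof *status);
    if (pages == NULL || nodes == NULL || status == NULL) {
        free(pages); free(nodes); free(status);
        errno = ENOMEM;
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        pages[i] = (void *)(first + i * (uintptr_t)page);
        nodes[i] = target_node;
        status[i] = -1;
    }

    long rc = syscall(SYS_move_pages, (long)0, (unsigned long)count,
                      pages, nodes, status, (long)MPOL_MF_MOVE);
    int saved_errno = errno; /* free() must not clobber the syscall's errno */
    free(pages); free(nodes); free(status);
    errno = saved_errno;
    return rc;
#else
    (void)buf; (void)target_node;
    errno = ENOSYS;
    return -1;
#endif
}
```

The copy variant yields a new address, so the application must update its pointers, whereas move_pages keeps virtual addresses unchanged and relocates only the physical pages. On a machine without a second NUMA node, or in a restricted container, the move_pages call is expected to fail and report the reason through errno.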
