Fast Address Translation Techniques for Distributed Shared Memory Compilers
Abstract
The distributed shared memory (DSM) model is designed to combine the ease of programming of the shared memory paradigm with the high performance that comes from expressing locality, as in the message-passing model. Experience, however, has shown that DSM programming languages such as UPC may be unable to deliver the expected level of performance. Initial investigations have shown that a major reason is the overhead of translating from the UPC memory model to the virtual address space of the target architecture, which can be very costly. Experimental measurements have shown this overhead increasing execution time by up to three orders of magnitude. Previous work has also shown that some of this overhead can be avoided by hand-tuning, which, on the other hand, can significantly reduce UPC's ease of use. In addition, such tuning can only improve the performance of local shared accesses, not remote shared accesses. Therefore, a new technique that resembles translation lookaside buffers (TLBs) is proposed here. This technique, called the memory model translation buffer (MMTB), has been implemented in the GCC-UPC compiler using two alternative strategies: full-table (FT) and reduced-table (RT). It is shown that the MMTB strategies can lead to a performance boost of up to 700%, preserving ease of programming while performing at a level similar to hand-tuned UPC and MPI codes.
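To make the translation cost concrete, the following is a minimal sketch of mapping an element index of a block-cyclic UPC shared array to a (thread, virtual address) pair, first by arithmetic on every access and then through a precomputed lookup table in the spirit of a TLB. The abstract does not describe the MMTB table layout, so the one-entry-per-block table here is an assumption for illustration only, and all names (`SharedArray`, `translate_arith`, `build_table`, `translate_table`) and the base addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedArray:
    """Hypothetical model of a UPC array declared as `shared [block] T a[n]`."""
    threads: int      # number of UPC threads
    block: int        # blocking factor (elements per block)
    elem_size: int    # size of one element in bytes
    bases: tuple      # assumed base virtual address of each thread's portion


def translate_arith(a, i):
    """Translate element index i with divisions and modulos on every access."""
    blk = i // a.block              # global block number
    thread = blk % a.threads        # blocks are dealt round-robin to threads
    local_blk = blk // a.threads    # position of the block on its owner
    phase = i % a.block             # offset inside the block
    addr = a.bases[thread] + (local_blk * a.block + phase) * a.elem_size
    return thread, addr


def build_table(a, n):
    """Precompute (thread, block start address) for every block (assumed layout)."""
    nblocks = (n + a.block - 1) // a.block
    return [translate_arith(a, b * a.block) for b in range(nblocks)]


def translate_table(a, table, i):
    """Translate element index i with one table lookup plus an in-block offset."""
    thread, block_addr = table[i // a.block]
    return thread, block_addr + (i % a.block) * a.elem_size
```

For example, with 4 threads, a blocking factor of 2, 8-byte elements, and assumed bases `(0x1000, 0x2000, 0x3000, 0x4000)`, index 9 falls in global block 4, which is the second block owned by thread 0, so both functions return `(0, 0x1018)`. A table with fewer entries (for instance, one per thread) would trade memory for extra arithmetic per access; whether that matches the paper's reduced-table strategy is not stated in the abstract.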