UPC++: A High-Performance Communication Framework for Asynchronous Computation
Top 10% of 2019 papers
Abstract
UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC). We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40, UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs: a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x. UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.
Related Papers
- InfiniBand Verbs Optimizations for Remote GPU Virtualization (2015), 3 cited
- Tuning remote GPU virtualization for InfiniBand networks (2016), 6 cited
- InfiniBand & OpenFabrics---InfiniBand and OpenFabrics at SC06 (2006), 1 cited
- Research of the Switching Based on InfiniBand (2004)
- Modelling and simulation of IP over InfiniBand with OMNeT++ (2017)