Triton: an intermediate language and compiler for tiled neural network computations
MAPL 2019, pp. 10–19
Abstract
The validation and deployment of novel research ideas in the field of Deep Learning is often limited by the availability of efficient compute kernels for certain basic primitives. In particular, operations that cannot leverage existing vendor libraries (e.g., cuBLAS, cuDNN) are at risk of facing poor device utilization unless custom implementations are written by experts – usually at the expense of portability. For this reason, the development of new programming abstractions for specifying custom Deep Learning workloads at a minimal performance cost has become crucial.