FloatX
Top 10% of 2019 papers
Abstract
We present FloatX (Float eXtended), a C++ framework to investigate the effect of leveraging customized floating-point formats in numerical applications. FloatX formats are based on the binary IEEE 754 formats, but with smaller, user-specified bit counts for the significand and the exponent. Among other properties, FloatX facilitates an incremental transformation of the code, relies on hardware-supported floating-point types as the back-end to preserve efficiency, and incurs no storage overhead. The article discusses in detail the design principles, programming interface, and datatype casting rules behind FloatX. Furthermore, it demonstrates FloatX’s usage and benefits via several case studies from well-known numerical dense linear algebra libraries, such as BLAS and LAPACK; the Ginkgo library for sparse linear systems; and two neural network applications related to image processing and text recognition.