Radix-8 Digit-by-Rounding: Achieving High-Performance Reciprocals, Square Roots, and Reciprocal Square Roots
Citations Over Time
Abstract
We describe a high-performance digit-recurrence algorithm for computing exactly rounded reciprocals, square roots, and reciprocal square roots in hardware at a rate of three result bits - one radix-8 digit - per recurrence iteration. To achieve a single-cycle recurrence at a short cycle time, we adapted the digit-by-rounding algorithm, which is normally applied at much higher radices, for efficient operation at radix 8. Using this approach avoids in the recurrence step the lookup table required by SRT - the usual algorithm used for hardware digit recurrences. The increasing access latency of this table, the size of which grows super linearly in the radix, limits high-frequency SRT implementations to radix 4 or lower. We also developed a series of novel optimizations focused on further reducing the critical path through the recurrence. We propose, for example, decreasing data path widths to a point where erroneous results sometimes occur and then correcting these errors off the critical path. We present a specific implementation that computes any of these functions to 31 bits of precision in 13 cycles. Our implementation achieves a cycle time only 11% longer than the best reported SRT design for the same functions, yet delivers results in five fewer cycles. Finally, we show that even at lower radices, a digit-by-rounding design is likely to have a shorter critical path than one using SRT at the same radix.
Related Papers
- → Very-high radix combined division and square root with prescaling and selection by rounding(2002)27 cited
- → Complex Square Root with Operand Prescaling(2007)10 cited
- → Complex square root with operand prescaling(2004)9 cited
- Rounding of Floating Point Intervals(1994)
- → Children’s mixed-rounding strategy use in computational estimation(2022)