Register organization for media processing
Citations: top 1% of 2002 papers
Abstract
Processor architectures with tens to hundreds of arithmetic units are emerging to handle media processing applications. These applications, such as image coding, image synthesis and image understanding, require arithmetic rates of up to 10^11 operations per second. As the number of arithmetic units in a processor increases to meet these demands, register storage and communication between the arithmetic units dominate the area, delay and power of the arithmetic units. In this paper, we show that partitioning the register file along three axes reduces the cost of register storage and communication without significantly impacting performance. We develop a taxonomy of register architectures by partitioning across the data-parallel, instruction-level-parallel and memory-hierarchy axes, and by optimizing the hierarchical register organization for operation on streams of data. Compared to a centralized global register file, the most compact of these organizations reduces the register file area, delay and power dissipation of a media processor by factors of 195, 230 and 430 respectively. This reduction in cost is achieved with a performance degradation of only 8% on a representative set of media processing benchmarks.