An instruction set and microarchitecture for instruction level distributed processing
Citations over time: top 15% of 2003 papers
Abstract
An instruction set architecture (ISA) suitable for future microprocessor design constraints is proposed. The ISA has hierarchical register files with a small number of accumulators at the top. The instruction stream is divided into chains of dependent instructions (strands), where intra-strand dependences are passed through the accumulator. The general-purpose register file is used for communication between strands and for holding global values that have many consumers. A microarchitecture to support the ISA is also proposed and evaluated. The microarchitecture consists of multiple, distributed processing elements (PEs). Each PE contains an instruction issue FIFO, a local register (accumulator), and a local copy of the register file. The overall simplicity, hierarchical value communication, and distributed implementation will provide a very high clock speed and a relatively short pipeline while maintaining a form of superscalar out-of-order execution. Detailed timing simulations using translated program traces show the proposed microarchitecture is tolerant of global wire latencies. Ignoring the significant clock frequency advantages, a microarchitecture that supports a 4-wide fetch/decode pipeline, 8 serial PEs, and a two-cycle inter-PE communication latency performs as well as a conventional 4-way out-of-order superscalar processor.
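To make the strand idea concrete, the following is a minimal sketch of how a straight-line trace might be split into dependence chains. It is not the paper's translation algorithm: the `Instr` type, the `form_strands` function, and the greedy "extend the strand whose most recent result I consume" rule are all assumptions made for illustration. A value is treated as passing through the accumulator only when its consumer is the very next instruction in the same strand; every other produced value that is read later is marked as needing the general-purpose register file.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical trace entry: one destination and zero or more sources."""
    dest: str
    srcs: tuple = ()


def form_strands(instrs):
    """Greedily split a trace into strands (illustrative heuristic only).

    Returns (strands, via_gpr): strands is a list of lists of instruction
    indices; via_gpr is the set of producer indices whose result is read
    by something other than the next instruction of their own strand.
    """
    producer = {}   # register name -> index of the latest writer
    strand_of = {}  # instruction index -> strand id
    tail = {}       # strand id -> index of the strand's last instruction
    strands = []
    via_gpr = set()

    for i, ins in enumerate(instrs):
        # Look for a source whose producer is still the tail of its strand;
        # that value can be handed over through the accumulator.
        link = None
        for s in ins.srcs:
            p = producer.get(s)
            if p is not None and tail[strand_of[p]] == p:
                link = p
                break

        if link is None:
            sid = len(strands)      # no chainable producer: start a new strand
            strands.append([])
        else:
            sid = strand_of[link]

        # Any other in-trace producer must communicate through the GPR file.
        for s in ins.srcs:
            p = producer.get(s)
            if p is not None and p != link:
                via_gpr.add(p)

        strands[sid].append(i)
        strand_of[i] = sid
        tail[sid] = i
        producer[ins.dest] = i

    return strands, via_gpr


# Small made-up trace; r7, r8, r9 are live-in registers.
trace = [
    Instr("r1", ("r9",)),        # 0
    Instr("r2", ("r1", "r8")),   # 1
    Instr("r3", ("r7",)),        # 2
    Instr("r4", ("r2", "r3")),   # 3
    Instr("r5", ("r1", "r4")),   # 4
]
strands, via_gpr = form_strands(trace)
print(strands)          # [[0, 1, 3, 4], [2]]
print(sorted(via_gpr))  # [0, 2]
```

In this made-up trace, instructions 0, 1, 3, and 4 form one strand, and instruction 2 forms another. The results of instructions 0 and 2 are the ones that must be visible in the general-purpose registers: instruction 2 feeds a different strand, and instruction 0 is read again later in its own strand, after the accumulator has been overwritten.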