You Cannot Improve What You Do Not Measure
Abstract
Recently, deep learning (DL) has become best-in-class for numerous applications, but at a high computational cost that necessitates high-performance, energy-efficient acceleration. The reconfigurability of FPGAs is appealing given the rapid evolution of DL models, but it also results in lower performance and area efficiency compared to ASICs. In this article, we implement three state-of-the-art computing architectures (CAs) for convolutional neural network (CNN) inference on both FPGAs and ASICs. By comparing the FPGA and ASIC implementations, we quantify the area and performance costs of programmability and pinpoint the inefficiencies in current FPGA architectures. We perform our experiments using three variations of these CAs for AlexNet, VGG-16, and ResNet-50 to allow extensive comparisons. We find that the performance gap varies significantly, from 2.8× to 6.3×, while the area gap is consistent across CAs, with an average FPGA-to-ASIC area ratio of 8.7. Among the different blocks of the CAs, the convolution engine, which constitutes up to 60% of the total area, has a particularly high area ratio, ranging from 13 to 31. Motivated by these FPGA vs. ASIC comparisons, we suggest FPGA architectural changes such as increasing the DSP block count, enhancing low-precision support in DSP blocks, and rethinking the on-chip memories to reduce the programmability gap for DL applications.
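To make the roll-up concrete, the sketch below shows how per-block area ratios combine into an overall FPGA-to-ASIC area gap. The block names and all area numbers are hypothetical placeholders for illustration only, not measurements from the paper; only the general shape of the comparison (per-block ratios weighted by ASIC block area) follows the abstract.

```python
# Hypothetical ASIC block areas (mm^2) and per-block FPGA/ASIC area ratios.
# These values are illustrative, not taken from the paper's results.
asic_area_mm2 = {
    "conv_engine": 3.0,
    "on_chip_buffers": 1.5,
    "control_logic": 0.5,
}
area_ratio = {
    "conv_engine": 20.0,     # conv engines showed ratios of 13-31 in the paper
    "on_chip_buffers": 4.0,
    "control_logic": 6.0,
}

# Scale each ASIC block by its ratio to get the equivalent FPGA area.
fpga_area = {b: asic_area_mm2[b] * area_ratio[b] for b in asic_area_mm2}

# The overall gap is the ratio of total areas, i.e. an area-weighted
# average of the per-block ratios.
overall_ratio = sum(fpga_area.values()) / sum(asic_area_mm2.values())
conv_share = fpga_area["conv_engine"] / sum(fpga_area.values())

print(f"overall FPGA-to-ASIC area ratio: {overall_ratio:.1f}")
print(f"conv engine share of FPGA area: {conv_share:.0%}")
```

Because the overall ratio is area-weighted, a block with both a large footprint and a high ratio (here, the convolution engine) dominates the gap, which is why the paper's suggested FPGA changes target the DSP blocks that implement it.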
Related Papers
- A Highly Compatible Architecture Design for Optimum FPGA to Structured-ASIC Migration (2006)
- Coarse-grain reconfigurable ASIC through multiplexer based switches (2015)
- For Mass Production, Use an FPGA or Build an ASIC? ASIC Challenges and FPGA Usage from the User's Perspective (Tutorial: How Far FPGAs Have Come!) (2003)
- The Application of FPGA Devices in ASIC Design (2001)
- SOC Design Synthesis and Implementation (2020)