Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Abstract
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating-point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy after quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
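The quantization scheme the abstract refers to represents each real value r as an affine map of an integer, r ≈ S(q − Z), where S is a floating-point scale and Z an integer zero-point. A minimal sketch of that mapping for 8-bit values, with illustrative function names (the paper itself works at the level of whole quantized layers, not these helpers):

```python
import numpy as np

def choose_qparams(x_min, x_max, num_bits=8):
    """Pick (scale, zero_point) for the affine scheme r ~= scale * (q - zero_point)."""
    # The representable range must include 0 so that zero is exact,
    # as required for zero-padding in convolutions.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    qmin, qmax = 0, 2**num_bits - 1
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(x, scale, zero_point, num_bits=8):
    """Map real values to unsigned integers, rounding and clamping to range."""
    qmin, qmax = 0, 2**num_bits - 1
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Recover approximate real values from integers."""
    return scale * (q.astype(np.float32) - zero_point)
```

The round trip quantize → dequantize incurs at most half a quantization step of error per value, which is the accuracy the co-designed training procedure teaches the network to tolerate.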
Related Papers
- Deep Residual Learning for Image Recognition (2016), 216,943 citations
- Very Deep Convolutional Networks for Large-Scale Image Recognition (2014), 75,407 citations
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (2017), 9,892 citations
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients (2016), 1,796 citations