BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU
Abstract
Deep learning has revolutionized computer vision and other fields since its big bang in 2012. However, it is challenging to deploy Deep Neural Networks (DNNs) in real-world applications due to their high computational complexity. Binary Neural Networks (BNNs) dramatically reduce computational complexity by replacing most arithmetic operations with bitwise operations. Existing BNN implementations have focused on GPUs or FPGAs and use the conventional image-to-column method, which performs poorly for binary convolution due to its low arithmetic intensity and a data layout unfriendly to bitwise operations. We propose BitFlow, a gemm-operator-network three-level optimization framework for fully exploiting the computing power of BNNs on CPU. BitFlow features a new class of algorithm named PressedConv for efficient binary convolution using a locality-aware layout and vector parallelism. We evaluate BitFlow with the VGG network. On a single core of an Intel Xeon Phi, BitFlow obtains a 1.8x speedup over unoptimized BNN implementations, and an 11.5x speedup over counterpart full-precision DNNs. Over 64 cores, BitFlow enables BNNs to run 1.1x faster than counterpart full-precision DNNs on a GPU (GTX 1080).
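The core trick the abstract alludes to, replacing multiply-accumulates with bitwise operations, can be sketched as an XNOR-plus-popcount dot product over sign-packed vectors. This is a minimal illustrative example (all names are hypothetical and it is not BitFlow's PressedConv layout): for vectors binarized to {-1, +1}, the dot product equals 2 * (number of matching sign bits) - n.

```python
import numpy as np

def binarize(v):
    """Map a real-valued vector to {-1, +1} signs and pack one bit per element."""
    bits = (v >= 0).astype(np.uint8)   # +1 -> 1, -1 -> 0
    return np.packbits(bits)            # 8 sign bits per byte

def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1, +1} vectors via XNOR + popcount.

    matches = popcount(XNOR(a, b)); dot = matches - mismatches = 2*matches - n.
    """
    xnor = ~(a_bits ^ b_bits)                     # set bits mark agreeing signs
    matches = int(np.unpackbits(xnor)[:n].sum())  # popcount over the first n bits
    return 2 * matches - n

# Check against the exact dot product of the sign vectors.
a = np.array([0.5, -1.2, 3.0, -0.1, 2.2, -0.7, 0.3, -4.0])
b = np.array([1.1, -0.4, -2.0, 0.9, 0.2, 0.6, -1.5, -0.8])
exact = int(np.sign(a) @ np.sign(b))
assert binary_dot(binarize(a), binarize(b), len(a)) == exact
```

On real hardware the XNOR and popcount run over 64-bit words or SIMD registers, so one instruction pair replaces dozens of floating-point multiply-adds, which is the source of the speedups the abstract reports.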