Puppis: Hardware Accelerator of Single-Shot Multibox Detectors for Edge-Based Applications
Abstract
Object detection is a popular image-processing technique, widely used in numerous applications for detecting and locating objects in images or videos. Single-Shot Multibox Detector (SSD) networks are among the fastest object-detection algorithms. However, they are also computationally very demanding, which limits their use in real-time edge applications. The SSD post-processing algorithm is not the most complex part of the overall SSD object-detection network. Even so, it is still computationally demanding and can become a bottleneck for processing latency and power consumption, especially in resource-constrained edge applications. As this paper shows, when hardware accelerators speed up the backbone CNN, an SSD post-processing step implemented in software can become the bottleneck in high-end applications that require high frame rates. To overcome this problem, we propose Puppis, an architecture for the hardware acceleration of the SSD post-processing algorithm. In our experiments, Puppis achieved an average SSD post-processing speedup of 33.34 times compared with a software implementation. Furthermore, when the proposed Puppis SSD hardware accelerator was used together with existing CNN accelerators, the complete SSD network executed on average 36.45 times faster than the software implementation.
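The following is a minimal software sketch of the standard SSD post-processing step. It shows what that step computes. It is not the Puppis hardware architecture, and it is not the software baseline used in the experiments. It follows the usual SSD formulation in three stages. First, predicted offsets are decoded against default (anchor) boxes. Second, softmax class scores are thresholded per class. Third, greedy non-maximum suppression (NMS) is applied per class. The helper names (decode, iou, nms, ssd_postprocess), the box-variance values (0.1, 0.2), and the thresholds are illustrative assumptions, not values taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # corner form: (xmin, ymin, xmax, ymax)

# Assumption: variances (0.1, 0.2), as in many public SSD implementations.
VARIANCES = (0.1, 0.2)


def decode(loc: Sequence[float], anchor: Sequence[float]) -> Box:
    """Decode offsets (tx, ty, tw, th) against a default box given as (cx, cy, w, h)."""
    tx, ty, tw, th = loc
    acx, acy, aw, ah = anchor
    cx = acx + tx * VARIANCES[0] * aw
    cy = acy + ty * VARIANCES[0] * ah
    w = aw * math.exp(tw * VARIANCES[1])
    h = ah * math.exp(th * VARIANCES[1])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-form boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one default box's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nms(cands: List[Tuple[float, Box]], iou_thr: float) -> List[Tuple[float, Box]]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    kept: List[Tuple[float, Box]] = []
    for score, box in sorted(cands, key=lambda c: c[0], reverse=True):
        if all(iou(box, k[1]) <= iou_thr for k in kept):
            kept.append((score, box))
    return kept


def ssd_postprocess(locs, logits, anchors, score_thr=0.5, iou_thr=0.45, top_k=200):
    """Return up to top_k detections (class_id, score, box); class 0 is background."""
    boxes = [decode(l, a) for l, a in zip(locs, anchors)]
    probs = [softmax(z) for z in logits]
    detections = []
    for c in range(1, len(logits[0])):  # skip the background class
        cands = [(p[c], b) for p, b in zip(probs, boxes) if p[c] > score_thr]
        detections += [(c, s, b) for s, b in nms(cands, iou_thr)]
    detections.sort(key=lambda d: d[1], reverse=True)
    return detections[:top_k]
```

In this sketch, the cost comes mainly from two parts. One is the per-box decoding and softmax. The other is the pairwise IoU comparisons inside NMS. Both grow with the number of default boxes and candidate detections.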