Coarse Grain Parallelization of H.264 Video Decoder and Memory Bottleneck in Multi-Core Architectures
Citations Over Time: Top 10% of 2011 papers
Abstract
Fine-grain methods for parallelizing the H.264 decoder offer low latency and low memory usage. However, they cannot match the scalability of coarse-grain approaches, even assuming a well-designed entropy decoder that can feed the increasing number of cores working in parallel. We introduce a GOP (Group of Pictures) level approach, chosen for its high scalability, and discuss solutions to its well-known memory issues. Our design removes the need for the GOP start-code scanner used in earlier methods, so all cores can work on the decoding task. Our experiments showed that memory initialization operations may substantially degrade the scalability of parallel applications. The multi-core cache architecture proved critical for achieving the desired speedup. We observed a speedup of 7.63 with 8 processors having separate caches, and a speedup of 13.35 with 16 processors when a cache is shared by 2 processors.
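To put the reported speedups in context, the short Python sketch below derives two standard scalability figures from them: parallel efficiency (speedup divided by processor count) and the Karp-Flatt experimentally determined serial fraction. Only the speedup and processor-count values come from the abstract. The function names are illustrative, and the Karp-Flatt metric is brought in here as a common yardstick, not as an analysis the authors are known to have performed. The two configurations also use different cache arrangements, so the figures are indicative and not a like-for-like comparison.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def karp_flatt(speedup: float, processors: int) -> float:
    """Karp-Flatt metric: serial fraction implied by a measured speedup."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


# (processors, speedup, cache arrangement) as reported in the abstract.
REPORTED = [
    (8, 7.63, "separate caches"),
    (16, 13.35, "cache shared by 2 processors"),
]

if __name__ == "__main__":
    for processors, speedup, cache in REPORTED:
        eff = parallel_efficiency(speedup, processors)
        serial = karp_flatt(speedup, processors)
        print(f"{processors:2d} processors ({cache}): "
              f"efficiency {eff:.1%}, implied serial fraction {serial:.2%}")
```

Under these definitions, the 8-processor result corresponds to roughly 95% efficiency and the 16-processor result to roughly 83%. That drop is consistent with the abstract's remark that the cache architecture is critical to the achievable speedup.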