Low Bit-rate Speech Coding with VQ-VAE and a WaveNet Decoder
Top 10% of 2019 papers
Abstract
To transmit and store speech signals efficiently, speech codecs create a minimally redundant representation of the input signal, which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can perform very low bit-rate speech coding with high reconstruction quality. A prosody-transparent and speaker-independent model trained on the LibriSpeech corpus, coding audio at 1.6 kbps, exhibits perceptual quality roughly halfway between that of the MELP codec at 2.4 kbps and the AMR-WB codec at 23.05 kbps. Moreover, when trained on high-quality recorded speech with the test speaker included in the training set, a model coding speech at 1.6 kbps produces output of perceptual quality comparable to that of AMR-WB at 23.05 kbps.
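The 1.6 kbps figure follows directly from the rate at which the encoder emits discrete codes and the size of the VQ codebook. The numbers below are illustrative assumptions, not taken from the paper: an encoder that downsamples 16 kHz audio by a factor of 80 yields 200 latent frames per second, and a 256-entry codebook costs 8 bits per frame.

```python
import math

# All values are hypothetical, chosen so the arithmetic lands on 1.6 kbps.
sample_rate_hz = 16_000        # input audio sampling rate (assumed)
downsample_factor = 80         # encoder temporal stride (assumed)
codebook_size = 256            # VQ-VAE codebook entries (assumed)

frames_per_second = sample_rate_hz / downsample_factor  # 200 frames/s
bits_per_frame = math.log2(codebook_size)               # 8 bits/frame
bit_rate_bps = frames_per_second * bits_per_frame       # 1600 bps

print(f"{bit_rate_bps / 1000:.1f} kbps")  # prints "1.6 kbps"
```

Any combination of stride and codebook size with the same product gives the same rate; the trade-off between temporal resolution and codebook capacity is a design choice of the codec.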
Related Papers
- An adaptive multi-rate speech codec based on MP-CELP coding algorithm for ETSI AMR standard (2002)
- Implementation of a 12.8 kbit/s LD-CELP speech codec (2002)
- Adaptive window excitation coding in low-bit-rate CELP coders (2000)
- CELP With Priority To Critical Segments (1998)
- A Computationally Efficient CELP Codec with Stochastic Vector Quantisation of LPC Parameters (1989)