Improving Audio Classification by Combining Self-Supervision with Knowledge Distillation
Abstract
Current single-modality self-supervised audio classification relies mainly on spectrogram-reconstruction pretext tasks. This single self-supervision strategy cannot fully mine the key semantic information present in the time and frequency domains. This article therefore proposes a self-supervised method combined with knowledge distillation to further improve performance on audio classification tasks. First, exploiting the two-dimensional structure of the audio spectrogram, self-supervised tasks are constructed both along the individual time and frequency dimensions and along the joint time-frequency dimension; through information reconstruction, contrastive learning, and related methods, the model learns fine spectrogram details as well as key discriminative information. Second, for feature-level self-supervision, two teacher-student learning strategies are constructed: one internal to the model and one based on knowledge distillation. By fitting the teacher model's feature representations, the student further improves the generalization of audio classification. Comparative experiments were conducted on the AudioSet, ESC-50, and VGGSound datasets. The results show that the proposed algorithm improves recognition accuracy by 0.5% to 1.3% over the best existing single-modality audio methods.
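The two core ingredients described above can be illustrated with a minimal sketch: masking a spectrogram along its time and frequency axes to create a reconstruction target, and a temperature-scaled distillation loss that fits student outputs to teacher outputs. This is a hypothetical illustration of the general techniques, not the paper's actual implementation; the function names, masking ratios, and temperature are assumptions.

```python
import numpy as np

def mask_spectrogram(spec, time_ratio=0.2, freq_ratio=0.2, rng=None):
    """Zero out random time columns and frequency rows of a (T, F)
    spectrogram, returning the masked copy and the boolean mask.
    A reconstruction pretext task would predict the masked entries."""
    if rng is None:
        rng = np.random.default_rng(0)
    T, F = spec.shape
    t_idx = rng.choice(T, size=max(1, int(T * time_ratio)), replace=False)
    f_idx = rng.choice(F, size=max(1, int(F * freq_ratio)), replace=False)
    mask = np.zeros_like(spec, dtype=bool)
    mask[t_idx, :] = True   # masked time frames
    mask[:, f_idx] = True   # masked frequency bins
    masked = spec.copy()
    masked[mask] = 0.0
    return masked, mask

def reconstruction_loss(pred, target, mask):
    """Mean squared error computed only over the masked positions."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

def _softmax(logits, tau):
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, tau=2.0):
    """Soft-label cross-entropy between temperature-softened teacher
    and student distributions (standard knowledge-distillation form)."""
    p_t = _softmax(teacher_logits, tau)
    p_s = _softmax(student_logits, tau)
    ce = -(p_t * np.log(p_s + 1e-12)).sum(axis=-1).mean()
    return float(ce * tau ** 2)  # tau^2 rescales gradients as in Hinton et al.
```

In a full pipeline, the student would be trained on a weighted sum of the masked-reconstruction loss and the distillation loss; the weighting is a design choice not specified here.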