A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations
Abstract
Emotion recognition in conversations (ERC), the task of identifying the emotion of each utterance in a conversation, is crucial for building empathetic machines. Existing studies focus mainly on capturing context- and speaker-sensitive dependencies in the textual modality but ignore the significance of multimodal information. Unlike emotion recognition in textual conversations, multimodal ERC requires capturing intra- and inter-modal interactions between utterances, learning weights between different modalities, and enhancing modal representations. In this paper, we propose a transformer-based model with self-distillation (SDT) for the task (code available at https://github.com/butterfliesss/SDT). The model captures intra- and inter-modal interactions with intra- and inter-modal transformers, and learns weights between modalities dynamically through a hierarchical gated fusion strategy. Furthermore, to learn more expressive modal representations, we treat the soft labels of the proposed model as extra training supervision. Specifically, we introduce self-distillation to transfer knowledge of both hard and soft labels from the full model to each modality. Experiments on the IEMOCAP and MELD datasets demonstrate that SDT outperforms previous state-of-the-art baselines.
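The self-distillation objective described above can be sketched as a standard knowledge-distillation loss: each unimodal branch (the student) is trained against both the ground-truth hard label and the fused model's softened output distribution (the soft label). The sketch below is a minimal illustration of that idea, not the paper's exact formulation; the temperature, the weighting coefficient `alpha`, and the function names are assumptions for the example.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_distillation_loss(student_logits, teacher_logits, hard_label,
                           temperature=2.0, alpha=0.5):
    """Illustrative combined loss for one utterance.

    student_logits: logits from one modality branch (the student).
    teacher_logits: logits from the fused multimodal model (the teacher).
    hard_label:     ground-truth emotion class index.
    Returns alpha * cross-entropy(hard label) + (1 - alpha) * KL(soft label).
    """
    p_student = softmax(student_logits, temperature)
    p_teacher = softmax(teacher_logits, temperature)
    # KL(teacher || student), scaled by T^2 as is conventional in distillation
    kl = float(np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))) \
        * temperature ** 2
    # Hard-label cross-entropy at temperature 1
    ce = float(-np.log(softmax(student_logits)[hard_label]))
    return alpha * ce + (1 - alpha) * kl
```

When the student already matches the teacher, the KL term vanishes and only the hard-label cross-entropy remains, so the soft-label supervision acts as a regularizer that pulls each modality toward the fused model's predictions.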