David Harwath
The University of Texas at Austin (US)
Research Areas
Speech Recognition and Synthesis, Music and Audio Processing, Multimodal Machine Learning Applications, Natural Language Processing Techniques, Speech and Audio Processing
Most-Cited Works
- Unsupervised learning of spoken language with visual context (2016)
- Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input (2018), 171 cited
- A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition (2013), 135 cited
- MAE-AST: Masked Autoencoding Audio Spectrogram Transformer (2022), 76 cited
- Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech (2019), 70 cited
- Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos (2021), 67 cited
- Learning Word-Like Units from Joint Audio-Visual Analysis (2017), 42 cited
- Deep multimodal semantic embeddings for speech and images (2015), 38 cited
- Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization (2023), 35 cited