Puyuan Peng
The University of Texas at Austin (US)
Research Areas
Speech Recognition and Synthesis, Natural Language Processing Techniques, Music and Audio Processing, Speech and Audio Processing, Multimodal Machine Learning Applications
Most-Cited Works
- MAE-AST: Masked Autoencoding Audio Spectrogram Transformer (2022), 76 citations
- Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization (2023), 35 citations
- Word Discovery in Visually Grounded, Self-Supervised Speech Models (2022), 34 citations
- VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild (2024), 28 citations
- Fast-Slow Transformer for Visually Grounding Speech (2022), 23 citations
- Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling (2022), 19 citations
- AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (2024), 11 citations
- Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model (2023), 9 citations
- A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings (2020), 8 citations
- Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos (2024), 7 citations