

0 works · 0 citations · 0 h-index

Ziyang Ma

Kunming University of Science and Technology (CN); University of Science and Technology of China (CN); Sun Yat-sen University (CN); Shanghai Jiao Tong University (CN); China University of Geosciences (CN); Suzhou Institute of Biomedical Engineering and Technology (CN); Xijing Hospital (CN)


Research Areas

Speech Recognition and Synthesis, Music and Audio Processing, Natural Language Processing Techniques, Speech and Audio Processing, Speech and Dialogue Systems

Most-Cited Works

→ emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation (2024), 89 citations
→ Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition (2024), 27 citations
→ F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching (2025), 24 citations
→ EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark (2024), 22 citations
→ MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition (2024), 21 citations
→ VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching (2024), 20 citations
→ LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT (2023), 18 citations
→ Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS (2024), 16 citations
→ LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR (2024), 16 citations
→ MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets (2023), 12 citations