Development of text and speech corpus for an Indonesian speech-to-speech translation system
2017, pp. 1–5
Abstract
This paper describes our natural language resources, in particular the text and speech corpora, for developing an Indonesian speech-to-speech translation (S2ST) system. The corpora are used to build models for the Automatic Speech Recognition (ASR), Statistical Machine Translation (SMT), and Text-to-Speech (TTS) components. They have been collected since 1987 from various sources and projects, such as the Multilingual Machine Translation System (MMTS), PAN Localization, ASEAN MT, and U-STAR. Text corpora are created either by collecting online resources or by manually translating textual sources. Speech corpora are produced through several recording projects. The availability of these corpora enables us to develop an Indonesian speech-to-speech translation system.