June 16, 2022 Natural Language Processing
A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation
43 minutes
Linh The Nguyen*, Nguyen Luong Tran*, Long Doan*, Manh Luong, Dat Quoc Nguyen
InterSpeech 2022
Abstract
In this paper, we introduce a high-quality and large-scale benchmark dataset for English-Vietnamese speech translation with 508 audio hours, consisting of 331K triplets of (sentence-length audio, English source transcript sentence, Vietnamese target subtitle sentence). We also conduct empirical experiments using strong baselines and find that the traditional “Cascaded” approach still outperforms the modern “End-to-End” approach. To the best of our knowledge, this is the first large-scale English-Vietnamese speech translation study. We hope both our publicly available dataset and study can serve as a starting point for future research and applications on English-Vietnamese speech translation.
Bibtex
@inproceedings{nguyen22_interspeech,
  author={Linh The Nguyen and Nguyen Luong Tran and Long Doan and Manh Luong and Dat Quoc Nguyen},
  title={{A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={1726--1730},
  doi={10.21437/Interspeech.2022-218}
}