June 16, 2022 Natural Language Processing

A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation

  • 43 minutes
  • Linh The Nguyen*, Nguyen Luong Tran*, Long Doan*, Manh Luong, Dat Quoc Nguyen

  • InterSpeech 2022

Abstract

In this paper, we introduce a high-quality and large-scale benchmark dataset for English-Vietnamese speech translation with 508 hours of audio, consisting of 331K triplets of (sentence-length audio, English source transcript sentence, Vietnamese target subtitle sentence). We also conduct empirical experiments using strong baselines and find that the traditional “Cascaded” approach still outperforms the modern “End-to-End” approach. To the best of our knowledge, this is the first large-scale English-Vietnamese speech translation study. We hope both our publicly available dataset and study can serve as a starting point for future research and applications on English-Vietnamese speech translation.
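As a rough illustration of the triplet structure and the two approaches named in the abstract, the following Python sketch may help. It is not the authors' code or the dataset's actual file format: the `Triplet` class, its field names, and the `cascaded` helper are hypothetical, and the only numbers used are the 508 hours and 331K triplets stated above (331K is treated as approximately 331,000).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Triplet:
    """One dataset item as described in the abstract (field names are hypothetical)."""
    audio_path: str      # assumed: path to a sentence-length audio clip
    en_transcript: str   # English source transcript sentence
    vi_subtitle: str     # Vietnamese target subtitle sentence


# Figures taken from the abstract; "331K" is approximate.
TOTAL_AUDIO_HOURS = 508
NUM_TRIPLETS = 331_000


def average_clip_seconds(total_hours: float, num_clips: int) -> float:
    """Mean audio duration per triplet, in seconds."""
    return total_hours * 3600 / num_clips


# Hypothetical type aliases: each model is treated as a plain function here.
ASR = Callable[[str], str]   # audio path -> English text
MT = Callable[[str], str]    # English text -> Vietnamese text
E2E = Callable[[str], str]   # audio path -> Vietnamese text, in a single model


def cascaded(asr: ASR, mt: MT) -> E2E:
    """Cascaded speech translation: transcribe first, then translate the transcript."""
    return lambda audio_path: mt(asr(audio_path))


avg_seconds = average_clip_seconds(TOTAL_AUDIO_HOURS, NUM_TRIPLETS)
print(f"Average clip length: {avg_seconds:.1f} s")
```

Under these figures the average clip is on the order of five to six seconds, which is consistent with the sentence-length segmentation described in the abstract. An end-to-end system would instead be a single function of type `E2E`, with no intermediate English transcript.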

Bibtex

@inproceedings{nguyen22_interspeech,
  author={Linh The Nguyen and Nguyen Luong Tran and Long Doan and Manh Luong and Dat Quoc Nguyen},
  title={{A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={1726--1730},
  doi={10.21437/Interspeech.2022-218}
}