September 20, 2022 Natural Language Processing

Disfluency Detection for Vietnamese

  • Mai Hoang Dao, Thinh Hung Truong, Dat Quoc Nguyen

  • WNUT 2022

Abstract

In this paper, we present the first empirical study of Vietnamese disfluency detection. To conduct this study, we first create a disfluency detection dataset for Vietnamese, with manual annotations over two disfluency types. We then run experiments with strong baseline models and find that (i) automatic Vietnamese word segmentation improves the baselines' disfluency detection performance, and (ii) the best results are obtained by fine-tuning pre-trained language models, with the monolingual Vietnamese model PhoBERT outperforming the multilingual model XLM-R.

BibTeX

@inproceedings{PhoDisfluency,
title = {{Disfluency Detection for Vietnamese}},
author = {Mai Hoang Dao and Thinh Hung Truong and Dat Quoc Nguyen},
booktitle = {Proceedings of the 8th Workshop on Noisy User-generated Text (WNUT)},
year = {2022}
}
