*Thanh-Thien Le, *Viet Dao, *Linh Van Nguyen, Nhung Nguyen, Linh Ngo Van, Thien Huu Nguyen
In this paper, we present the first empirical study on Vietnamese disfluency detection. To conduct this study, we first create a disfluency detection dataset for Vietnamese, with manual annotations for two disfluency types. We then run experiments with strong baseline models and find that automatic Vietnamese word segmentation improves the disfluency detection performance of the baselines. We also find that the highest performance is obtained by fine-tuning pre-trained language models, where the monolingual model PhoBERT for Vietnamese outperforms the multilingual model XLM-R.
Mai Hoang Dao, Thinh Hung Truong, Dat Quoc Nguyen
WNUT 2022