Title: PhoBERT: Pre-trained language models for Vietnamese
(Submitted on 2 Mar 2020 (v1), last revised 5 Oct 2020 (this version, v3))
Abstract: We present PhoBERT with two versions, PhoBERT-base and PhoBERT-large, the first public large-scale monolingual language models pre-trained for Vietnamese. Experimental results show that PhoBERT consistently outperforms the recent best pre-trained multilingual model XLM-R (Conneau et al., 2020) and improves the state-of-the-art in multiple Vietnamese-specific NLP tasks including Part-of-speech tagging, Dependency parsing, Named-entity recognition and Natural language inference. We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP. Our PhoBERT models are available at this https URL
Submission history
From: Dat Quoc Nguyen
[v1] Mon, 2 Mar 2020 10:21:17 GMT (16kb,D)
[v2] Thu, 30 Apr 2020 17:36:29 GMT (29kb,D)
[v3] Mon, 5 Oct 2020 09:53:19 GMT (30kb,D)