Denoising based Sequence-to-Sequence Pre-training for Text Generation

Wang, Liang; Zhao, Wei; Jia, Ruoyu; Li, Sujian; Liu, Jingming

(Submitted on 22 Aug 2019)

Abstract: This paper presents a new sequence-to-sequence (seq2seq) pre-training method, PoDA (Pre-training of Denoising Autoencoders), which learns representations suitable for text generation tasks. Unlike encoder-only (e.g., BERT) or decoder-only (e.g., OpenAI GPT) pre-training approaches, PoDA jointly pre-trains both the encoder and the decoder by denoising noise-corrupted text, and it has the further advantage of keeping the network architecture unchanged in the subsequent fine-tuning stage. We also design a hybrid model of Transformer and pointer-generator networks as the backbone architecture for PoDA. We conduct experiments on two text generation tasks: abstractive summarization and grammatical error correction. Results on four datasets show that PoDA improves model performance over strong baselines without using any task-specific techniques and significantly speeds up convergence.
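The denoising objective described in the abstract can be illustrated with a small sketch that builds (noisy input, clean target) pairs for seq2seq pre-training. The abstract does not specify the noise functions, so the operations used here (random deletion, random replacement, and a bounded local shuffle), along with the names `corrupt` and `make_pair` and all default probabilities, are assumptions chosen for illustration only.

```python
import random


def corrupt(tokens, vocab, p_delete=0.1, p_replace=0.1, shuffle_window=3, rng=None):
    """Return a noise-corrupted copy of `tokens` (hypothetical noise scheme)."""
    rng = rng or random.Random()
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_delete:
            # Drop the token entirely.
            continue
        if r < p_delete + p_replace:
            # Swap the token for a random vocabulary item.
            noisy.append(rng.choice(vocab))
        else:
            noisy.append(tok)
    # Local shuffle: perturb each position by a bounded random offset and
    # re-sort, so tokens move at most about `shuffle_window` positions.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(noisy))]
    order = sorted(range(len(noisy)), key=lambda i: keys[i])
    return [noisy[i] for i in order]


def make_pair(tokens, vocab, **noise_kwargs):
    """Build one training pair: the encoder reads the noisy text,
    the decoder is trained to reproduce the original text."""
    return corrupt(tokens, vocab, **noise_kwargs), list(tokens)


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    vocab = sorted(set(sentence))
    source, target = make_pair(sentence, vocab, rng=random.Random(0))
    print("source:", " ".join(source))
    print("target:", " ".join(target))
```

Because the target is simply the uncorrupted text, the same encoder-decoder network can later be fine-tuned on a downstream task by replacing these pairs with task data, which is consistent with the abstract's point that the architecture stays unchanged.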

Comments:	Accepted to EMNLP 2019
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1908.08206 [cs.CL]
	(or arXiv:1908.08206v1 [cs.CL] for this version)

Submission history

From: Liang Wang
[v1] Thu, 22 Aug 2019 05:26:25 GMT (88kb,D)
