Off-Policy Self-Critical Training for Transformer in Visual Paragraph Generation

Yan, Shiyang; Hua, Yang; Robertson, Neil M.

(Submitted on 21 Jun 2020)

Abstract: Recently, several approaches have been proposed to solve language generation problems. The Transformer is currently the state-of-the-art seq-to-seq model for language generation. Reinforcement Learning (RL) is useful for addressing exposure bias and for optimising non-differentiable metrics in seq-to-seq language learning. However, the Transformer is hard to combine with RL because sampling from it requires costly computing resources. We tackle this problem by proposing an off-policy RL algorithm in which a behaviour policy represented by GRUs performs the sampling. We reduce the high variance of importance sampling (IS) by applying the truncated relative importance sampling (TRIS) technique and the Kullback-Leibler (KL)-control concept. TRIS is a simple yet effective technique, and there is a theoretical proof that KL-control helps to reduce the variance of IS. We formulate this off-policy RL algorithm on the basis of self-critical sequence training. Specifically, we use a Transformer-based captioning model as the target policy and an image-guided language auto-encoder as the behaviour policy to explore the environment. The proposed algorithm achieves state-of-the-art performance on visual paragraph generation and improved results on image captioning.
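
The abstract does not spell out the TRIS formula or the exact loss, so the following is only a minimal sketch of how an importance-weighted self-critical update could look. It assumes a relative weight of the form w = pi / (beta * pi + (1 - beta) * b), where pi is the target (Transformer) probability of a sequence sampled from the behaviour policy b, followed by truncation at a constant. The function names and the parameters `beta` and `clip` are hypothetical and are not taken from the paper. The KL-control term is not covered here.

```python
import math


def truncated_relative_weight(logp_target, logp_behaviour, beta=0.5, clip=1.5):
    """Importance weight for one sampled sequence (assumed form, not the paper's).

    relative weight: w = pi / (beta * pi + (1 - beta) * b), bounded by 1 / beta
    truncation:      min(w, clip)
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    log_ratio = logp_target - logp_behaviour  # log(pi / b)
    if log_ratio >= 0.0:
        # Divide numerator and denominator by pi so exp() cannot overflow.
        weight = 1.0 / (beta + (1.0 - beta) * math.exp(-log_ratio))
    else:
        ratio = math.exp(log_ratio)  # may underflow to 0.0, which is harmless
        weight = ratio / (beta * ratio + (1.0 - beta))
    return min(weight, clip)


def scst_coefficient(reward_sample, reward_greedy, weight):
    """Scalar that multiplies grad log pi(sample) in the gradient estimate.

    Self-critical training uses the reward of the greedy decode as the
    baseline; the importance weight corrects for sampling from b, not pi.
    """
    return weight * (reward_sample - reward_greedy)
```

In an autograd framework, one would typically minimise `-scst_coefficient(...) * logp_target` with the coefficient treated as a constant, so that gradients flow only through the target policy's log-probability.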

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2006.11714 [cs.CV]
	(or arXiv:2006.11714v1 [cs.CV] for this version)

Submission history

From: Shiyang Yan
[v1] Sun, 21 Jun 2020 05:10:17 GMT (1937kb,D)
