Bandit Structured Prediction for Neural Sequence-to-Sequence Learning

Kreutzer, Julia; Sokolov, Artem; Riezler, Stefan

(Submitted on 21 Apr 2017 (v1), last revised 13 Dec 2018 (this version, v2))

Abstract: Bandit structured prediction describes a stochastic optimization framework where learning is performed from partial feedback. This feedback is received in the form of a task loss evaluation to a predicted output structure, without having access to gold standard structures. We advance this framework by lifting linear bandit learning to neural sequence-to-sequence learning problems using attention-based recurrent neural networks. Furthermore, we show how to incorporate control variates into our learning algorithms for variance reduction and improved generalization. We present an evaluation on a neural machine translation task that shows improvements of up to 5.89 BLEU points for domain adaptation from simulated bandit feedback.
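
The learning setup summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: instead of an attention-based recurrent network it assumes a toy softmax model over a fixed set of candidate outputs, and it assumes a running mean of past feedback as the control variate (one common baseline choice). The names `train_bandit`, `losses`, `lr`, and `use_baseline` are hypothetical. The learner samples one output, observes only that output's task loss, and takes a score-function gradient step on the expected loss.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(losses, steps=5000, lr=0.1, use_baseline=True, seed=0):
    """Toy expected-loss minimization from bandit feedback.

    losses[k] is the task loss of candidate output k. The full list is
    hidden from the learner: at each step it only sees the loss of the
    single output it sampled (assumption: losses are fixed and noiseless).
    """
    rng = random.Random(seed)
    theta = [0.0] * len(losses)  # one logit per candidate output
    baseline = 0.0               # running mean of observed feedback
    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an output from the current model instead of using a gold label.
        k = rng.choices(range(len(losses)), weights=probs)[0]
        feedback = losses[k]     # partial (bandit) feedback
        # Control variate: subtract the baseline from the observed loss.
        signal = feedback - baseline if use_baseline else feedback
        for j in range(len(theta)):
            # d/d theta_j of log p(k) for a softmax is 1[j == k] - p_j.
            grad_log = (1.0 if j == k else 0.0) - probs[j]
            theta[j] -= lr * signal * grad_log
        # Update the running mean after the gradient step.
        baseline += (feedback - baseline) / t
    return softmax(theta)


if __name__ == "__main__":
    print(train_bandit([0.9, 0.1, 0.7]))
```

Setting `use_baseline=False` gives the plain score-function update, which makes it possible to compare the two variants on the same simulated feedback.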

Comments:	ACL 2017
Subjects:	Machine Learning (stat.ML); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1704.06497 [stat.ML]
	(or arXiv:1704.06497v2 [stat.ML] for this version)

Submission history

From: Julia Kreutzer
[v1] Fri, 21 Apr 2017 11:56:00 GMT (34kb,D)
[v2] Thu, 13 Dec 2018 17:00:18 GMT (34kb,D)
