Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

Zimmert, Julian; Lattimore, Tor

(Submitted on 28 May 2019)

Abstract: The information-theoretic analysis by Russo and Van Roy (2014), in combination with minimax duality, has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings. In most applications there is a tantalising similarity to the classical analysis based on mirror descent. We make a formal connection, showing that the information-theoretic bounds in most applications can be derived from existing techniques for online convex optimisation. Besides this, for $k$-armed adversarial bandits we provide an efficient algorithm with regret that matches the best information-theoretic upper bound, and we improve the best known regret guarantees for online linear optimisation on $\ell_p$-balls and for bandits with graph feedback.

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1905.11817 [cs.LG]
	(or arXiv:1905.11817v1 [cs.LG] for this version)

Submission history

From: Julian Zimmert
[v1] Tue, 28 May 2019 13:53:30 GMT (40kb,D)
