Regret Analysis for Continuous Dueling Bandit

Kumagai, Wataru

Title: Regret Analysis for Continuous Dueling Bandit

Authors: Wataru Kumagai

(Submitted on 21 Nov 2017 (v1), last revised 12 Dec 2017 (this version, v2))

Abstract: The dueling bandit is a learning framework wherein the feedback information in the learning process is restricted to a noisy comparison between a pair of actions. In this research, we address a dueling bandit problem based on a cost function over a continuous space. We propose a stochastic mirror descent algorithm and show that the algorithm achieves an $O(\sqrt{T\log T})$-regret bound under strong convexity and smoothness assumptions for the cost function. Subsequently, we clarify the equivalence between regret minimization in dueling bandit and convex optimization for the cost function. Moreover, when considering a lower bound in convex optimization, our algorithm is shown to achieve the optimal convergence rate in convex optimization and the optimal regret in dueling bandit except for a logarithmic factor.
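The feedback model in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm. It assumes a logistic comparison model, a one-point gradient estimate built from a single duel per round, and the Euclidean special case of mirror descent (regularizer ½‖x‖², which reduces to projected gradient descent). All function names, the step-size schedule, and the default parameters are hypothetical choices made for illustration.

```python
import math
import random


def duel_loss(cost, probe, base, rng):
    """Return 1 if `probe` loses a noisy duel against `base`, else 0.

    Assumed link (not from the paper): P(probe loses) = sigmoid(cost(probe) - cost(base)).
    """
    p_lose = 1.0 / (1.0 + math.exp(-(cost(probe) - cost(base))))
    return 1 if rng.random() < p_lose else 0


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere via normalized Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def project_to_ball(x, radius):
    """Euclidean projection onto the ball of the given radius centered at 0."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= radius:
        return x
    return [c * radius / norm for c in x]


def dueling_descent(cost, dim, steps, radius=1.0, delta=0.5, step_scale=2.0, seed=0):
    """Comparison-only descent: the learner never sees cost values, only duel outcomes."""
    rng = random.Random(seed)
    x = [0.0] * dim
    for t in range(1, steps + 1):
        u = random_unit_vector(dim, rng)
        probe = [xi + delta * ui for xi, ui in zip(x, u)]
        lost = duel_loss(cost, probe, x, rng)
        # One-point estimate: keep direction u only when the probe lost,
        # so the estimate correlates positively with the cost gradient.
        grad_est = [(dim / delta) * lost * ui for ui in u]
        eta = step_scale / t  # decaying step size, a common choice under strong convexity
        x = project_to_ball([xi - eta * gi for xi, gi in zip(x, grad_est)], radius)
    return x
```

Calling `dueling_descent` with a strongly convex and smooth cost, for example the squared distance to a fixed point inside the unit ball, should drive the iterate toward that point using duel outcomes alone. The probe radius `delta` and the scale `step_scale` are untuned and would need adjusting for other costs.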

Comments:	14 pages. This paper was accepted at NIPS 2017 as a spotlight presentation
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1711.07693 [stat.ML]
	(or arXiv:1711.07693v2 [stat.ML] for this version)

Submission history

From: Wataru Kumagai
[v1] Tue, 21 Nov 2017 09:58:00 GMT (36kb,D)
[v2] Tue, 12 Dec 2017 07:32:36 GMT (36kb,D)
