Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability

Saha, Aadirupa; Krishnamurthy, Akshay

(Submitted on 24 Nov 2021)

Abstract: We study the $K$-armed contextual dueling bandit problem, a sequential decision-making setting in which the learner uses contextual information to make two decisions, but only observes \emph{preference-based feedback} suggesting that one decision was better than the other. We focus on the regret minimization problem under realizability, where the feedback is generated by a pairwise preference matrix that is well-specified by a given function class $\mathcal F$. We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works. The algorithm is also computationally efficient, running in polynomial time assuming access to an online oracle for square loss regression over $\mathcal F$. This resolves an open problem of Dudík et al. [2015] on oracle efficient, regret-optimal algorithms for contextual dueling bandits.
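To make the feedback model concrete, here is a minimal Python sketch of the interaction protocol under realizability: each round a context arrives, the learner picks two arms, and it observes only a binary bit saying which one was preferred. This is not the authors' algorithm. The logistic preference model $P(a \succ b \mid x) = \sigma(\theta^\top(\phi(x,a) - \phi(x,b)))$, the placeholder learner that duels a uniformly random pair, and the gradient-step square-loss oracle are all illustrative assumptions, and the names `preference_matrix`, `OnlineSquareLossOracle`, and `run_protocol` are hypothetical.

```python
import numpy as np

# Illustrative assumptions only: a logistic preference model stands in for
# the function class F, and a random-pair learner stands in for a real policy.


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def preference_matrix(theta, feats):
    """Hypothetical realizable model: P[a, b] = Pr(arm a beats arm b | context).

    feats is a (K, d) array of arm features phi(x, a) for the current context.
    Since sigmoid(z) + sigmoid(-z) = 1, P[a, b] + P[b, a] = 1 and P[a, a] = 0.5.
    """
    scores = feats @ theta
    return sigmoid(scores[:, None] - scores[None, :])


class OnlineSquareLossOracle:
    """Toy online regressor (an assumption, not the paper's oracle): one
    gradient step on (f(z) - y)^2 per observation, with f(z) = sigmoid(w . z)
    and z = phi(x, a) - phi(x, b)."""

    def __init__(self, d, lr=0.1):
        self.w = np.zeros(d)
        self.lr = lr

    def predict(self, z):
        return sigmoid(self.w @ z)

    def update(self, z, y):
        f = self.predict(z)
        # Gradient of (f - y)^2 with respect to w is 2 (f - y) f (1 - f) z.
        self.w -= self.lr * 2.0 * (f - y) * f * (1.0 - f) * z


def run_protocol(theta, T, K, rng):
    """Simulate T rounds of preference feedback with a placeholder learner
    that duels a uniformly random pair of distinct arms."""
    d = theta.shape[0]
    oracle = OnlineSquareLossOracle(d)
    history = []
    for _ in range(T):
        feats = rng.normal(size=(K, d))              # context x_t as arm features
        P = preference_matrix(theta, feats)
        a, b = rng.choice(K, size=2, replace=False)  # the learner's two decisions
        y = int(rng.random() < P[a, b])              # 1 if a was preferred to b
        oracle.update(feats[a] - feats[b], y)        # only the bit y is revealed
        history.append((int(a), int(b), y))
    return history, oracle


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.array([1.0, -0.5, 0.25])
    history, oracle = run_protocol(theta, T=500, K=4, rng=rng)
    print(len(history), oracle.w)
```

The antisymmetric construction guarantees that every simulated preference matrix is valid ($P[a,b] + P[b,a] = 1$), which is what "well-specified by $\mathcal F$" requires in this toy setting. A regret-minimizing learner would replace the random pair selection with choices driven by the oracle's predictions.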

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2111.12306 [cs.LG]
	(or arXiv:2111.12306v1 [cs.LG] for this version)

Submission history

From: Aadirupa Saha
[v1] Wed, 24 Nov 2021 07:14:57 GMT (54kb)
