Conservative Contextual Linear Bandits

Kazerouni, Abbas; Ghavamzadeh, Mohammad; Abbasi-Yadkori, Yasin; Van Roy, Benjamin



(Submitted on 19 Nov 2016 (v1), last revised 4 Mar 2017 (this version, v2))

Abstract: Safety is a desirable property that can greatly increase the applicability of learning algorithms in real-world decision-making problems. It is much easier for a company to deploy an algorithm that is safe, i.e., guaranteed to perform at least as well as a baseline. In this paper, we study the issue of safety in contextual linear bandits, which have applications in many different fields, including personalized ad recommendation in online marketing. We formulate a notion of safety for this class of algorithms. We develop a safe contextual linear bandit algorithm, called conservative linear UCB (CLUCB), that simultaneously minimizes its regret and satisfies the safety constraint, i.e., maintains its performance above a fixed percentage of the performance of a baseline strategy, uniformly over time. We prove an upper bound on the regret of CLUCB and show that it can be decomposed into two terms: 1) an upper bound for the regret of the standard linear UCB algorithm, which grows with the time horizon, and 2) a constant term (one that does not grow with the time horizon) that accounts for the loss of being conservative in order to satisfy the safety constraint. We empirically show that our algorithm is safe and validate our theoretical analysis.
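The safety constraint described in the abstract can be read as: for every round t, sum_{i<=t} r_i >= (1 - alpha) * sum_{i<=t} r_{b_i}, where r_i is the learner's reward, r_{b_i} the baseline's reward, and alpha in (0, 1) sets the "fixed percentage" (1 - alpha). The Python snippet below is a minimal sketch of the per-round check a conservative learner could run before deviating from the baseline. It is not the authors' code. It assumes that the baseline's expected rewards are known and that the caller supplies a lower confidence bound on the total expected reward of all earlier non-baseline rounds plus the candidate optimistic action (for example, one computed from a LinUCB confidence set). The names `ConservativeBudget`, `choose`, and `learned_lcb` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConservativeBudget:
    """Hypothetical helper (not the paper's code) that tracks the constraint
    sum_{i<=t} r_i >= (1 - alpha) * sum_{i<=t} r_{b_i}, assuming the
    baseline's expected rewards r_{b_i} are known."""

    alpha: float
    baseline_total: float = 0.0   # sum of r_{b_i} over all rounds so far
    baseline_played: float = 0.0  # sum of r_{b_i} over rounds where the baseline was played

    def choose(self, baseline_reward: float, learned_lcb: float) -> bool:
        """Return True to play the optimistic action, False to play the baseline.

        learned_lcb is an assumed lower confidence bound on the total expected
        reward of all previous non-baseline rounds plus the candidate action.
        """
        self.baseline_total += baseline_reward
        # Pessimistic estimate of cumulative reward if we deviate this round.
        pessimistic = self.baseline_played + learned_lcb
        safe = pessimistic >= (1.0 - self.alpha) * self.baseline_total
        if not safe:
            # Fall back to the baseline, whose reward is assumed to be known exactly.
            self.baseline_played += baseline_reward
        return safe


budget = ConservativeBudget(alpha=0.1)
decisions = [budget.choose(1.0, lcb) for lcb in (0.5, 0.95, 1.2, 2.0)]
print(decisions)
```

With alpha = 0.1 and a baseline that earns 1.0 per round, the sketch falls back to the baseline whenever the pessimistic total would drop below 90% of the baseline's cumulative reward. Because the check uses a lower confidence bound, the learner deviates only when it is confident that doing so keeps the constraint satisfied.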

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1611.06426 [stat.ML]
	(or arXiv:1611.06426v2 [stat.ML] for this version)

Submission history

From: Abbas Kazerouni
[v1] Sat, 19 Nov 2016 20:36:30 GMT (31kb)
[v2] Sat, 4 Mar 2017 01:28:26 GMT (180kb,D)
