Multinomial Logit Contextual Bandits: Provable Optimality and Practicality

Oh, Min-hwan; Iyengar, Garud

(Submitted on 25 Mar 2021)

Abstract: We consider a sequential assortment selection problem in which the user's choice is given by a multinomial logit (MNL) choice model whose parameters are unknown. In each period, the learning agent observes $d$-dimensional contextual information about the user and the $N$ available items, offers the user an assortment of size $K$, and observes bandit feedback on which item was chosen from the assortment. We propose upper confidence bound based algorithms for this MNL contextual bandit. The first algorithm is a simple and practical method that achieves $\tilde{\mathcal{O}}(d\sqrt{T})$ regret over $T$ rounds. Next, we propose a second algorithm that achieves $\tilde{\mathcal{O}}(\sqrt{dT})$ regret. This matches the lower bound for the MNL bandit problem up to logarithmic terms and improves on the best known result by a $\sqrt{d}$ factor. To establish this sharper regret bound, we present a non-asymptotic confidence bound for the maximum likelihood estimator of the MNL model, which may be of independent theoretical interest. We then revisit the first algorithm, which is simpler and significantly more practical, and show that a simple variant of it achieves the optimal regret for a broad class of important applications.
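The abstract does not spell out either algorithm. The following is a minimal sketch, assuming a standard linear-utility MNL model, of the two ingredients it names: MNL choice probabilities (with a no-purchase outside option of utility 0) and a UCB-style optimistic assortment choice. It is not the authors' method. The function names, the exploration weight `alpha`, the regularizer `lam`, and the Gram-matrix bookkeeping via Sherman-Morrison updates are all illustrative assumptions. The parameter estimate `theta_hat` is taken as given rather than fit by maximum likelihood. The sketch also assumes unit rewards for every item: under that assumption the best assortment of size at most K is just the K items with the largest optimistic utilities, because the click probability s/(1+s), with s the sum of exp(utility), increases with every item added.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mnl_choice_probs(utilities: Sequence[float]) -> Vector:
    """Return [p_0, p_1, ..., p_n] under an MNL model with an outside option.

    p_0 is the no-choice probability; p_i = exp(u_i) / (1 + sum_j exp(u_j)).
    """
    # Shift by the largest utility (the outside option has utility 0) for stability.
    m = max([0.0, *utilities])
    weights = [math.exp(-m)] + [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def expected_reward(utilities: Sequence[float], rewards: Sequence[float]) -> float:
    """Expected reward sum_i r_i * p_i of offering the assortment."""
    probs = mnl_choice_probs(utilities)[1:]  # drop the outside option
    return sum(r * p for r, p in zip(rewards, probs))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mat_vec(A: Matrix, x: Sequence[float]) -> Vector:
    return [dot(row, x) for row in A]


def identity_scaled(d: int, scale: float) -> Matrix:
    return [[scale if i == j else 0.0 for j in range(d)] for i in range(d)]


def sherman_morrison_update(V_inv: Matrix, x: Sequence[float]) -> Matrix:
    """Return (V + x x^T)^{-1} from V^{-1}, assuming V is symmetric."""
    Vx = mat_vec(V_inv, x)
    denom = 1.0 + dot(x, Vx)
    d = len(x)
    return [[V_inv[i][j] - Vx[i] * Vx[j] / denom for j in range(d)] for i in range(d)]


def optimistic_utilities(features: Sequence[Sequence[float]], theta_hat: Sequence[float],
                         V_inv: Matrix, alpha: float) -> Vector:
    """UCB utilities z_i = x_i^T theta_hat + alpha * sqrt(x_i^T V^{-1} x_i)."""
    return [dot(x, theta_hat) + alpha * math.sqrt(max(dot(x, mat_vec(V_inv, x)), 0.0))
            for x in features]


def select_assortment(features: Sequence[Sequence[float]], theta_hat: Sequence[float],
                      V_inv: Matrix, alpha: float, K: int) -> List[int]:
    """Indices of the K items with the largest optimistic utilities (unit rewards)."""
    z = optimistic_utilities(features, theta_hat, V_inv, alpha)
    ranked = sorted(range(len(features)), key=lambda i: z[i], reverse=True)
    return sorted(ranked[:K])


# Usage: four items with 2-dimensional features (hypothetical numbers).
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.2]]
theta_hat = [0.8, 0.3]
lam = 1.0
V_inv = identity_scaled(2, 1.0 / lam)  # (lam * I)^{-1} before any data
S = select_assortment(features, theta_hat, V_inv, alpha=0.5, K=2)
print(S)  # [0, 2]

# After offering S, fold each offered item's features into the Gram matrix.
for i in S:
    V_inv = sherman_morrison_update(V_inv, features[i])
```

Maintaining the inverse Gram matrix with rank-one updates avoids a full matrix inversion each round; with a real numerical library, one would typically use its linear-algebra routines instead of these hand-written helpers.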

Comments:	Accepted at AAAI 2021 (Main Technical Track)
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2103.13929 [stat.ML]
	(or arXiv:2103.13929v1 [stat.ML] for this version)

Submission history

From: Min-hwan Oh
[v1] Thu, 25 Mar 2021 15:42:25 GMT (840kb,D)
