Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits

Kim, Wonyoung; Oh, Min-whan; Paik, Myunghee Cho

(Submitted on 11 Jun 2022 (this version), latest version 28 Mar 2023 (v3))

Abstract: We propose a novel algorithm for linear contextual bandits with an $O(\sqrt{dT \log T})$ regret bound, where $d$ is the dimension of the contexts and $T$ is the time horizon. The algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization. Depending on the randomization, the estimator draws either on the contexts of all arms or only on the selected contexts. We establish a self-normalized bound for the estimator, which allows a novel decomposition of the cumulative regret into additive dimension-dependent terms instead of multiplicative ones. We also prove a novel lower bound of $\Omega(\sqrt{dT})$ under our problem setting. Hence, the regret of the proposed algorithm matches the lower bound up to logarithmic factors. Numerical experiments support the theoretical guarantees and show that the proposed method outperforms existing linear bandit algorithms.
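
To make the phrase "up to logarithmic factors" concrete, the two rates stated in the abstract can be compared directly. This is only an illustrative calculation on the stated rates: the constants hidden by $O(\cdot)$ and $\Omega(\cdot)$ are ignored, and the value of $T$ below is an arbitrary example, not a setting taken from the paper.

$$
\frac{\sqrt{dT \log T}}{\sqrt{dT}} = \sqrt{\log T},
\qquad \text{e.g. } T = 10^{6} \;\Rightarrow\; \sqrt{\ln 10^{6}} \approx 3.7 .
$$

The ratio does not depend on $d$, so the gap between the upper and lower bounds is a factor of $\sqrt{\log T}$ in the horizon alone.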

Comments:	27 pages including Appendix
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2206.05404 [stat.ML]
	(or arXiv:2206.05404v1 [stat.ML] for this version)

Submission history

From: Wonyoung Kim
[v1] Sat, 11 Jun 2022 02:43:17 GMT (503kb,D)
[v2] Thu, 16 Jun 2022 05:06:49 GMT (503kb,D)
[v3] Tue, 28 Mar 2023 23:17:31 GMT (5180kb,D)
