Mathematics > Optimization and Control
Title: Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation
(Submitted on 31 May 2020 (v1), last revised 25 Sep 2020 (this version, v3))
Abstract: We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on the $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ bound of Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
Submission history
From: Tor Lattimore
[v1] Sun, 31 May 2020 09:22:10 GMT (297kb,D)
[v2] Fri, 19 Jun 2020 13:04:49 GMT (397kb,D)
[v3] Fri, 25 Sep 2020 13:10:28 GMT (224kb,D)