Title: On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits

Authors: Antoine Barrier (UMPA-ENSL, LMO, CELESTE), Aurélien Garivier (UMPA-ENSL, LIP), Gilles Stoltz (LMO, CELESTE)
Abstract: We lay the foundations of a non-parametric theory of best-arm identification in multi-armed bandits with a fixed budget T. We consider general, possibly non-parametric, models D for distributions over the arms; an overarching example is the model D = P(0,1) of all probability distributions over [0,1]. We propose upper bounds on the average log-probability of misidentifying the optimal arm based on information-theoretic quantities that correspond to infima over Kullback-Leibler divergences between some distributions in D and a given distribution. This is made possible by a refined analysis of the successive-rejects strategy of Audibert, Bubeck, and Munos (2010). We finally provide lower bounds on the same average log-probability, also in terms of the same new information-theoretic quantities; these lower bounds are larger when the (natural) assumptions on the considered strategies are stronger. All these new upper and lower bounds generalize existing bounds based, e.g., on gaps between distributions.
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as: arXiv:2210.00895 [cs.LG]
  (or arXiv:2210.00895v1 [cs.LG] for this version)
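
For readers unfamiliar with the successive-rejects strategy mentioned in the abstract, the following is a minimal sketch of its classical form as described by Audibert, Bubeck, and Munos (2010); it does not reproduce the refined analysis of the present paper. The function name `successive_rejects`, the representation of arms as callables taking a random generator, and the requirement `budget > K` are assumptions made for illustration. Rewards are assumed to lie in [0,1], matching the example model of all probability distributions over [0,1].

```python
import math
import random


def successive_rejects(arms, budget, rng):
    """Return the index of the arm kept by successive rejects.

    arms   : list of callables; arms[a](rng) draws one reward in [0, 1]
             (hypothetical interface, chosen for this sketch)
    budget : total number of pulls allowed (the fixed budget T)
    rng    : a random.Random instance passed to each arm
    """
    K = len(arms)
    if K < 2 or budget <= K:
        raise ValueError("need at least 2 arms and budget > number of arms")

    # Normalising constant: 1/2 + sum_{i=2}^{K} 1/i
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))

    active = list(range(K))
    sums = [0.0] * K
    counts = [0] * K
    n_prev = 0

    # K - 1 phases; one arm is rejected at the end of each phase.
    for k in range(1, K):
        # Cumulative number of pulls per surviving arm after phase k.
        n_k = math.ceil((budget - K) / (log_bar * (K + 1 - k)))
        for a in active:
            for _ in range(n_k - n_prev):
                sums[a] += arms[a](rng)
                counts[a] += 1
        n_prev = n_k
        # Reject the active arm with the smallest empirical mean.
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)

    return active[0]


def bernoulli(p):
    """Hypothetical helper: an arm with Bernoulli(p) rewards."""
    return lambda rng: 1.0 if rng.random() < p else 0.0
```

A call such as `successive_rejects([bernoulli(0.1), bernoulli(0.9)], 2000, random.Random(0))` pulls each arm the same number of times within a phase, so the only data-dependent decision is which arm to drop at each phase boundary.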

Submission history

From: Gilles Stoltz
[v1] Fri, 30 Sep 2022 10:55:40 GMT (291kb,D)