Gradient Descent for Noisy Optimization

Hu, Annie; Gerber, Mathieu

(Submitted on 10 May 2024 (v1), last revised 5 Jun 2024 (this version, v3))

Abstract: We study the use of gradient descent with backtracking line search (GD-BLS) to solve the noisy optimization problem $\theta_\star:=\mathrm{argmin}_{\theta\in\mathbb{R}^d} \mathbb{E}[f(\theta,Z)]$, imposing that the function $F(\theta):=\mathbb{E}[f(\theta,Z)]$ is strictly convex but not necessarily $L$-smooth. Assuming that $\mathbb{E}[\|\nabla_\theta f(\theta_\star,Z)\|^2]<\infty$, we first prove that sample average approximation based on GD-BLS allows one to estimate $\theta_\star$ with an error of size $\mathcal{O}_{\mathbb{P}}(B^{-0.25})$, where $B$ is the available computational budget. We then show that we can improve upon this rate by stopping the optimization process earlier, when the gradient of the objective function is sufficiently close to zero, and using the residual computational budget to optimize, again with GD-BLS, a finer approximation of $F$. By iteratively applying this strategy $J$ times, we establish that we can estimate $\theta_\star$ with an error of size $\mathcal{O}_{\mathbb{P}}(B^{-\frac{1}{2}(1-\delta^{J})})$, where $\delta\in(1/2,1)$ is a user-specified parameter. More generally, we show that if $\mathbb{E}[\|\nabla_\theta f(\theta_\star,Z)\|^{1+\alpha}]<\infty$ for some known $\alpha\in (0,1]$, then this approach, which can be seen as a retrospective approximation algorithm with a fixed computational budget, allows one to learn $\theta_\star$ with an error of size $\mathcal{O}_{\mathbb{P}}(B^{-\frac{\alpha}{1+\alpha}(1-\delta^{J})})$, where $\delta\in(2\alpha/(1+3\alpha),1)$ is a tuning parameter. Beyond knowing $\alpha$, achieving the aforementioned convergence rates does not require tuning the algorithms' parameters according to the specific functions $F$ and $f$ at hand, and we exhibit a simple noisy optimization problem for which stochastic gradient is not guaranteed to converge while the algorithms discussed in this work are.
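
To make the ingredients of the abstract concrete, the following is a minimal sketch, not the authors' implementation, of gradient descent with an Armijo-type backtracking line search applied to a sample average approximation. Everything specific here is an assumption made for illustration: the function names (gd_bls, make_saa, rate_exponent), the line-search constants, the sample size, and the test objective $f(\theta,z)=\sum_i\big((\theta_i-z_i)^4/4+(\theta_i-z_i)^2/2\big)$ with Gaussian $Z$, which is strictly convex, not $L$-smooth, and minimized in expectation at the mean of $Z$ by symmetry. The last helper only evaluates the exponent $\frac{\alpha}{1+\alpha}(1-\delta^{J})$ stated in the abstract.

```python
import numpy as np


def gd_bls(F, grad_F, theta0, n_iter=500, step0=1.0, shrink=0.5, c=0.25, tol=1e-6):
    """Gradient descent with Armijo backtracking (illustrative constants)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        g = grad_F(theta)
        g2 = float(g @ g)
        if g2 <= tol ** 2:  # gradient close enough to zero: stop
            break
        f_val = F(theta)
        step = step0
        # Shrink the step until the sufficient-decrease condition holds.
        # The floor on `step` guards against round-off near the optimum.
        while F(theta - step * g) > f_val - c * step * g2 and step > 1e-12:
            step *= shrink
        theta = theta - step * g
    return theta


def make_saa(z):
    """Sample average approximation of F for the assumed test objective."""
    def F_n(theta):
        r = theta - z  # residuals, shape (n, d)
        return float(np.mean(np.sum(r ** 4 / 4 + r ** 2 / 2, axis=1)))

    def grad_F_n(theta):
        r = theta - z
        return np.mean(r ** 3 + r, axis=0)

    return F_n, grad_F_n


def rate_exponent(alpha, delta, J):
    """Exponent r in the O_P(B^{-r}) rate quoted in the abstract."""
    return alpha / (1 + alpha) * (1 - delta ** J)


# Assumed toy problem: Z ~ N(mu, I), so theta_star = mu by symmetry.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
z = mu + rng.standard_normal((2000, 2))

F_n, grad_F_n = make_saa(z)
theta_hat = gd_bls(F_n, grad_F_n, theta0=np.zeros(2))
print("estimate:", theta_hat)
print("rate exponent (alpha=1, delta=0.6, J=3):", rate_exponent(1.0, 0.6, 3))
```

The sketch fixes the sample size in advance and so corresponds only to the plain sample average approximation step. The early-stopping and budget-reallocation strategy described in the abstract is not reproduced here.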

Comments:	40 pages, 3 figures
Subjects:	Optimization and Control (math.OC)
Cite as:	arXiv:2405.06539 [math.OC]
	(or arXiv:2405.06539v3 [math.OC] for this version)

Submission history

From: Mathieu Gerber
[v1] Fri, 10 May 2024 15:33:41 GMT (68kb,D)
[v2] Thu, 30 May 2024 15:24:32 GMT (67kb,D)
[v3] Wed, 5 Jun 2024 13:49:15 GMT (67kb,D)
