Optimization and Control

New submissions for Tue, 18 Feb 20

[1]  arXiv:2002.06281 [pdf, ps, other]
Title: A Robust Traffic Control Model Considering Uncertainties in Turning Ratios
Subjects: Optimization and Control (math.OC)

The effects of uncertainties in model parameters on traffic flow control have recently drawn much research attention. Although certain parameters, such as capacity and initial densities, have been studied, the uncertainties in turning ratios have received little attention. To fill this gap, this paper proposes a robust control model to deal with the uncertainties in the turning ratio by using distributionally robust chance constraints. The model offers an optimal solution over all possible distributions in accordance with given prior knowledge. Then, we apply this robust model to both a highway network and an urban network, and study the interactions between the uncertainties and the control inputs of the entire network.

[2]  arXiv:2002.06309 [pdf, other]
Title: Stochastic optimization over proximally smooth sets
Subjects: Optimization and Control (math.OC)

We introduce a class of stochastic algorithms for minimizing weakly convex functions over proximally smooth sets. As their main building blocks, the algorithms use simplified models of the objective function and the constraint set, along with a retraction operation to restore feasibility. All the proposed methods come equipped with a finite time efficiency guarantee in terms of a natural stationarity measure. We discuss consequences for nonsmooth optimization over smooth manifolds and over sets cut out by weakly-convex inequalities.

[3]  arXiv:2002.06315 [pdf, other]
Title: Bregman Augmented Lagrangian and Its Acceleration
Authors: Shen Yan, Niao He
Comments: 25 pages, 2 figures
Subjects: Optimization and Control (math.OC)

We study the Bregman Augmented Lagrangian method (BALM) for solving convex problems with linear constraints. For the classical Augmented Lagrangian method, the convergence rate and its relation with the proximal point method are well understood. However, the convergence rate for BALM has not yet been thoroughly studied in the literature. In this paper, we analyze the convergence rates of BALM in terms of the primal objective as well as the feasibility violation. We also develop, for the first time, an accelerated Bregman proximal point method that improves the convergence rate from $O(1/\sum_{k=0}^{T-1}\eta_k)$ to $O(1/(\sum_{k=0}^{T-1}\sqrt{\eta_k})^2)$, where $\{\eta_k\}_{k=0}^{T-1}$ is the sequence of proximal parameters. When applied to the dual of linearly constrained convex programs, this leads to the construction of an accelerated BALM that achieves the improved rates for both primal and dual convergence.
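To make the two rates quoted above concrete, the following short derivation compares them. It only uses the expressions from the abstract plus elementary inequalities; the constant-parameter case $\eta_k = \eta$ is an illustrative assumption, not a setting taken from the paper.

```latex
% Constant proximal parameter \eta_k = \eta > 0 (illustrative assumption):
\[
  \sum_{k=0}^{T-1} \eta_k = T\eta
  \quad\Longrightarrow\quad
  O\!\left(\frac{1}{T\eta}\right),
  \qquad
  \left(\sum_{k=0}^{T-1} \sqrt{\eta_k}\right)^{2} = T^{2}\eta
  \quad\Longrightarrow\quad
  O\!\left(\frac{1}{T^{2}\eta}\right).
\]
% General nonnegative \eta_k: expanding the square leaves nonnegative cross terms,
% and the Cauchy-Schwarz inequality gives the upper bound.
\[
  \sum_{k=0}^{T-1} \eta_k
  \;\le\;
  \left(\sum_{k=0}^{T-1} \sqrt{\eta_k}\right)^{2}
  \;\le\;
  T \sum_{k=0}^{T-1} \eta_k .
\]
```

So the accelerated bound is never worse than the unaccelerated one, and the gain is at most a factor of $T$, attained when all $\eta_k$ are equal.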

[4]  arXiv:2002.06632 [pdf, ps, other]
Title: Passive Linear Discrete-Time Systems -- Characterization through Structure
Authors: Izchak Lewkowicz
Subjects: Optimization and Control (math.OC); Functional Analysis (math.FA)

We show here that discrete-time passive linear systems are intimately linked to the structure of maximal, matrix-convex sets, closed under multiplication among their elements. Moreover, this observation unifies three setups: (i) difference inclusions, (ii) matrix-valued rational functions, (iii) realization arrays associated with rational functions. It turns out that in the continuous-time case, the associated structure is that of maximal matrix-convex cones, closed under inversion.

[5]  arXiv:2002.06649 [pdf, ps, other]
Title: Frequency regulation with thermostatically controlled loads: aggregation of dynamics and synchronization
Comments: 10 pages, 4 figures, 1 table
Subjects: Optimization and Control (math.OC)

Thermostatically controlled loads (TCLs) can provide ancillary services to the power network by aiding existing frequency control mechanisms. These loads are, however, characterized by an intrinsic limit cycle behavior which raises the risk that they could synchronize when coupled with the frequency dynamics of the power grid, i.e. simultaneously switch, inducing persistent and possibly catastrophic power oscillations. Control schemes with randomization in the control policy have been proposed in the literature to address this problem. However, such stochastic schemes introduce delays in the response of TCLs that may limit their ability to provide support in urgent situations. In this paper, we present a deterministic control mechanism for TCLs such that they switch when prescribed frequency thresholds are exceeded in order to provide ancillary services to the power network. For the considered scheme, we propose appropriate conditions for the design of the frequency thresholds that bound the coupling between frequency and TCL dynamics, so as to avoid synchronization. In particular, we show that as the number of loads tends to infinity, there exist arbitrarily long time intervals where the frequency deviations are arbitrarily small. Our analytical results are verified with simulations on the Northeast Power Coordinating Council (NPCC) 140-bus system, which demonstrate that the proposed scheme offers significantly improved frequency response in comparison with conventional implementations and existing stochastic schemes.

[6]  arXiv:2002.06731 [pdf, other]
Title: A two-stage algorithm for aircraft conflict resolution with trajectory recovery
Subjects: Optimization and Control (math.OC); Discrete Mathematics (cs.DM)

As air traffic volume is continuously increasing, it has become a priority to improve traffic control algorithms to handle future air travel demand and improve airspace capacity. We address the conflict resolution problem in air traffic control using a novel approach for aircraft collision avoidance with trajectory recovery. We present a two-stage algorithm that first solves all initial conflicts by adjusting aircraft headings and speeds, before identifying the optimal time for aircraft to recover towards their target destination. The collision avoidance stage extends an existing mixed-integer programming formulation to heading control. For the trajectory recovery stage, we introduce a novel exact mixed-integer programming formulation as well as a greedy heuristic algorithm. The proposed two-stage approach guarantees that all trajectories during both the collision avoidance and recovery stages are conflict-free. Numerical results on benchmark problems show that the proposed heuristic for trajectory recovery is competitive while also emphasizing the difficulty of this optimization problem. The proposed approach can be used as a decision-support tool for introducing automation in air traffic control.

[7]  arXiv:2002.06793 [pdf]
Title: Optimal BESS Allocation in Large Transmission Networks Using Linearized BESS Models
Comments: Accepted for presentation, and will be published in the Proceedings of the 2020 IEEE PES General Meeting, August 2-6 2020, Montreal, Quebec, Canada
Subjects: Optimization and Control (math.OC)

The most commonly used model for battery energy storage systems (BESSs) in optimal BESS allocation problems is a constant-efficiency model. However, the charging and discharging efficiencies of BESSs vary non-linearly as functions of their state-of-charge, temperature, charging/discharging powers, as well as the BESS technology being considered. Therefore, constant-efficiency models may inaccurately represent the non-linear operating characteristics of the BESS. In this paper, we first create technology-specific linearized BESS models derived from the actual non-linear BESS models. We then incorporate the linearized BESS models into a mixed-integer linear programming framework for optimal multi-technology BESS allocation. Studies carried out on a 2,604-bus U.S. transmission network demonstrate the benefits of utilizing the linearized BESS models from the model accuracy, convexity, and computational performance viewpoints.

[8]  arXiv:2002.06822 [pdf, ps, other]
Title: Lyapunov characterization of uniform exponential stability for nonlinear infinite-dimensional systems
Authors: Ihab Haidar (Quartz), Yacine Chitour (L2S), Paolo Mason (L2S, CNRS), Mario Sigalotti (Inria, CaGE, LJLL (UMR\_7598))
Subjects: Optimization and Control (math.OC)

In this paper we deal with infinite-dimensional nonlinear forward complete dynamical systems which are subject to external disturbances. We first extend the well-known Datko lemma to the framework of the considered class of systems. Thanks to this generalization, we provide characterizations of the uniform (with respect to disturbances) local, semi-global, and global exponential stability, through the existence of coercive and non-coercive Lyapunov functionals. The importance of the obtained results is underlined through some applications concerning 1) exponential stability of nonlinear retarded systems with piecewise constant delays, 2) exponential stability preservation under sampling for semilinear control switching systems, and 3) the link between input-to-state stability and exponential stability of semilinear switching systems.

[9]  arXiv:2002.06848 [pdf, other]
Title: SingCubic: Cyclic Incremental Newton-type Gradient Descent with Cubic Regularization for Non-Convex Optimization
Authors: Ziqiang Shi
Subjects: Optimization and Control (math.OC)

In this work, we generalize and unify two recent, completely different works, \cite{shi2015large} and \cite{cartis2012adaptive}, into one by proposing the cyclic incremental Newton-type gradient descent with cubic regularization (SingCubic) method for optimizing non-convex functions. Through the iterations of SingCubic, a cubic regularized global quadratic approximation using Hessian information is kept and solved. Preliminary numerical experiments show the encouraging performance of the SingCubic algorithm when compared to basic incremental or stochastic Newton-type implementations. The results and technique can serve as a starting point for research on incremental Newton-type gradient descent methods that employ cubic regularization. The methods and principles proposed in this paper can be used for logistic regression, autoencoder training, independent components analysis, Ising model/Hopfield network training, multilayer perceptron, deep convolutional network training and so on. We will open-source parts of our implementations soon.

[10]  arXiv:2002.06952 [pdf, ps, other]
Title: Closed-loop Equilibrium for Time-Inconsistent McKean-Vlasov Controlled Problem
Authors: Hongwei Mei, Chao Zhu
Subjects: Optimization and Control (math.OC)

The paper deals with a class of time-inconsistent control problems for McKean-Vlasov dynamics. By solving a backward time-inconsistent Hamilton-Jacobi-Bellman (HJB for short) equation coupled with a forward distribution-dependent stochastic differential equation, we investigate the existence and uniqueness of a closed-loop equilibrium for such time-inconsistent distribution-dependent control problem. Moreover, a special case of semi-linear McKean-Vlasov dynamics with a quadratic-type cost functional is considered due to its special structure.

[11]  arXiv:2002.07003 [pdf, other]
Title: A Newton Frank-Wolfe Method for Constrained Self-Concordant Minimization
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

We demonstrate how to scalably solve a class of constrained self-concordant minimization problems using linear minimization oracles (LMO) over the constraint set. We prove that the number of LMO calls of our method is nearly the same as that of the Frank-Wolfe method in the L-smooth case. Specifically, our Newton Frank-Wolfe method uses $\mathcal{O}(\epsilon^{-\nu})$ LMO's, where $\epsilon$ is the desired accuracy and $\nu:= 1 + o(1)$. In addition, we demonstrate how our algorithm can exploit the improved variants of the LMO-based schemes, including away-steps, to attain linear convergence rates. We also provide numerical evidence with portfolio design with the competitive ratio, D-optimal experimental design, and logistic regression with the elastic net where Newton Frank-Wolfe outperforms the state-of-the-art.
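For readers unfamiliar with linear minimization oracles, here is a minimal sketch of the classical Frank-Wolfe method that the abstract uses as its baseline. It is not the Newton Frank-Wolfe method of the paper. The probability simplex as constraint set, the quadratic objective, the `target` vector, and the step size $2/(k+2)$ are all illustrative assumptions.

```python
def frank_wolfe_simplex(grad, x0, num_iters=200):
    """Classical Frank-Wolfe over the probability simplex (illustrative sketch)."""
    x = list(x0)
    for k in range(num_iters):
        g = grad(x)
        # LMO over the simplex: the minimizer of <g, s> is the vertex e_i
        # whose index i has the smallest gradient entry.
        i = min(range(len(x)), key=lambda j: g[j])
        gamma = 2.0 / (k + 2)  # standard open-loop step size
        # Convex combination x <- (1 - gamma) * x + gamma * e_i keeps x feasible.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x


# Hypothetical test problem: minimize ||x - target||^2 over the simplex.
target = [0.2, 0.5, 0.3]


def objective(x):
    return sum((xj - tj) ** 2 for xj, tj in zip(x, target))


def gradient(x):
    return [2.0 * (xj - tj) for xj, tj in zip(x, target)]


x0 = [1.0, 0.0, 0.0]
x_fw = frank_wolfe_simplex(gradient, x0)
print(x_fw, objective(x_fw))
```

Each iteration needs only one LMO call and no projection, which is why the abstract measures cost in LMO calls.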

[12]  arXiv:2002.07021 [pdf, ps, other]
Title: An Efficient Robust Approach to the Day-ahead Operation of an Aggregator of Electric Vehicles
Comments: 8 pages, 4 figures
Subjects: Optimization and Control (math.OC)

The growing use of electric vehicles (EVs) may hinder their integration into the electricity system as well as their efficient operation due to the intrinsic stochasticity associated with their driving patterns. In this work, we assume a profit-maximizer EV-aggregator who participates in the day-ahead electricity market. The aggregator accounts for the technical aspects of each individual EV and the uncertainty in its driving patterns. We propose a hierarchical optimization approach to represent the decision-making of this aggregator. The upper level models the profit-maximizer aggregator's decisions on the EV-fleet operation, while a series of lower-level problems computes the worst-case EV availability profiles in terms of battery draining and energy exchange with the market. Then, this problem can be equivalently transformed into a mixed-integer linear single-level equivalent given the totally unimodular character of the constraint matrices of the lower-level problems and their convexity. Finally, we thoroughly analyze the benefits of the hierarchical model compared to the results from stochastic and deterministic models.

Cross-lists for Tue, 18 Feb 20

[13]  arXiv:2002.06273 (cross-list from math.AP) [pdf, ps, other]
Title: Collapsing and the convex hull property in a soap film capillarity model
Comments: 13 pages, 3 figures
Subjects: Analysis of PDEs (math.AP); Mathematical Physics (math-ph); Differential Geometry (math.DG); Optimization and Control (math.OC)

Soap films hanging from a wire frame are studied in the framework of capillarity theory. Minimizers in the corresponding variational problem are known to consist of positive volume regions with boundaries of constant mean curvature/pressure, possibly connected by "collapsed" minimal surfaces. We prove here that collapsing only occurs if the mean curvature/pressure of the bulky regions is negative, and that, when this last property holds, the whole soap film lies in the convex hull of its boundary wire frame.

[14]  arXiv:2002.06277 (cross-list from cs.LG) [pdf, other]
Title: A mean-field analysis of two-player zero-sum games
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Probability (math.PR); Machine Learning (stat.ML)

Finding Nash equilibria in two-player zero-sum continuous games is a central problem in machine learning, e.g. for training both GANs and robust models. The existence of pure Nash equilibria requires strong conditions which are not typically met in practice. Mixed Nash equilibria exist in greater generality and may be found using mirror descent. Yet this approach does not scale to high dimensions. To address this limitation, we parametrize mixed strategies as mixtures of particles, whose positions and weights are updated using gradient descent-ascent. We study this dynamics as an interacting gradient flow over measure spaces endowed with the Wasserstein-Fisher-Rao metric. We establish global convergence to an approximate equilibrium for the related Langevin gradient-ascent dynamic. We prove a law of large numbers that relates particle dynamics to mean-field dynamics. Our method identifies mixed equilibria in high dimensions and is demonstrably effective for training mixtures of GANs.

[15]  arXiv:2002.06286 (cross-list from cs.LG) [pdf, ps, other]
Title: Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

Despite the wide applications of Adam in reinforcement learning (RL), the theoretical convergence of Adam-type RL algorithms has not been established. This paper provides the first such convergence analysis for two fundamental RL algorithms of policy gradient (PG) and temporal difference (TD) learning that incorporate AMSGrad updates (a standard alternative of Adam in theoretical analysis), referred to as PG-AMSGrad and TD-AMSGrad, respectively. Moreover, our analysis focuses on Markovian sampling for both algorithms. We show that under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of $\mathcal{O}(1/T)$ (where $T$ denotes the number of iterations), and with a diminishing stepsize converges exactly to a stationary point at the rate of $\mathcal{O}(\log^2 T/\sqrt{T})$. Furthermore, under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of $\mathcal{O}(1/T)$, and with a diminishing stepsize converges exactly to the global optimum at the rate of $\mathcal{O}(\log T/\sqrt{T})$. Our study develops new techniques for analyzing the Adam-type RL algorithms under Markovian sampling.
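As background for the AMSGrad updates mentioned above, the sketch below shows the standard AMSGrad step on a one-dimensional deterministic problem. It does not reproduce PG-AMSGrad or TD-AMSGrad, and there is no Markovian sampling here. The objective $f(x) = x^2$ and all hyperparameter values are illustrative assumptions.

```python
import math


def amsgrad(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, num_iters=1000):
    """Scalar AMSGrad with a constant step size (illustrative sketch)."""
    x, m, v, v_hat = x0, 0.0, 0.0, 0.0
    for _ in range(num_iters):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g      # first-moment (momentum) estimate
        v = beta2 * v + (1.0 - beta2) * g * g  # second-moment estimate, as in Adam
        v_hat = max(v_hat, v)                  # AMSGrad: keep the running maximum
        x -= lr * m / (math.sqrt(v_hat) + eps)
    return x


# Hypothetical test problem: f(x) = x^2, so grad f(x) = 2x.
x_final = amsgrad(lambda x: 2.0 * x, x0=5.0)
print(x_final)
```

The only difference from Adam in this sketch is the `max` step, which makes the effective step size non-increasing; that monotonicity is what convergence analyses of AMSGrad typically rely on.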

[16]  arXiv:2002.06474 (cross-list from cs.NI) [pdf, other]
Title: Is Deadline Oblivious Scheduling Efficient for Controlling Real-Time Traffic in Cellular Downlink Systems?
Subjects: Networking and Internet Architecture (cs.NI); Optimization and Control (math.OC)

The emergence of bandwidth-intensive latency-critical traffic in 5G Networks, such as Virtual Reality, has motivated interest in wireless resource allocation problems for flows with hard deadlines. Attempting to solve this problem brings about two challenges: (i) The flow arrival and the channel state are not known to the Base Station (BS) a priori; thus, the allocation decisions need to be made online. (ii) Wireless resource allocation algorithms that attempt to maximize a reward will likely be unfair, causing unacceptable service for some users. We model the problem as an online convex optimization problem. We propose a primal-dual Deadline-Oblivious (DO) algorithm, and show it is approximately 3.6-competitive. Furthermore, we show via simulations that our algorithm tracks the prescient offline solution very closely, significantly outperforming several existing algorithms. In the second part, we impose a stochastic constraint on the allocation, requiring a guarantee that each user achieves a certain timely throughput (amount of traffic delivered within the deadline over a period of time). We propose the Long-term Fair Deadline Oblivious (LFDO) algorithm for that setup. We combine the Lyapunov framework with analysis of online algorithms, to show that LFDO retains the high performance of DO, while satisfying the long-term stochastic constraints.

[17]  arXiv:2002.06694 (cross-list from stat.ML) [pdf, other]
Title: Structures of Spurious Local Minima in $k$-means
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)

$k$-means clustering is a fundamental problem in unsupervised learning. The problem concerns finding a partition of the data points into $k$ clusters such that the within-cluster variation is minimized. Despite its importance and wide applicability, a theoretical understanding of the $k$-means problem has not been completely satisfactory. Existing algorithms with theoretical performance guarantees often rely on sophisticated (sometimes artificial) algorithmic techniques and restricted assumptions on the data. The main challenge lies in the non-convex nature of the problem; in particular, there exist additional local solutions other than the global optimum. Moreover, the simplest and most popular algorithm for $k$-means, namely Lloyd's algorithm, generally converges to such spurious local solutions both in theory and in practice.
In this paper, we approach the $k$-means problem from a new perspective, by investigating the structures of these spurious local solutions under a probabilistic generative model with $k$ ground truth clusters. As soon as $k=3$, spurious local minima provably exist, even for well-separated and balanced clusters. One such local minimum puts two centers at one true cluster, and the third center in the middle of the other two true clusters. For general $k$, one local minimum puts multiple centers at a true cluster, and one center in the middle of multiple true clusters. Perhaps surprisingly, we prove that this is essentially the only type of spurious local minima under a separation condition. Our results pertain to the $k$-means formulation for mixtures of Gaussians or bounded distributions. Our theoretical results corroborate existing empirical observations and provide justification for several improved algorithms for $k$-means clustering.
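The $k=3$ configuration described above (two centers in one true cluster, the third center between the other two) can be reproduced with a small sketch of Lloyd's algorithm. The 1-D data set, the initializations, and the helper names `lloyd` and `kmeans_cost` are illustrative assumptions, not taken from the paper.

```python
def lloyd(points, centers, max_iter=100):
    """Plain Lloyd's algorithm for 1-D points; returns the final centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for x in points:
            j = min(range(len(centers)), key=lambda i: (x - centers[i]) ** 2)
            clusters[j].append(x)
        # Update step: each center moves to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def kmeans_cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((x - c) ** 2 for c in centers) for x in points)


# Three well-separated, balanced clusters around 0, 10 and 20 (hypothetical data).
points = [-1.0, -0.5, 0.5, 1.0, 9.0, 9.5, 10.5, 11.0, 19.0, 19.5, 20.5, 21.0]

# Bad start: two centers inside the first cluster, one between the other two.
spurious = lloyd(points, [-0.75, 0.75, 15.0])
# Good start: one center near each true cluster.
good = lloyd(points, [0.0, 10.0, 20.0])

print(spurious, kmeans_cost(points, spurious))
print(good, kmeans_cost(points, good))
```

In this toy instance the bad initialization is already a fixed point of Lloyd's iteration, so the algorithm stops there even though its cost is far above that of the one-center-per-cluster solution.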

[18]  arXiv:2002.06777 (cross-list from stat.CO) [pdf, other]
Title: Fitting ARMA Time Series Models without Identification: A Proximal Approach
Subjects: Computation (stat.CO); Optimization and Control (math.OC)

Fitting autoregressive moving average (ARMA) time series models requires model identification before parameter estimation. Model identification involves determining the order for the autoregressive and moving average components which is generally performed by visual inspection of the autocorrelation and partial autocorrelation functions, or by other offline methods. In many of today's big data regime applications of time series models, however, there is a need to model one or multiple streams of data in an iterative fashion. Hence, the offline model identification step is significantly prohibitive. In this work, we regularize the objective of the optimization behind the ARMA parameter estimation problem with a nonsmooth hierarchical sparsity inducing penalty based on two path graphs that allows incorporating the identification into the estimation step. A proximal block coordinate descent algorithm is then proposed to solve the underlying optimization problem. The resulting model satisfies the required stationarity and invertibility conditions for ARMA models. Numerical results supporting the proposed method are presented.

[19]  arXiv:2002.06849 (cross-list from physics.soc-ph) [pdf, other]
Title: Gerrymandering and fair districting in parallel voting systems
Subjects: Physics and Society (physics.soc-ph); Optimization and Control (math.OC)

Switching from one electoral system to another one is frequently criticized by the opposition and is viewed as a means for the ruling party to stay in power. In particular, when the new electoral system is a parallel voting (or a single-member district) system, the ruling party is usually suspected of a biased way of partitioning the state into electoral districts such that based on a priori knowledge it has more chances to win in a maximum possible number of districts. In this paper, we propose a new methodology for deciding whether a particular party benefits from a given districting map under a parallel voting system. As a part of our methodology, we formulate and solve several gerrymandering problems. We showcase the application of our approach to the Moldovan parliamentary elections of 2019. Our results suggest that, contrary to the arguments of previous studies, there is no clear evidence that the districting map used in those elections was unfair.

[20]  arXiv:2002.06874 (cross-list from cs.RO) [pdf, other]
Title: On sensing-aware model predictive path-following control for a reversing general 2-trailer with a car-like tractor
Comments: IEEE International Conference on Robotics and Automation (ICRA), 2020
Subjects: Robotics (cs.RO); Optimization and Control (math.OC)

The design of reliable path-following controllers is a key ingredient for successful deployment of self-driving vehicles. This controller-design problem is especially challenging for a general 2-trailer with a car-like tractor due to the vehicle's structurally unstable joint-angle kinematics in backward motion and the car-like tractor's curvature limitations which can cause the vehicle segments to fold and enter a jackknife state. Furthermore, optical sensors with a limited field of view have been proposed to solve the joint-angle estimation problem online, which introduces additional restrictions on which vehicle states can be reliably estimated. To incorporate these restrictions at the level of control, a model predictive path-following controller is proposed. By taking the vehicle's physical and sensing limitations into account, it is shown in real-world experiments that the performance of the proposed path-following controller in terms of suppressing disturbances and recovering from non-trivial initial states is significantly improved compared to a previously proposed solution where the constraints have been neglected.

[21]  arXiv:2002.07052 (cross-list from math.NA) [pdf, ps, other]
Title: Nearest $\Omega$-stable matrix via Riemannian optimization
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)

We study the problem of finding the nearest $\Omega$-stable matrix to a certain matrix $A$, i.e., the nearest matrix with all its eigenvalues in a prescribed closed set $\Omega$. Distances are measured in the Frobenius norm. An important special case is finding the nearest Hurwitz or Schur stable matrix, which has applications in systems theory. We describe a reformulation of the task as an optimization problem on the Riemannian manifold of orthogonal (or unitary) matrices. The problem can then be solved using standard methods from the theory of Riemannian optimization. The resulting algorithm is remarkably fast on small-scale and medium-scale matrices, and returns directly a Schur factorization of the minimizer, sidestepping the numerical difficulties associated with eigenvalues with high multiplicity.

[22]  arXiv:2002.07066 (cross-list from cs.LG) [pdf, ps, other]
Title: Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Optimization and Control (math.OC); Machine Learning (stat.ML)

We develop provably efficient reinforcement learning algorithms for two-player zero-sum Markov games in which the two players simultaneously take actions. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problems are considered. In the offline setting, we control both players and the goal is to find the Nash Equilibrium efficiently by minimizing the worst-case duality gap. In the online setting, we control a single player to play against an arbitrary opponent and the goal is to minimize the regret. For both settings, we propose an optimistic variant of the least-squares minimax value iteration algorithm. We show that our algorithm is computationally efficient and provably achieves an $\tilde O(\sqrt{d^3 H^3 T})$ upper bound on the duality gap and regret, without requiring additional assumptions on the sampling model.
We highlight that our setting requires overcoming several new challenges that are absent in Markov decision processes or turn-based Markov games. In particular, to achieve optimism in simultaneous-move Markov games, we construct both upper and lower confidence bounds of the value function, and then compute the optimistic policy by solving a general-sum matrix game with these bounds as the payoff matrices. As finding the Nash Equilibrium of such a general-sum game is computationally hard, our algorithm instead solves for a Coarse Correlated Equilibrium (CCE), which can be obtained efficiently via linear programming. To the best of our knowledge, such a CCE-based scheme for implementing optimism has not appeared in the literature and might be of interest in its own right.

[23]  arXiv:2002.07085 (cross-list from math.DS) [pdf, ps, other]
Title: Small-gain theorem for stability, cooperative control and distributed observation of infinite networks
Comments: arXiv admin note: text overlap with arXiv:1910.12746
Subjects: Dynamical Systems (math.DS); Optimization and Control (math.OC)

Motivated by a paradigm shift towards a hyper-connected world, we develop a computationally tractable small-gain theorem for a network of infinitely many systems, termed as infinite networks. The proposed small-gain theorem addresses exponential input-to-state stability with respect to closed sets, which enables us to analyze diverse stability problems in a unified manner. The small-gain condition, expressed in terms of the spectral radius of a gain operator collecting all the information about the internal Lyapunov gains, can be numerically computed for a large class of systems in an efficient way. To demonstrate broad applicability of our small-gain theorem, we apply it to the stability analysis of infinite time-varying networks, to consensus in infinite-agent systems, as well as to the design of distributed observers for infinite networks.

[24]  arXiv:2002.07125 (cross-list from cs.LG) [pdf, ps, other]
Title: Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$. We propose a novel recursion-based algorithm and show that if $\delta = O\left(\rho/\sqrt{\dim_E}\right)$, then one can find the optimal policy using $O\left(\dim_E\right)$ trajectories, where $\rho$ is the gap between the optimal $Q$-value of the best actions and that of the second-best actions and $\dim_E$ is the Eluder dimension of $\mathcal{F}$. Our result has two implications:
1) In conjunction with the lower bound in [Du et al., ICLR 2020], our upper bound suggests that the condition $\delta = \widetilde{\Theta}\left(\rho/\sqrt{\mathrm{dim}_E}\right)$ is necessary and sufficient for algorithms with polynomial sample complexity.
2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\mathrm{dim}_E\right)$ is tight even in the agnostic setting.
Therefore, we settle the open problem on agnostic $Q$-learning proposed in [Wen and Van Roy, NIPS 2013]. We further extend our algorithm to the stochastic reward setting and obtain similar results.

Replacements for Tue, 18 Feb 20

[25]  arXiv:1205.3102 (replaced) [pdf, other]
Title: Symmetric nonnegative forms and sums of squares
Comments: (v4) minor revision and small reorganization
Subjects: Optimization and Control (math.OC); Algebraic Geometry (math.AG)
[26]  arXiv:1608.01879 (replaced) [pdf, other]
Title: On the analysis of inexact augmented Lagrangian schemes for misspecified conic convex programs
Comments: This version includes a new dual convergence result, and a clean and verifiable sufficiency condition for ensuring upper-Lipschitz continuity of AL subproblem solution set (Assumption 1.iii)
Subjects: Optimization and Control (math.OC)
[27]  arXiv:1810.05217 (replaced) [pdf, other]
Title: Stochastic reachability of a target tube: Theory and computation
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[28]  arXiv:1902.01048 (replaced) [pdf, ps, other]
Title: Average cost optimal control under weak ergodicity hypotheses: Relative value iterations
Comments: 23 pages
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[29]  arXiv:1903.07391 (replaced) [pdf, other]
Title: Some fundamental properties on the sampling free nabla Laplace transform
Subjects: Optimization and Control (math.OC); Signal Processing (eess.SP)
[30]  arXiv:1905.11957 (replaced) [pdf, other]
Title: Sample Complexity of Sample Average Approximation for Conditional Stochastic Optimization
Comments: Typo corrected. Reference added. Revision comments handled
Subjects: Optimization and Control (math.OC); Machine Learning (stat.ML)
[31]  arXiv:1905.13278 (replaced) [pdf, other]
Title: A Stochastic Derivative Free Optimization Method with Momentum
Subjects: Optimization and Control (math.OC)
[32]  arXiv:1906.09483 (replaced) [pdf, other]
Title: Feasible Path Identification in Optimal Power Flow with Sequential Convex Restriction
Subjects: Optimization and Control (math.OC)
[33]  arXiv:1910.05211 (replaced) [pdf, ps, other]
Title: The Minimal Abstract Robust Subdifferential
Authors: M.D. Voisei
Subjects: Optimization and Control (math.OC)
[34]  arXiv:1910.12999 (replaced) [pdf, other]
Title: A Decentralized Parallel Algorithm for Training Generative Adversarial Nets
Comments: A short version of this paper was accepted by NeurIPS Smooth Games Optimization and Machine Learning Workshop: bridging game theory and deep learning, 2019
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[35]  arXiv:1911.07363 (replaced) [pdf, ps, other]
Title: Optimal Decentralized Distributed Algorithms for Stochastic Convex Optimization
Subjects: Optimization and Control (math.OC)
[36]  arXiv:2001.09013 (replaced) [pdf, ps, other]
Title: Inexact Relative Smoothness and Strong Convexity for Optimization and Variational Inequalities by Inexact Model
Comments: arXiv admin note: text overlap with arXiv:1902.00990
Subjects: Optimization and Control (math.OC)
[37]  arXiv:2001.09436 (replaced) [pdf, ps, other]
Title: Weakly Homogeneous Optimization Problems
Authors: Vu Trung Hieu
Comments: 12 pages
Subjects: Optimization and Control (math.OC)
[38]  arXiv:2002.04130 (replaced) [pdf, other]
Title: On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[39]  arXiv:1901.05294 (replaced) [pdf, other]
Title: Design of generalized fractional order gradient descent method
Comments: 8 pages, 16 figures
Subjects: Signal Processing (eess.SP); Optimization and Control (math.OC)
[40]  arXiv:1909.12292 (replaced) [pdf, ps, other]
Title: Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[41]  arXiv:1910.01619 (replaced) [pdf, ps, other]
Title: Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks
Authors: Yu Bai, Jason D. Lee
Comments: Published at ICLR 2020
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[42]  arXiv:1910.04378 (replaced) [pdf, other]
Title: A Semi-Definite Programming Approach to Robust Adaptive MPC under State Dependent Uncertainty
Comments: Accepted for European Control Conference (ECC), May 2020, Saint Petersburg, Russia
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[43]  arXiv:1910.06378 (replaced) [pdf, other]
Title: SCAFFOLD: Stochastic Controlled Averaging for Federated Learning
Comments: V2 contains analysis of FedAvg, non-convex rates of Scaffold, and experimental evaluation
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
[44]  arXiv:1911.02681 (replaced) [pdf, other]
Title: Generalized Transformation-based Gradient
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[45]  arXiv:1911.03432 (replaced) [pdf, other]
Title: Penalty Method for Inversion-Free Deep Bilevel Optimization
Comments: 17 Pages, 7 figures
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[46]  arXiv:1912.01825 (replaced) [pdf, other]
Title: A Machine Learning Framework for Solving High-Dimensional Mean Field Game and Mean Field Control Problems
Comments: 21 pages, 13 figures, 2 tables
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC); Machine Learning (stat.ML)
