Optimization and Control
New submissions
New submissions for Tue, 18 Feb 20
 [1] arXiv:2002.06281 [pdf, ps, other]

Title: A Robust Traffic Control Model Considering Uncertainties in Turning Ratios
Subjects: Optimization and Control (math.OC)
The effects of uncertainties in model parameters on traffic flow control have recently drawn much research attention. Although certain parameters, such as capacity and initial densities, have been studied, the uncertainties in turning ratios have received little attention. To fill this gap, this paper proposes a robust control model that handles the uncertainties in the turning ratios by using distributionally robust chance constraints. The model offers an optimal solution over all possible distributions in accordance with given prior knowledge. We then apply this robust model to both a highway network and an urban network, and study the interactions between the uncertainties and the control inputs of the entire network.
 [2] arXiv:2002.06309 [pdf, other]

Title: Stochastic optimization over proximally smooth sets
Subjects: Optimization and Control (math.OC)
We introduce a class of stochastic algorithms for minimizing weakly convex functions over proximally smooth sets. As their main building blocks, the algorithms use simplified models of the objective function and the constraint set, along with a retraction operation to restore feasibility. All the proposed methods come equipped with a finite time efficiency guarantee in terms of a natural stationarity measure. We discuss consequences for nonsmooth optimization over smooth manifolds and over sets cut out by weakly convex inequalities.
 [3] arXiv:2002.06315 [pdf, other]

Title: Bregman Augmented Lagrangian and Its Acceleration
Comments: 25 pages, 2 figures
Subjects: Optimization and Control (math.OC)
We study the Bregman Augmented Lagrangian method (BALM) for solving convex problems with linear constraints. For the classical Augmented Lagrangian method, the convergence rate and its relation with the proximal point method are well understood. However, the convergence rate for BALM has not yet been thoroughly studied in the literature. In this paper, we analyze the convergence rates of BALM in terms of the primal objective as well as the feasibility violation. We also develop, for the first time, an accelerated Bregman proximal point method that improves the convergence rate from $O(1/\sum_{k=0}^{T-1}\eta_k)$ to $O(1/(\sum_{k=0}^{T-1}\sqrt{\eta_k})^2)$, where $\{\eta_k\}_{k=0}^{T-1}$ is the sequence of proximal parameters. When applied to the dual of linearly constrained convex programs, this leads to the construction of an accelerated BALM that achieves the improved rates for both primal and dual convergence.
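To make the gap between the two rates concrete, here is a short worked case under an assumption that is not in the abstract: a constant proximal parameter $\eta_k = \eta > 0$ for every $k$.

```latex
\sum_{k=0}^{T-1} \eta_k = T\eta
\quad\Longrightarrow\quad
O\!\left(\frac{1}{\sum_{k=0}^{T-1}\eta_k}\right) = O\!\left(\frac{1}{\eta T}\right),
\qquad
\left(\sum_{k=0}^{T-1} \sqrt{\eta_k}\right)^{2} = T^{2}\eta
\quad\Longrightarrow\quad
O\!\left(\frac{1}{\bigl(\sum_{k=0}^{T-1}\sqrt{\eta_k}\bigr)^{2}}\right) = O\!\left(\frac{1}{\eta T^{2}}\right).
```

In this constant-parameter case, the stated improvement amounts to going from a $1/T$ rate to a $1/T^2$ rate.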
 [4] arXiv:2002.06632 [pdf, ps, other]

Title: Passive Linear Discrete-Time Systems - Characterization through Structure
Authors: Izchak Lewkowicz
Subjects: Optimization and Control (math.OC); Functional Analysis (math.FA)
We here show that discrete-time passive linear systems are intimately linked to the structure of maximal, matrix-convex sets, closed under multiplication among their elements. Moreover, this observation unifies three setups: (i) difference inclusions, (ii) matrix-valued rational functions, (iii) realization arrays associated with rational functions. It turns out that in the continuous-time case, the associated structure is that of maximal, matrix-convex cones, closed under inversion.
 [5] arXiv:2002.06649 [pdf, ps, other]

Title: Frequency regulation with thermostatically controlled loads: aggregation of dynamics and synchronization
Comments: 10 pages, 4 figures, 1 table
Subjects: Optimization and Control (math.OC)
Thermostatically controlled loads (TCLs) can provide ancillary services to the power network by aiding existing frequency control mechanisms. These loads are, however, characterized by an intrinsic limit cycle behavior, which raises the risk that they could synchronize when coupled with the frequency dynamics of the power grid, i.e. simultaneously switch, inducing persistent and possibly catastrophic power oscillations. Control schemes with randomization in the control policy have been proposed in the literature to address this problem. However, such stochastic schemes introduce delays in the response of TCLs that may limit their ability to provide support in urgent situations. In this paper, we present a deterministic control mechanism for TCLs such that they switch when prescribed frequency thresholds are exceeded in order to provide ancillary services to the power network. For the considered scheme, we propose appropriate conditions for the design of the frequency thresholds that bound the coupling between frequency and TCL dynamics, so as to avoid synchronization. In particular, we show that as the number of loads tends to infinity, there exist arbitrarily long time intervals where the frequency deviations are arbitrarily small. Our analytical results are verified with simulations on the Northeast Power Coordinating Council (NPCC) 140-bus system, which demonstrate that the proposed scheme offers significantly improved frequency response in comparison with conventional implementations and existing stochastic schemes.
 [6] arXiv:2002.06731 [pdf, other]

Title: A two-stage algorithm for aircraft conflict resolution with trajectory recovery
Subjects: Optimization and Control (math.OC); Discrete Mathematics (cs.DM)
As air traffic volume is continuously increasing, it has become a priority to improve traffic control algorithms to handle future air travel demand and improve airspace capacity. We address the conflict resolution problem in air traffic control using a novel approach for aircraft collision avoidance with trajectory recovery. We present a two-stage algorithm that first solves all initial conflicts by adjusting aircraft headings and speeds, before identifying the optimal time for aircraft to recover towards their target destination. The collision avoidance stage extends an existing mixed-integer programming formulation to heading control. For the trajectory recovery stage, we introduce a novel exact mixed-integer programming formulation as well as a greedy heuristic algorithm. The proposed two-stage approach guarantees that all trajectories during both the collision avoidance and recovery stages are conflict-free. Numerical results on benchmark problems show that the proposed heuristic for trajectory recovery is competitive while also emphasizing the difficulty of this optimization problem. The proposed approach can be used as a decision-support tool for introducing automation in air traffic control.
 [7] arXiv:2002.06793 [pdf]

Title: Optimal BESS Allocation in Large Transmission Networks Using Linearized BESS Models
Comments: Accepted for presentation, and will be published in the Proceedings of the 2020 IEEE PES General Meeting, August 2-6 2020, Montreal, Quebec, Canada
Subjects: Optimization and Control (math.OC)
The most commonly used model for battery energy storage systems (BESSs) in optimal BESS allocation problems is a constant-efficiency model. However, the charging and discharging efficiencies of BESSs vary nonlinearly as functions of their state-of-charge, temperature, charging/discharging powers, as well as the BESS technology being considered. Therefore, constant-efficiency models may inaccurately represent the nonlinear operating characteristics of the BESS. In this paper, we first create technology-specific linearized BESS models derived from the actual nonlinear BESS models. We then incorporate the linearized BESS models into a mixed-integer linear programming framework for optimal multi-technology BESS allocation. Studies carried out on a 2,604-bus U.S. transmission network demonstrate the benefits of utilizing the linearized BESS models from the model accuracy, convexity, and computational performance viewpoints.
 [8] arXiv:2002.06822 [pdf, ps, other]

Title: Lyapunov characterization of uniform exponential stability for nonlinear infinite-dimensional systems
Authors: Ihab Haidar (Quartz), Yacine Chitour (L2S), Paolo Mason (L2S, CNRS), Mario Sigalotti (Inria, CaGE, LJLL (UMR_7598))
Subjects: Optimization and Control (math.OC)
In this paper we deal with infinite-dimensional nonlinear forward complete dynamical systems which are subject to external disturbances. We first extend the well-known Datko lemma to the framework of the considered class of systems. Thanks to this generalization, we provide characterizations of the uniform (with respect to disturbances) local, semiglobal, and global exponential stability, through the existence of coercive and noncoercive Lyapunov functionals. The importance of the obtained results is underlined through some applications concerning 1) exponential stability of nonlinear retarded systems with piecewise constant delays, 2) exponential stability preservation under sampling for semilinear control switching systems, and 3) the link between input-to-state stability and exponential stability of semilinear switching systems.
 [9] arXiv:2002.06848 [pdf, other]

Title: SingCubic: Cyclic Incremental Newton-type Gradient Descent with Cubic Regularization for Non-Convex Optimization
Authors: Ziqiang Shi
Subjects: Optimization and Control (math.OC)
In this work, we generalize and unify two recent, completely different works, \cite{shi2015large} and \cite{cartis2012adaptive}, by proposing the cyclic incremental Newton-type gradient descent with cubic regularization (SingCubic) method for optimizing nonconvex functions. Through the iterations of SingCubic, a cubic regularized global quadratic approximation using Hessian information is kept and solved. Preliminary numerical experiments show the encouraging performance of the SingCubic algorithm when compared to basic incremental or stochastic Newton-type implementations. The results and technique can serve as a starting point for research on incremental Newton-type gradient descent methods that employ cubic regularization. The methods and principles proposed in this paper can be used for logistic regression, autoencoder training, independent component analysis, Ising model/Hopfield network training, multilayer perceptron and deep convolutional network training, and so on. We will open-source parts of our implementations soon.
 [10] arXiv:2002.06952 [pdf, ps, other]

Title: Closed-loop Equilibrium for Time-Inconsistent McKean-Vlasov Controlled Problem
Subjects: Optimization and Control (math.OC)
The paper deals with a class of time-inconsistent control problems for McKean-Vlasov dynamics. By solving a backward time-inconsistent Hamilton-Jacobi-Bellman (HJB for short) equation coupled with a forward distribution-dependent stochastic differential equation, we investigate the existence and uniqueness of a closed-loop equilibrium for such time-inconsistent distribution-dependent control problems. Moreover, a special case of semilinear McKean-Vlasov dynamics with a quadratic-type cost functional is considered due to its special structure.
 [11] arXiv:2002.07003 [pdf, other]

Title: A Newton Frank-Wolfe Method for Constrained Self-Concordant Minimization
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
We demonstrate how to scalably solve a class of constrained self-concordant minimization problems using linear minimization oracles (LMO) over the constraint set. We prove that the number of LMO calls of our method is nearly the same as that of the Frank-Wolfe method in the L-smooth case. Specifically, our Newton Frank-Wolfe method uses $\mathcal{O}(\epsilon^{-\nu})$ LMO calls, where $\epsilon$ is the desired accuracy and $\nu:= 1 + o(1)$. In addition, we demonstrate how our algorithm can exploit the improved variants of the LMO-based schemes, including away-steps, to attain linear convergence rates. We also provide numerical evidence with portfolio design with the competitive ratio, D-optimal experimental design, and logistic regression with the elastic net where Newton Frank-Wolfe outperforms the state-of-the-art.
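For readers unfamiliar with LMO-based schemes, the following is a minimal sketch of the classical Frank-Wolfe method, not the Newton Frank-Wolfe method of this paper. The constraint set (the probability simplex), the quadratic test objective, and the names `lmo_simplex` and `frank_wolfe` are assumptions chosen for illustration only.

```python
import numpy as np


def lmo_simplex(gradient):
    """Linear minimization oracle for the probability simplex.

    Returns the vertex e_i that minimizes <gradient, s> over the simplex.
    """
    vertex = np.zeros_like(gradient)
    vertex[np.argmin(gradient)] = 1.0
    return vertex


def frank_wolfe(grad_f, x0, n_iters=200):
    """Classical Frank-Wolfe with the standard step size 2 / (t + 2)."""
    x = np.asarray(x0, dtype=float).copy()
    for t in range(n_iters):
        s = lmo_simplex(grad_f(x))          # one LMO call per iteration
        gamma = 2.0 / (t + 2.0)
        x = (1.0 - gamma) * x + gamma * s   # convex combination stays feasible
    return x


# Hypothetical test problem: f(x) = 0.5 * ||x - b||^2 with b inside the simplex,
# so the constrained minimizer is b itself and the optimal value is 0.
b = np.array([0.2, 0.3, 0.5])
x_fw = frank_wolfe(lambda x: x - b, x0=np.array([1.0, 0.0, 0.0]))
objective_gap = 0.5 * np.sum((x_fw - b) ** 2)
print(x_fw, objective_gap)
```

Each iteration makes exactly one oracle call and never needs a projection, which is why the number of LMO calls is the natural complexity measure in the abstract above.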
 [12] arXiv:2002.07021 [pdf, ps, other]

Title: An Efficient Robust Approach to the Day-ahead Operation of an Aggregator of Electric Vehicles
Comments: 8 pages, 4 figures
Subjects: Optimization and Control (math.OC)
The growing use of electric vehicles (EVs) may hinder their integration into the electricity system as well as their efficient operation due to the intrinsic stochasticity associated with their driving patterns. In this work, we assume a profit-maximizer EV-aggregator who participates in the day-ahead electricity market. The aggregator accounts for the technical aspects of each individual EV and the uncertainty in its driving patterns. We propose a hierarchical optimization approach to represent the decision-making of this aggregator. The upper level models the profit-maximizer aggregator's decisions on the EV-fleet operation, while a series of lower-level problems computes the worst-case EV availability profiles in terms of battery draining and energy exchange with the market. Then, this problem can be equivalently transformed into a mixed-integer linear single-level equivalent given the totally unimodular character of the constraint matrices of the lower-level problems and their convexity. Finally, we thoroughly analyze the benefits of the hierarchical model compared to the results from stochastic and deterministic models.
Cross-lists for Tue, 18 Feb 20
 [13] arXiv:2002.06273 (cross-list from math.AP) [pdf, ps, other]

Title: Collapsing and the convex hull property in a soap film capillarity model
Comments: 13 pages, 3 figures
Subjects: Analysis of PDEs (math.AP); Mathematical Physics (math-ph); Differential Geometry (math.DG); Optimization and Control (math.OC)
Soap films hanging from a wire frame are studied in the framework of capillarity theory. Minimizers in the corresponding variational problem are known to consist of positive volume regions with boundaries of constant mean curvature/pressure, possibly connected by "collapsed" minimal surfaces. We prove here that collapsing only occurs if the mean curvature/pressure of the bulky regions is negative, and that, when this last property holds, the whole soap film lies in the convex hull of its boundary wire frame.
 [14] arXiv:2002.06277 (cross-list from cs.LG) [pdf, other]

Title: A mean-field analysis of two-player zero-sum games
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Probability (math.PR); Machine Learning (stat.ML)
Finding Nash equilibria in two-player zero-sum continuous games is a central problem in machine learning, e.g. for training both GANs and robust models. The existence of pure Nash equilibria requires strong conditions which are not typically met in practice. Mixed Nash equilibria exist in greater generality and may be found using mirror descent. Yet this approach does not scale to high dimensions. To address this limitation, we parametrize mixed strategies as mixtures of particles, whose positions and weights are updated using gradient descent-ascent. We study this dynamics as an interacting gradient flow over measure spaces endowed with the Wasserstein-Fisher-Rao metric. We establish global convergence to an approximate equilibrium for the related Langevin gradient-ascent dynamic. We prove a law of large numbers that relates particle dynamics to mean-field dynamics. Our method identifies mixed equilibria in high dimensions and is demonstrably effective for training mixtures of GANs.
 [15] arXiv:2002.06286 (cross-list from cs.LG) [pdf, ps, other]

Title: Nonasymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Despite the wide applications of Adam in reinforcement learning (RL), the theoretical convergence of Adam-type RL algorithms has not been established. This paper provides the first such convergence analysis for two fundamental RL algorithms, policy gradient (PG) and temporal difference (TD) learning, that incorporate AMSGrad updates (a standard alternative to Adam in theoretical analysis), referred to as PG-AMSGrad and TD-AMSGrad, respectively. Moreover, our analysis focuses on Markovian sampling for both algorithms. We show that under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of $\mathcal{O}(1/T)$ (where $T$ denotes the number of iterations), and with a diminishing stepsize converges exactly to a stationary point at the rate of $\mathcal{O}(\log^2 T/\sqrt{T})$. Furthermore, under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of $\mathcal{O}(1/T)$, and with a diminishing stepsize converges exactly to the global optimum at the rate of $\mathcal{O}(\log T/\sqrt{T})$. Our study develops new techniques for analyzing the Adam-type RL algorithms under Markovian sampling.
 [16] arXiv:2002.06474 (cross-list from cs.NI) [pdf, other]

Title: Is Deadline Oblivious Scheduling Efficient for Controlling Real-Time Traffic in Cellular Downlink Systems?
Subjects: Networking and Internet Architecture (cs.NI); Optimization and Control (math.OC)
The emergence of bandwidth-intensive, latency-critical traffic in 5G Networks, such as Virtual Reality, has motivated interest in wireless resource allocation problems for flows with hard deadlines. Attempting to solve this problem brings about two challenges: (i) The flow arrival and the channel state are not known to the Base Station (BS) a priori; thus, the allocation decisions need to be made online. (ii) Wireless resource allocation algorithms that attempt to maximize a reward will likely be unfair, causing unacceptable service for some users. We model the problem as an online convex optimization problem. We propose a primal-dual Deadline-Oblivious (DO) algorithm, and show it is approximately 3.6-competitive. Furthermore, we show via simulations that our algorithm tracks the prescient offline solution very closely, significantly outperforming several existing algorithms. In the second part, we impose a stochastic constraint on the allocation, requiring a guarantee that each user achieves a certain timely throughput (amount of traffic delivered within the deadline over a period of time). We propose the Long-term Fair Deadline Oblivious (LFDO) algorithm for that setup. We combine the Lyapunov framework with analysis of online algorithms, to show that LFDO retains the high performance of DO, while satisfying the long-term stochastic constraints.
 [17] arXiv:2002.06694 (cross-list from stat.ML) [pdf, other]

Title: Structures of Spurious Local Minima in $k$-means
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
$k$-means clustering is a fundamental problem in unsupervised learning. The problem concerns finding a partition of the data points into $k$ clusters such that the within-cluster variation is minimized. Despite its importance and wide applicability, a theoretical understanding of the $k$-means problem has not been completely satisfactory. Existing algorithms with theoretical performance guarantees often rely on sophisticated (sometimes artificial) algorithmic techniques and restricted assumptions on the data. The main challenge lies in the nonconvex nature of the problem; in particular, there exist additional local solutions other than the global optimum. Moreover, the simplest and most popular algorithm for $k$-means, namely Lloyd's algorithm, generally converges to such spurious local solutions both in theory and in practice.
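As an illustration of how Lloyd's algorithm can settle on a spurious local solution, here is a small deterministic sketch. The one-dimensional data set, the two initializations, and the helper names `lloyd` and `kmeans_cost` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def squared_distances(points, centers):
    """Pairwise squared Euclidean distances, shape (n_points, n_centers)."""
    return ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def lloyd(points, init_centers, max_iter=100):
    """Plain Lloyd iterations: assign to nearest center, then recompute means."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        labels = squared_distances(points, centers).argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members) > 0:            # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def kmeans_cost(points, centers):
    """Within-cluster sum of squared distances to the nearest center."""
    return squared_distances(np.asarray(points, dtype=float), centers).min(axis=1).sum()


# Three well-separated, balanced clusters on the line, centered at 0, 20 and 30.
offsets = np.array([-1.0, -0.5, 0.5, 1.0])
points = np.concatenate([m + offsets for m in (0.0, 20.0, 30.0)])[:, None]

# Poor start: two centers near the first cluster, one between the other two.
spurious = lloyd(points, [[-2.0], [2.0], [24.0]])
# Good start: one center near each cluster.
optimal = lloyd(points, [[1.0], [19.0], [31.0]])

print(spurious.ravel(), kmeans_cost(points, spurious))
print(optimal.ravel(), kmeans_cost(points, optimal))
```

With the poor start, the iterations stop at centers -0.75, 0.75 and 25, a fixed point of Lloyd's update whose cost is far above that of the centers 0, 20 and 30 found from the good start.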
In this paper, we approach the $k$-means problem from a new perspective, by investigating the structures of these spurious local solutions under a probabilistic generative model with $k$ ground truth clusters. As soon as $k=3$, spurious local minima provably exist, even for well-separated and balanced clusters. One such local minimum puts two centers at one true cluster, and the third center in the middle of the other two true clusters. For general $k$, one local minimum puts multiple centers at a true cluster, and one center in the middle of multiple true clusters. Perhaps surprisingly, we prove that this is essentially the only type of spurious local minima under a separation condition. Our results pertain to the $k$-means formulation for mixtures of Gaussians or bounded distributions. Our theoretical results corroborate existing empirical observations and provide justification for several improved algorithms for $k$-means clustering.
 [18] arXiv:2002.06777 (cross-list from stat.CO) [pdf, other]

Title: Fitting ARMA Time Series Models without Identification: A Proximal Approach
Subjects: Computation (stat.CO); Optimization and Control (math.OC)
Fitting autoregressive moving average (ARMA) time series models requires model identification before parameter estimation. Model identification involves determining the order of the autoregressive and moving average components, which is generally performed by visual inspection of the autocorrelation and partial autocorrelation functions, or by other offline methods. In many of today's big-data applications of time series models, however, there is a need to model one or multiple streams of data in an iterative fashion. Hence, the offline model identification step is prohibitive. In this work, we regularize the objective of the optimization behind the ARMA parameter estimation problem with a nonsmooth hierarchical sparsity-inducing penalty based on two path graphs that allows incorporating the identification into the estimation step. A proximal block coordinate descent algorithm is then proposed to solve the underlying optimization problem. The resulting model satisfies the required stationarity and invertibility conditions for ARMA models. Numerical results supporting the proposed method are presented.
 [19] arXiv:2002.06849 (cross-list from physics.soc-ph) [pdf, other]

Title: Gerrymandering and fair districting in parallel voting systems
Subjects: Physics and Society (physics.soc-ph); Optimization and Control (math.OC)
Switching from one electoral system to another is frequently criticized by the opposition and is viewed as a means for the ruling party to stay in power. In particular, when the new electoral system is a parallel voting (or a single-member district) system, the ruling party is usually suspected of partitioning the state into electoral districts in a biased way, such that, based on a priori knowledge, it has a better chance of winning in the maximum possible number of districts. In this paper, we propose a new methodology for deciding whether a particular party benefits from a given districting map under a parallel voting system. As a part of our methodology, we formulate and solve several gerrymandering problems. We showcase the application of our approach to the Moldovan parliamentary elections of 2019. Our results suggest that, contrary to the arguments of previous studies, there is no clear evidence that the districting map used in those elections was unfair.
 [20] arXiv:2002.06874 (cross-list from cs.RO) [pdf, other]

Title: On sensing-aware model predictive path-following control for a reversing general 2-trailer with a car-like tractor
Comments: IEEE International Conference on Robotics and Automation (ICRA), 2020
Subjects: Robotics (cs.RO); Optimization and Control (math.OC)
The design of reliable path-following controllers is a key ingredient for successful deployment of self-driving vehicles. This controller-design problem is especially challenging for a general 2-trailer with a car-like tractor due to the vehicle's structurally unstable joint-angle kinematics in backward motion and the car-like tractor's curvature limitations, which can cause the vehicle segments to fold and enter a jackknife state. Furthermore, optical sensors with a limited field of view have been proposed to solve the joint-angle estimation problem online, which introduce additional restrictions on which vehicle states can be reliably estimated. To incorporate these restrictions at the level of control, a model predictive path-following controller is proposed. By taking the vehicle's physical and sensing limitations into account, it is shown in real-world experiments that the performance of the proposed path-following controller in terms of suppressing disturbances and recovering from nontrivial initial states is significantly improved compared to a previously proposed solution where the constraints have been neglected.
 [21] arXiv:2002.07052 (cross-list from math.NA) [pdf, ps, other]

Title: Nearest $\Omega$-stable matrix via Riemannian optimization
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
We study the problem of finding the nearest $\Omega$-stable matrix to a certain matrix $A$, i.e., the nearest matrix with all its eigenvalues in a prescribed closed set $\Omega$. Distances are measured in the Frobenius norm. An important special case is finding the nearest Hurwitz or Schur stable matrix, which has applications in systems theory. We describe a reformulation of the task as an optimization problem on the Riemannian manifold of orthogonal (or unitary) matrices. The problem can then be solved using standard methods from the theory of Riemannian optimization. The resulting algorithm is remarkably fast on small-scale and medium-scale matrices, and returns directly a Schur factorization of the minimizer, sidestepping the numerical difficulties associated with eigenvalues with high multiplicity.
 [22] arXiv:2002.07066 (cross-list from cs.LG) [pdf, ps, other]

Title: Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Optimization and Control (math.OC); Machine Learning (stat.ML)
We develop provably efficient reinforcement learning algorithms for two-player zero-sum Markov games in which the two players simultaneously take actions. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problems are considered. In the offline setting, we control both players and the goal is to find the Nash Equilibrium efficiently by minimizing the worst-case duality gap. In the online setting, we control a single player to play against an arbitrary opponent and the goal is to minimize the regret. For both settings, we propose an optimistic variant of the least-squares minimax value iteration algorithm. We show that our algorithm is computationally efficient and provably achieves an $\tilde O(\sqrt{d^3 H^3 T})$ upper bound on the duality gap and regret, without requiring additional assumptions on the sampling model.
We highlight that our setting requires overcoming several new challenges that are absent in Markov decision processes or turn-based Markov games. In particular, to achieve optimism in simultaneous-move Markov games, we construct both upper and lower confidence bounds of the value function, and then compute the optimistic policy by solving a general-sum matrix game with these bounds as the payoff matrices. As finding the Nash Equilibrium of such a general-sum game is computationally hard, our algorithm instead solves for a Coarse Correlated Equilibrium (CCE), which can be obtained efficiently via linear programming. To our best knowledge, such a CCE-based scheme for implementing optimism has not appeared in the literature and might be of interest in its own right.
 [23] arXiv:2002.07085 (cross-list from math.DS) [pdf, ps, other]

Title: Small-gain theorem for stability, cooperative control and distributed observation of infinite networks
Comments: arXiv admin note: text overlap with arXiv:1910.12746
Subjects: Dynamical Systems (math.DS); Optimization and Control (math.OC)
Motivated by a paradigm shift towards a hyperconnected world, we develop a computationally tractable small-gain theorem for a network of infinitely many systems, termed as infinite networks. The proposed small-gain theorem addresses exponential input-to-state stability with respect to closed sets, which enables us to analyze diverse stability problems in a unified manner. The small-gain condition, expressed in terms of the spectral radius of a gain operator collecting all the information about the internal Lyapunov gains, can be numerically computed for a large class of systems in an efficient way. To demonstrate broad applicability of our small-gain theorem, we apply it to the stability analysis of infinite time-varying networks, to consensus in infinite-agent systems, as well as to the design of distributed observers for infinite networks.
 [24] arXiv:2002.07125 (cross-list from cs.LG) [pdf, ps, other]

Title: Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$. We propose a novel recursion-based algorithm and show that if $\delta = O\left(\rho/\sqrt{\dim_E}\right)$, then one can find the optimal policy using $O\left(\dim_E\right)$ trajectories, where $\rho$ is the gap between the optimal $Q$-value of the best actions and that of the second-best actions and $\dim_E$ is the Eluder dimension of $\mathcal{F}$. Our result has two implications:
1) In conjunction with the lower bound in [Du et al., ICLR 2020], our upper bound suggests that the condition $\delta = \widetilde{\Theta}\left(\rho/\sqrt{\mathrm{dim}_E}\right)$ is necessary and sufficient for algorithms with polynomial sample complexity.
2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\mathrm{dim}_E\right)$ is tight even in the agnostic setting.
Therefore, we settle the open problem on agnostic $Q$-learning proposed in [Wen and Van Roy, NIPS 2013]. We further extend our algorithm to the stochastic reward setting and obtain similar results.
Replacements for Tue, 18 Feb 20
 [25] arXiv:1205.3102 (replaced) [pdf, other]

Title: Symmetric nonnegative forms and sums of squares
Comments: (v4) minor revision and small reorganization
Subjects: Optimization and Control (math.OC); Algebraic Geometry (math.AG)
 [26] arXiv:1608.01879 (replaced) [pdf, other]

Title: On the analysis of inexact augmented Lagrangian schemes for misspecified conic convex programs
Comments: This version includes a new dual convergence result, and a clean and verifiable sufficiency condition for ensuring upper-Lipschitz continuity of AL subproblem solution set (Assumption 1.iii)
Subjects: Optimization and Control (math.OC)
 [27] arXiv:1810.05217 (replaced) [pdf, other]

Title: Stochastic reachability of a target tube: Theory and computation
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
 [28] arXiv:1902.01048 (replaced) [pdf, ps, other]

Title: Average cost optimal control under weak ergodicity hypotheses: Relative value iterations
Comments: 23 pages
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
 [29] arXiv:1903.07391 (replaced) [pdf, other]

Title: Some fundamental properties on the sampling free nabla Laplace transform
Subjects: Optimization and Control (math.OC); Signal Processing (eess.SP)
 [30] arXiv:1905.11957 (replaced) [pdf, other]
 [31] arXiv:1905.13278 (replaced) [pdf, other]

Title: A Stochastic Derivative Free Optimization Method with Momentum
Subjects: Optimization and Control (math.OC)
 [32] arXiv:1906.09483 (replaced) [pdf, other]

Title: Feasible Path Identification in Optimal Power Flow with Sequential Convex Restriction
Subjects: Optimization and Control (math.OC)
 [33] arXiv:1910.05211 (replaced) [pdf, ps, other]

Title: The Minimal Abstract Robust Subdifferential
Authors: M.D. Voisei
Subjects: Optimization and Control (math.OC)
 [34] arXiv:1910.12999 (replaced) [pdf, other]

Title: A Decentralized Parallel Algorithm for Training Generative Adversarial Nets
Comments: A short version of this paper was accepted by NeurIPS Smooth Games Optimization and Machine Learning Workshop: bridging game theory and deep learning, 2019
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
 [35] arXiv:1911.07363 (replaced) [pdf, ps, other]

Title: Optimal Decentralized Distributed Algorithms for Stochastic Convex Optimization
Subjects: Optimization and Control (math.OC)
 [36] arXiv:2001.09013 (replaced) [pdf, ps, other]

Title: Inexact Relative Smoothness and Strong Convexity for Optimization and Variational Inequalities by Inexact Model
Authors: Fedor Stonyakin, Alexander Tyurin, Alexander Gasnikov, Pavel Dvurechensky, Artem Agafonov, Darina Dvinskikh, Dmitry Pasechnyuk, Sergei Artamonov, Victorya Piskunova
Comments: arXiv admin note: text overlap with arXiv:1902.00990
Subjects: Optimization and Control (math.OC)
 [37] arXiv:2001.09436 (replaced) [pdf, ps, other]

Title: Weakly Homogeneous Optimization Problems
Authors: Vu Trung Hieu
Comments: 12 pages
Subjects: Optimization and Control (math.OC)
 [38] arXiv:2002.04130 (replaced) [pdf, other]

Title: On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
 [39] arXiv:1901.05294 (replaced) [pdf, other]

Title: Design of generalized fractional order gradient descent method
Comments: 8 pages, 16 figures
Subjects: Signal Processing (eess.SP); Optimization and Control (math.OC)
 [40] arXiv:1909.12292 (replaced) [pdf, ps, other]

Title: Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [41] arXiv:1910.01619 (replaced) [pdf, ps, other]

Title: Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks
Comments: Published at ICLR 2020
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [42] arXiv:1910.04378 (replaced) [pdf, other]

Title: A Semi-Definite Programming Approach to Robust Adaptive MPC under State Dependent Uncertainty
Comments: Accepted for European Control Conference (ECC), May 2020, Saint Petersburg, Russia
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
 [43] arXiv:1910.06378 (replaced) [pdf, other]

Title: SCAFFOLD: Stochastic Controlled Averaging for Federated Learning
Authors: Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh
Comments: V2 contains analysis of FedAvg, nonconvex rates of Scaffold, and experimental evaluation
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [44] arXiv:1911.02681 (replaced) [pdf, other]

Title: Generalized Transformation-based Gradient
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [45] arXiv:1911.03432 (replaced) [pdf, other]

Title: Penalty Method for Inversion-Free Deep Bilevel Optimization
Comments: 17 Pages, 7 figures
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [46] arXiv:1912.01825 (replaced) [pdf, other]

Title: A Machine Learning Framework for Solving High-Dimensional Mean Field Game and Mean Field Control Problems
Comments: 21 pages, 13 figures, 2 table
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC); Machine Learning (stat.ML)