Machine Learning

New submissions

New submissions for Fri, 21 Jan 22

[1]  arXiv:2201.07998 [pdf]
Title: Statistical Learning for Individualized Asset Allocation
Subjects: Machine Learning (stat.ML); Statistics Theory (math.ST)

We establish a high-dimensional statistical learning framework for individualized asset allocation. Our proposed methodology addresses continuous-action decision-making with a large number of characteristics. We develop a discretization approach to model the effect from continuous actions and allow the discretization level to be large and diverge with the number of observations. The value function of continuous-action is estimated using penalized regression with generalized penalties that are imposed on linear transformations of the model coefficients. We show that our estimators using generalized folded concave penalties enjoy desirable theoretical properties and allow for statistical inference of the optimal value associated with optimal decision-making. Empirically, the proposed framework is exercised with the Health and Retirement Study data in finding individualized optimal asset allocation. The results show that our individualized optimal strategy improves individual financial well-being and surpasses benchmark strategies.

[2]  arXiv:2201.08082 [pdf, other]
Title: Kernel Methods and Multi-layer Perceptrons Learn Linear Models in High Dimensions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Empirical observation of high dimensional phenomena, such as the double descent behaviour, has attracted a lot of interest in understanding classical techniques such as kernel methods, and their implications to explain generalization properties of neural networks. Many recent works analyze such models in a certain high-dimensional regime where the covariates are independent and the number of samples and the number of covariates grow at a fixed ratio (i.e. proportional asymptotics). In this work we show that for a large class of kernels, including the neural tangent kernel of fully connected networks, kernel methods can only perform as well as linear models in this regime. More surprisingly, when the data is generated by a kernel model where the relationship between input and the response could be very nonlinear, we show that linear models are in fact optimal, i.e. linear models achieve the minimum risk among all models, linear or nonlinear. These results suggest that more complex models for the data other than independent features are needed for high-dimensional analysis.

[3]  arXiv:2201.08226 [pdf, other]
Title: Sketch-and-Lift: Scalable Subsampled Semidefinite Program for $K$-means Clustering
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Semidefinite programming (SDP) is a powerful tool for tackling a wide range of computationally hard problems such as clustering. Despite the high accuracy, semidefinite programs are often too slow in practice with poor scalability on large (or even moderate) datasets. In this paper, we introduce a linear time complexity algorithm for approximating an SDP relaxed $K$-means clustering. The proposed sketch-and-lift (SL) approach solves an SDP on a subsampled dataset and then propagates the solution to all data points by a nearest-centroid rounding procedure. It is shown that the SL approach enjoys a similar exact recovery threshold as the $K$-means SDP on the full dataset, which is known to be information-theoretically tight under the Gaussian mixture model. The SL method can be made adaptive with enhanced theoretic properties when the cluster sizes are unbalanced. Our simulation experiments demonstrate that the statistical accuracy of the proposed method outperforms state-of-the-art fast clustering algorithms without sacrificing too much computational efficiency, and is comparable to the original $K$-means SDP with substantially reduced runtime.
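The subsample-then-propagate idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: the SDP solved on the subsample is replaced here by plain Lloyd iterations with farthest-point initialization, and the names `cluster_subsample`, `sketch_and_lift`, and `sample_size` are hypothetical. Only the final step, assigning every data point to its nearest centroid, mirrors the rounding procedure the abstract describes.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def cluster_subsample(sample, k, iters=20):
    """Stand-in for the SDP step (assumption): farthest-point init, then Lloyd."""
    centroids = [sample[0]]
    while len(centroids) < k:
        # Pick the sample point farthest from all centroids chosen so far.
        centroids.append(
            max(sample, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in sample:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep it if empty.
        centroids = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def sketch_and_lift(points, k, sample_size, seed=0):
    """Cluster a random subsample, then lift labels to all points."""
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    centroids = cluster_subsample(sample, k)
    # Nearest-centroid rounding: propagate the solution to every data point.
    return [nearest(p, centroids) for p in points]
```

The clustering cost depends only on `sample_size`, while the lifting step is a single pass over the full dataset, which is where a linear-time overall cost would come from in a scheme of this shape.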

[4]  arXiv:2201.08283 [pdf, other]
Title: Lead-lag detection and network clustering for multivariate time series with an application to the US equity market
Comments: 29 pages, 28 figures; preliminary version appeared at KDD 2021 - 7th SIGKDD Workshop on Mining and Learning from Time Series (MiLeTS)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistical Finance (q-fin.ST); Methodology (stat.ME)

In multivariate time series systems, it has been observed that certain groups of variables partially lead the evolution of the system, while other variables follow this evolution with a time delay; the result is a lead-lag structure amongst the time series variables. In this paper, we propose a method for the detection of lead-lag clusters of time series in multivariate systems. We demonstrate that the web of pairwise lead-lag relationships between time series can be helpfully construed as a directed network, for which there exist suitable algorithms for the detection of pairs of lead-lag clusters with high pairwise imbalance. Within our framework, we consider a number of choices for the pairwise lead-lag metric and directed network clustering components. Our framework is validated on both a synthetic generative model for multivariate lead-lag time series systems and daily real-world US equity prices data. We showcase that our method is able to detect statistically significant lead-lag clusters in the US equity market. We study the nature of these clusters in the context of the empirical finance literature on lead-lag relations and demonstrate how these can be used for the construction of predictive financial signals.

[5]  arXiv:2201.08311 [pdf, other]
Title: Accelerated Gradient Flow: Risk, Stability, and Implicit Regularization
Authors: Yue Sheng, Alnur Ali
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

Acceleration and momentum are the de facto standard in modern applications of machine learning and optimization, yet the bulk of the work on implicit regularization focuses instead on unaccelerated methods. In this paper, we study the statistical risk of the iterates generated by Nesterov's accelerated gradient method and Polyak's heavy ball method, when applied to least squares regression, drawing several connections to explicit penalization. We carry out our analyses in continuous-time, allowing us to make sharper statements than in prior work, and revealing complex interactions between early stopping, stability, and the curvature of the loss function.

[6]  arXiv:2201.08315 [pdf, other]
Title: Predictive Inference with Weak Supervision
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The expense of acquiring labels in large-scale statistical machine learning makes partially and weakly-labeled data attractive, though it is not always apparent how to leverage such data for model fitting or validation. We present a methodology to bridge the gap between partial supervision and validation, developing a conformal prediction framework to provide valid predictive confidence sets -- sets that cover a true label with a prescribed probability, independent of the underlying distribution -- using weakly labeled data. To do so, we introduce a (necessary) new notion of coverage and predictive validity, then develop several application scenarios, providing efficient algorithms for classification and several large-scale structured prediction problems. We corroborate the hypothesis that the new coverage definition allows for tighter and more informative (but valid) confidence sets through several experiments.
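For readers unfamiliar with conformal prediction, the following is a minimal sketch of the standard fully supervised split-conformal construction for classification. It is not the weakly supervised procedure proposed in the paper above. The nonconformity score (one minus the predicted probability of a label) and the function names are assumptions chosen for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    cal_scores holds the nonconformity score of the true label for each
    calibration example. If the required rank exceeds n, the threshold is
    infinite and every label is included.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Labels whose score 1 - p does not exceed the calibrated threshold.

    probs maps each candidate label to the model's predicted probability.
    """
    return {label for label, p in probs.items() if 1 - p <= threshold}
```

Under exchangeability of calibration and test examples, sets built this way cover the true label with probability at least `1 - alpha`, regardless of the underlying distribution or the quality of the model.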

Cross-lists for Fri, 21 Jan 22

[7]  arXiv:2201.07401 (cross-list from math.ST) [pdf, other]
Title: Multiway Spherical Clustering via Degree-Corrected Tensor Block Models
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)

We consider the problem of multiway clustering in the presence of unknown degree heterogeneity. Such data problems arise commonly in applications such as recommendation systems, neuroimaging, community detection, and hypergraph partitions in social networks. The allowance of degree heterogeneity provides great flexibility in clustering models, but the extra complexity poses significant challenges in both statistics and computation. Here, we develop a degree-corrected tensor block model with estimation accuracy guarantees. We present the phase transition of clustering performance based on the notion of angle separability, and we characterize three signal-to-noise regimes corresponding to different statistical-computational behaviors. In particular, we demonstrate that an intrinsic statistical-to-computational gap emerges only for tensors of order three or greater. Further, we develop an efficient polynomial-time algorithm that provably achieves exact clustering under mild signal conditions. The efficacy of our procedure is demonstrated through two data applications, one on a human brain connectome project and another on a Peru Legislation network dataset.

[8]  arXiv:2201.07912 (cross-list from cs.LG) [pdf, other]
Title: Communication-Efficient Device Scheduling for Federated Learning Using Stochastic Optimization
Comments: To be included in Proceedings of INFOCOM 2022, 10 Pages, 5 Figures
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Machine Learning (stat.ML)

Federated learning (FL) is a useful tool in distributed machine learning that utilizes users' local datasets in a privacy-preserving manner. When deploying FL in a constrained wireless environment, however, training models in a time-efficient manner can be a challenging task due to intermittent connectivity of devices, heterogeneous connection quality, and non-i.i.d. data. In this paper, we provide a novel convergence analysis of non-convex loss functions using FL on both i.i.d. and non-i.i.d. datasets with arbitrary device selection probabilities for each round. Then, using the derived convergence bound, we use stochastic optimization to develop a new client selection and power allocation algorithm that minimizes a function of the convergence bound and the average communication time under a transmit power constraint. We find an analytical solution to the minimization problem. One key feature of the algorithm is that knowledge of the channel statistics is not required and only the instantaneous channel state information needs to be known. Using the FEMNIST and CIFAR-10 datasets, we show through simulations that the communication time can be significantly decreased using our algorithm, compared to uniformly random participation.

[9]  arXiv:2201.08105 (cross-list from cs.LG) [pdf, other]
Title: Statistical Depth Functions for Ranking Distributions: Definitions, Statistical Learning and Applications
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The concept of median/consensus has been widely investigated in order to provide a statistical summary of ranking data, i.e. realizations of a random permutation $\Sigma$ of a finite set, $\{1,\; \ldots,\; n\}$ with $n\geq 1$ say. As it sheds light onto only one aspect of $\Sigma$'s distribution $P$, it may neglect other informative features. It is the purpose of this paper to define analogs of quantiles, ranks and statistical procedures based on such quantities for the analysis of ranking data by means of a metric-based notion of depth function on the symmetric group. Overcoming the absence of vector space structure on $\mathfrak{S}_n$, the latter defines a center-outward ordering of the permutations in the support of $P$ and extends the classic metric-based formulation of consensus ranking (medians corresponding then to the deepest permutations). The axiomatic properties that ranking depths should ideally possess are listed, while computational and generalization issues are studied at length. Beyond the theoretical analysis carried out, the relevance of the novel concepts and methods introduced for a wide variety of statistical tasks is also supported by numerous numerical experiments.

[10]  arXiv:2201.08115 (cross-list from cs.AI) [pdf, other]
Title: Priors, Hierarchy, and Information Asymmetry for Skill Transfer in Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)

The ability to discover behaviours from past experience and transfer them to new tasks is a hallmark of intelligent agents acting sample-efficiently in the real world. Equipping embodied reinforcement learners with the same ability may be crucial for their successful deployment in robotics. While hierarchical and KL-regularized RL individually hold promise here, arguably a hybrid approach could combine their respective benefits. Key to these fields is the use of information asymmetry to bias which skills are learnt. While asymmetric choice has a large influence on transferability, prior works have explored a narrow range of asymmetries, primarily motivated by intuition. In this paper, we theoretically and empirically show the crucial trade-off, controlled by information asymmetry, between the expressivity and transferability of skills across sequential tasks. Given this insight, we provide a principled approach towards choosing asymmetry and apply our approach to a complex, robotic block stacking domain, unsolvable by baselines, demonstrating the effectiveness of hierarchical KL-regularized RL, coupled with correct asymmetric choice, for sample-efficient transfer learning.

[11]  arXiv:2201.08262 (cross-list from cs.LG) [pdf, other]
Title: Generalizing Off-Policy Evaluation From a Causal Perspective For Sequential Decision-Making
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Assessing the effects of a policy based on observational data from a different policy is a common problem across several high-stake decision-making domains, and several off-policy evaluation (OPE) techniques have been proposed. However, these methods largely formulate OPE as a problem disassociated from the process used to generate the data (i.e. structural assumptions in the form of a causal graph). We argue that explicitly highlighting this association has important implications on our understanding of the fundamental limits of OPE. First, this implies that the current formulation of OPE corresponds to a narrow set of tasks, i.e. a specific causal estimand which is focused on prospective evaluation of policies over populations or sub-populations. Second, we demonstrate how this association motivates natural desiderata to consider a general set of causal estimands, particularly extending the role of OPE for counterfactual off-policy evaluation at the level of individuals of the population. A precise description of the causal estimand highlights which OPE estimands are identifiable from observational data under the stated generative assumptions. For those OPE estimands that are not identifiable, the causal perspective further highlights where more experimental data is necessary, and highlights situations where human expertise can aid identification and estimation. Furthermore, many formalisms of OPE overlook the role of uncertainty entirely in the estimation process. We demonstrate how specifically characterising the causal estimand highlights the different sources of uncertainty and when human expertise can naturally manage this uncertainty. We discuss each of these aspects as actionable desiderata for future OPE research at scale and in-line with practical utility.

[12]  arXiv:2201.08326 (cross-list from stat.ME) [pdf, other]
Title: Learning with latent group sparsity via heat flow dynamics on networks
Comments: 36 pages, 3 figures, 3 tables
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Econometrics (econ.EM); Statistics Theory (math.ST); Computation (stat.CO); Machine Learning (stat.ML)

Group or cluster structure on explanatory variables in machine learning problems is a very general phenomenon, which has attracted broad interest from practitioners and theoreticians alike. In this work we contribute an approach to learning under such group structure, that does not require prior information on the group identities. Our paradigm is motivated by the Laplacian geometry of an underlying network with a related community structure, and proceeds by directly incorporating this into a penalty that is effectively computed via a heat flow-based local network dynamics. In fact, we demonstrate a procedure to construct such a network based on the available data. Notably, we dispense with computationally intensive pre-processing involving clustering of variables, spectral or otherwise. Our technique is underpinned by rigorous theorems that guarantee its effective performance and provide bounds on its sample complexity. In particular, in a wide range of settings, it provably suffices to run the heat flow dynamics for time that is only logarithmic in the problem dimensions. We explore in detail the interfaces of our approach with key statistical physics models in network science, such as the Gaussian Free Field and the Stochastic Block Model. We validate our approach by successful applications to real-world data from a wide array of application domains, including computer science, genetics, climatology and economics. Our work raises the possibility of applying similar diffusion-based techniques to classical learning tasks, exploiting the interplay between geometric, dynamical and stochastic structures underlying the data.

[13]  arXiv:2201.08343 (cross-list from stat.ME) [pdf, other]
Title: Using Machine Learning to Test Causal Hypotheses in Conjoint Analysis
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)

Conjoint analysis is a popular experimental design used to measure multidimensional preferences. Researchers examine how varying a factor of interest, while controlling for other relevant factors, influences decision-making. Currently, there exist two methodological approaches to analyzing data from a conjoint experiment. The first focuses on estimating the average marginal effects of each factor while averaging over the other factors. Although this allows for straightforward design-based estimation, the results critically depend on the distribution of other factors and how interaction effects are aggregated. An alternative model-based approach can compute various quantities of interest, but requires researchers to correctly specify the model, a challenging task for conjoint analysis with many factors and possible interactions. In addition, a commonly used logistic regression has poor statistical properties even with a moderate number of factors when incorporating interactions. We propose a new hypothesis testing approach based on the conditional randomization test to answer the most fundamental question of conjoint analysis: Does a factor of interest matter in any way given the other factors? Our methodology is solely based on the randomization of factors, and hence is free from assumptions. Yet, it allows researchers to use any test statistic, including those based on complex machine learning algorithms. As a result, we are able to combine the strengths of the existing design-based and model-based approaches. We illustrate the proposed methodology through conjoint analysis of immigration preferences and political candidate evaluation. We also extend the proposed approach to test for regularity assumptions commonly used in conjoint analysis.

[14]  arXiv:2201.08349 (cross-list from math.ST) [pdf, ps, other]
Title: Heavy-tailed Sampling via Transformed Unadjusted Langevin Algorithm
Subjects: Statistics Theory (math.ST); Computation (stat.CO); Machine Learning (stat.ML)

We analyze the oracle complexity of sampling from polynomially decaying heavy-tailed target densities based on running the Unadjusted Langevin Algorithm on certain transformed versions of the target density. The specific class of closed-form transformation maps that we construct are shown to be diffeomorphisms, and are particularly suited for developing efficient diffusion-based samplers. We characterize the precise class of heavy-tailed densities for which polynomial-order oracle complexities (in dimension and inverse target accuracy) could be obtained, and provide illustrative examples. We highlight the relationship between our assumptions and functional inequalities (super and weak Poincar\'e inequalities) based on non-local Dirichlet forms defined via fractional Laplacian operators, used to characterize the heavy-tailed equilibrium densities of certain stable-driven stochastic differential equations.
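As background for the abstract above, here is a minimal sketch of the plain Unadjusted Langevin Algorithm in one dimension, which iterates $x_{k+1} = x_k - h\,\nabla U(x_k) + \sqrt{2h}\,\xi_k$ with standard normal noise $\xi_k$ to sample approximately from a density proportional to $e^{-U}$. It does not implement the transformation maps studied in the paper; the function name `ula_samples` and its parameters are hypothetical.

```python
import math
import random


def ula_samples(grad_potential, x0, step, n_steps, seed=0):
    """Run the Unadjusted Langevin Algorithm and return all iterates.

    grad_potential is the derivative of U, where the target density is
    proportional to exp(-U(x)). No Metropolis correction is applied, so
    the chain is biased for any fixed step size.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Gradient step on U plus Gaussian noise scaled by sqrt(2 * step).
        x = x - step * grad_potential(x) + math.sqrt(2 * step) * noise
        samples.append(x)
    return samples
```

For a standard Gaussian target, `grad_potential` is `lambda x: x`. With `step=0.1` the chain's stationary variance is 1 / (1 - step / 2), slightly above 1, which shows the discretization bias that the "unadjusted" in the name refers to.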

Replacements for Fri, 21 Jan 22

[15]  arXiv:1802.02219 (replaced) [pdf, other]
Title: Practical Transfer Learning for Bayesian Optimization
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)
[16]  arXiv:1903.09668 (replaced) [pdf, ps, other]
Title: Data Augmentation for Bayesian Deep Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[17]  arXiv:2003.00470 (replaced) [pdf]
Title: Dimensionality reduction to maximize prediction generalization capability
Journal-ref: Nature Machine Intelligence 3, 434-446 (2021)
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[18]  arXiv:2106.01282 (replaced) [pdf, other]
Title: Spectral embedding for dynamic networks with stability guarantees
Comments: NeurIPS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[19]  arXiv:2109.14206 (replaced) [pdf, other]
Title: Exact Statistical Inference for the Wasserstein Distance by Selective Inference
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[20]  arXiv:2111.14000 (replaced) [pdf, other]
Title: Factor-augmented tree ensembles
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Econometrics (econ.EM)
[21]  arXiv:2201.06616 (replaced) [pdf, other]
Title: Improving the quality control of seismic data through active learning
Comments: 10 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[22]  arXiv:1902.09602 (replaced) [pdf, other]
Title: Analyzing Data Selection Techniques with Tools from the Theory of Information Losses
Comments: This paper has now been published as a conference proceeding in IEEE Big Data 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[23]  arXiv:2007.08911 (replaced) [pdf, other]
Title: Technologies for Trustworthy Machine Learning: A Survey in a Socio-Technical Context
Comments: We are updating some sections to include more recent advances
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Machine Learning (stat.ML)
[24]  arXiv:2009.01235 (replaced) [pdf, other]
Title: Quantum Discriminator for Binary Classification
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)
[25]  arXiv:2103.10027 (replaced) [pdf, other]
Title: Probabilistic Simplex Component Analysis
Subjects: Signal Processing (eess.SP); Machine Learning (stat.ML)
[26]  arXiv:2109.01654 (replaced) [pdf, other]
Title: Multi-agent Natural Actor-critic Reinforcement Learning Algorithms
Comments: A very high-level summary of our revision is: In Section 3.5, we theoretically prove that the objective function value from the deterministic variant of MAN algorithms dominates that of the MAAC algorithm under some minimal conditions. It relies on the Lemma 2 of our paper: the minimum singular value of the Fisher information matrix is well within the reciprocal of the policy parameter dimension
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)
[27]  arXiv:2109.01785 (replaced) [pdf, other]
Title: Node Feature Kernels Increase Graph Convolutional Network Robustness
Comments: 16 pages, 5 figures
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
[28]  arXiv:2110.02128 (replaced) [pdf, other]
Title: NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL
Comments: Accepted for publication in NeurIPS 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[29]  arXiv:2110.06623 (replaced) [pdf, other]
Title: SSSNET: Semi-Supervised Signed Network Clustering
Comments: 14 pages
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[30]  arXiv:2110.12399 (replaced) [pdf, other]
Title: BINAS: Bilinear Interpretable Neural Architecture Search
Comments: The full code is released at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC); Machine Learning (stat.ML)
[31]  arXiv:2112.07602 (replaced) [pdf, other]
Title: A Framework for the Meta-Analysis of Randomized Experiments with Applications to Heavy-Tailed Response Data
Subjects: Methodology (stat.ME); Applications (stat.AP); Machine Learning (stat.ML)
[32]  arXiv:2112.07611 (replaced) [pdf, other]
Title: Speeding up Learning Quantum States through Group Equivariant Convolutional Quantum Ansätze
Comments: 16 pages, 12 figures
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Mathematical Physics (math-ph); Machine Learning (stat.ML)