We gratefully acknowledge support from
the Simons Foundation and member institutions.

Data Structures and Algorithms

New submissions

[ total of 5 entries: 1-5 ]
[ showing up to 1000 entries per page: fewer | more ]

New submissions for Mon, 27 Mar 23

[1]  arXiv:2303.13604 [pdf, other]
Title: Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)

This paper investigates the problem of combinatorial multiarmed bandits with stochastic submodular (in expectation) rewards and full-bandit delayed feedback, where the delayed feedback is assumed to be composite and anonymous. In other words, the delayed feedback is composed of components of rewards from past actions, with unknown division among the sub-components. Three models of delayed feedback: bounded adversarial, stochastic independent, and stochastic conditionally independent are studied, and regret bounds are derived for each of the delay models. Ignoring the problem dependent parameters, we show that regret bound for all the delay models is $\tilde{O}(T^{2/3} + T^{1/3} \nu)$ for time horizon $T$, where $\nu$ is a delay parameter defined differently in the three cases, thus demonstrating an additive term in regret with delay in all the three delay models. The considered algorithm is demonstrated to outperform other full-bandit approaches with delayed composite anonymous feedback.

[2]  arXiv:2303.14084 [pdf, other]
Title: Differentially Private Synthetic Control
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

Synthetic control is a causal inference tool used to estimate the treatment effects of an intervention by creating synthetic counterfactual data. This approach combines measurements from other similar observations (i.e., donor pool ) to predict a counterfactual time series of interest (i.e., target unit) by analyzing the relationship between the target and the donor pool before the intervention. As synthetic control tools are increasingly applied to sensitive or proprietary data, formal privacy protections are often required. In this work, we provide the first algorithms for differentially private synthetic control with explicit error bounds. Our approach builds upon tools from non-private synthetic control and differentially private empirical risk minimization. We provide upper and lower bounds on the sensitivity of the synthetic control query and provide explicit error bounds on the accuracy of our private synthetic control algorithms. We show that our algorithms produce accurate predictions for the target unit, and that the cost of privacy is small. Finally, we empirically evaluate the performance of our algorithm, and show favorable performance in a variety of parameter regimes, as well as providing guidance to practitioners for hyperparameter tuning.

[3]  arXiv:2303.14102 [pdf, other]
Title: Distributed Silhouette Algorithm: Evaluating Clustering on Big Data
Authors: Marco Gaido
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)

In the big data era, the key feature that each algorithm needs to have is the possibility of efficiently running in parallel in a distributed environment. The popular Silhouette metric to evaluate the quality of a clustering, unfortunately, does not have this property and has a quadratic computational complexity with respect to the size of the input dataset. For this reason, its execution has been hindered in big data scenarios, where clustering had to be evaluated otherwise. To fill this gap, in this paper we introduce the first algorithm that computes the Silhouette metric with linear complexity and can easily execute in parallel in a distributed environment. Its implementation is freely available in the Apache Spark ML library.

Replacements for Mon, 27 Mar 23

[4]  arXiv:2301.00754 (replaced) [pdf, other]
Title: Algorithms for Massive Data -- Lecture Notes
Authors: Nicola Prezza
Comments: refactored structure; refactored variable names in section 2 for consistency with the rest of the notes (m is number of elements and n is universe size); refactored hashing chapter; improved proof of Lem 1.3.11
Subjects: Data Structures and Algorithms (cs.DS)
[5]  arXiv:2108.09805 (replaced) [pdf, other]
Title: Efficient Algorithms for Learning from Coarse Labels
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
[ total of 5 entries: 1-5 ]
[ showing up to 1000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2303, contact, help  (Access key information)