Title: Optimal Sampling Gaps for Adaptive Submodular Maximization

Abstract: Running machine learning algorithms on large and rapidly growing volumes of data is often computationally expensive. One common trick to reduce the size of a data set, and thus the computational cost of machine learning algorithms, is probability sampling, which creates a sampled data set by including each data point from the original data set with a known probability. Although the benefit of running machine learning algorithms on the reduced data set is obvious, one major concern is that the solution obtained from the samples might perform much worse than the optimal solution obtained from the full data set. In this paper, we examine the performance loss caused by probability sampling in the context of adaptive submodular maximization. We consider a simple probability sampling method that selects each data point with probability $r\in[0,1]$. If we set the sampling rate to $r=1$, our problem reduces to finding a solution based on the original full data set. We define the sampling gap as the largest ratio, over independence systems, between the optimal solution obtained from the full data set and the optimal solution obtained from the samples. Our main contribution is to show that if the utility function is policywise submodular, then for a given sampling rate $r$, the sampling gap is both upper bounded and lower bounded by $1/r$. One immediate implication of our result is that if we can find an $\alpha$-approximation solution based on a data set sampled at rate $r$, then this solution achieves an $\alpha r$ approximation ratio against the optimal solution obtained from the full data set.
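The sampling step and the stated implication can be illustrated with a minimal Python sketch. This is not code from the paper: the function names `probability_sample` and `full_data_guarantee` are hypothetical, and the sketch assumes each point is included independently of the others, which the abstract does not state explicitly. The second function only restates the arithmetic of the abstract's implication: an $\alpha$-approximation on the samples, combined with a sampling gap of at most $1/r$, gives an $\alpha r$ guarantee against the full-data optimum.

```python
import random


def probability_sample(data, r, seed=None):
    """Return a sampled data set that keeps each point with probability r.

    Assumption (not stated in the abstract): points are kept independently.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("sampling rate r must lie in [0, 1]")
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so r = 1 keeps every point
    # and r = 0 keeps none.
    return [x for x in data if rng.random() < r]


def full_data_guarantee(alpha, r):
    """Approximation ratio against the full-data optimum.

    If a solution is an alpha-approximation on the samples and the
    sampling gap is at most 1/r, the ratio against the full data set
    is alpha * r.
    """
    return alpha * r


if __name__ == "__main__":
    points = list(range(20))
    sampled = probability_sample(points, r=0.5, seed=0)
    print(len(sampled), "of", len(points), "points kept")
    print("guarantee:", full_data_guarantee(alpha=0.5, r=0.5))
```

With `r=1` the sampled list equals the input, matching the abstract's remark that the problem then reduces to the full data set.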
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as: arXiv:2104.01750 [cs.LG]
  (or arXiv:2104.01750v4 [cs.LG] for this version)

Submission history

From: Shaojie Tang
[v1] Mon, 5 Apr 2021 03:21:32 GMT (73kb,D)
[v2] Tue, 13 Apr 2021 02:34:04 GMT (75kb,D)
[v3] Wed, 22 Dec 2021 15:45:43 GMT (909kb,D)
[v4] Sun, 2 Jan 2022 20:52:56 GMT (910kb,D)