We gratefully acknowledge support from
the Simons Foundation and member institutions.

Data Structures and Algorithms

New submissions

[ total of 10 entries: 1-10 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Thu, 4 Mar 21

[1]  arXiv:2103.02512 [pdf, ps, other]
Title: Approximation Algorithms for Socially Fair Clustering
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)

We present an $(e^{O(p)} \frac{\log \ell}{\log\log\ell})$-approximation algorithm for socially fair clustering with the $\ell_p$-objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $\ell$ groups. The goal is to find a $k$-medians, $k$-means, or, more generally, $\ell_p$-clustering that is simultaneously good for all of the groups. More precisely, we need to find a set of $k$ centers $C$ so as to minimize the maximum over all groups $j$ of $\sum_{u \text{ in group }j} d(u,C)^p$. The socially fair clustering problem was independently proposed by Abbasi, Bhaskara, and Venkatasubramanian [2021] and Ghadiri, Samadi, and Vempala [2021]. Our algorithm improves and generalizes their $O(\ell)$-approximation algorithms for the problem. The natural LP relaxation for the problem has an integrality gap of $\Omega(\ell)$. In order to obtain our result, we introduce a strengthened LP relaxation and show that it has an integrality gap of $\Theta(\frac{\log \ell}{\log\log\ell})$ for a fixed $p$. Additionally, we present a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. [2021].

[2]  arXiv:2103.02515 [pdf, other]
Title: Ribbon filter: practically smaller than Bloom and Xor
Comments: 14 pages, 7 figures
Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)

Filter data structures over-approximate a set of hashable keys, i.e. set membership queries may incorrectly come out positive. A filter with false positive rate $f \in (0,1]$ is known to require $\ge \log_2(1/f)$ bits per key. At least for larger $f \ge 2^{-4}$, existing practical filters require a space overhead of at least 20% with respect to this information-theoretic bound.
We introduce the Ribbon filter: a new filter for static sets with a broad range of configurable space overheads and false positive rates with competitive speed over that range, especially for larger $f \ge 2^{-7}$. In many cases, Ribbon is faster than existing filters for the same space overhead, or can achieve space overhead below 10% with some additional CPU time. An experimental Ribbon design with load balancing can even achieve space overheads below 1%.
A Ribbon filter resembles an Xor filter modified to maximize locality and is constructed by solving a band-like linear system over Boolean variables. In previous work, Dietzfelbinger and Walzer describe this linear system and an efficient Gaussian solver. We present and analyze a faster, more adaptable solving process we call "Rapid Incremental Boolean Banding ON the fly," which resembles hash table construction. We also present and analyze an attractive Ribbon variant based on making the linear system homogeneous, and describe several more practical enhancements.

Cross-lists for Thu, 4 Mar 21

[3]  arXiv:2103.01628 (cross-list from cs.GT) [pdf, other]
Title: Improving EFX Guarantees through Rainbow Cycle Number
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS)

We study the problem of fairly allocating a set of indivisible goods among $n$ agents with additive valuations. Envy-freeness up to any good (EFX) is arguably the most compelling fairness notion in this context. However, the existence of EFX allocations has not been settled and is one of the most important problems in fair division. Towards resolving this problem, many impressive results show the existence of its relaxations, e.g., the existence of $0.618$-EFX allocations, and the existence of EFX at most $n-1$ unallocated goods. The latter result was recently improved for three agents, in which the two unallocated goods are allocated through an involved procedure. Reducing the number of unallocated goods for arbitrary number of agents is a systematic way to settle the big question. In this paper, we develop a new approach, and show that for every $\varepsilon \in (0,1/2]$, there always exists a $(1-\varepsilon)$-EFX allocation with sublinear number of unallocated goods and high Nash welfare.
For this, we reduce the EFX problem to a novel problem in extremal graph theory. We introduce the notion of rainbow cycle number $R(\cdot)$. For all $d \in \mathbb{N}$, $R(d)$ is the largest $k$ such that there exists a $k$-partite digraph $G =(\cup_{i \in [k]} V_i, E)$, in which
1) each part has at most $d$ vertices, i.e., $\lvert V_i \rvert \leq d$ for all $i \in [k]$,
2) for any two parts $V_i$ and $V_j$, each vertex in $V_i$ has an incoming edge from some vertex in $V_j$ and vice-versa, and
3) there exists no cycle in $G$ that contains at most one vertex from each part.
We show that any upper bound on $R(d)$ directly translates to a sublinear bound on the number of unallocated goods. We establish a polynomial upper bound on $R(d)$, yielding our main result. Furthermore, our approach is constructive, which also gives a polynomial-time algorithm for finding such an allocation.

[4]  arXiv:2103.02013 (cross-list from cs.LG) [pdf, ps, other]
Title: Fairness, Semi-Supervised Learning, and More: A General Framework for Clustering with Stochastic Pairwise Constraints
Comments: This paper appeared in AAAI 2021
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)

Metric clustering is fundamental in areas ranging from Combinatorial Optimization and Data Mining, to Machine Learning and Operations Research. However, in a variety of situations we may have additional requirements or knowledge, distinct from the underlying metric, regarding which pairs of points should be clustered together. To capture and analyze such scenarios, we introduce a novel family of \emph{stochastic pairwise constraints}, which we incorporate into several essential clustering objectives (radius/median/means). Moreover, we demonstrate that these constraints can succinctly model an intriguing collection of applications, including among others \emph{Individual Fairness} in clustering and \emph{Must-link} constraints in semi-supervised learning. Our main result consists of a general framework that yields approximation algorithms with provable guarantees for important clustering objectives, while at the same time producing solutions that respect the stochastic pairwise constraints. Furthermore, for certain objectives we devise improved results in the case of Must-link constraints, which are also the best possible from a theoretical perspective. Finally, we present experimental evidence that validates the effectiveness of our algorithms.

[5]  arXiv:2103.02014 (cross-list from cs.LG) [pdf, other]
Title: Online Adversarial Attacks
Comments: Preprint
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS)

Adversarial attacks expose important vulnerabilities of deep learning models, yet little attention has been paid to settings where data arrives as a stream. In this paper, we formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases: attackers must operate under partial knowledge of the target model, and the decisions made by the attacker are irrevocable since they operate on a transient data stream. We first rigorously analyze a deterministic variant of the online threat model by drawing parallels to the well-studied $k$-\textit{secretary problem} and propose \algoname, a simple yet practical algorithm yielding a provably better competitive ratio for $k=2$ over the current best single threshold algorithm. We also introduce the \textit{stochastic $k$-secretary} -- effectively reducing online blackbox attacks to a $k$-secretary problem under noise -- and prove theoretical bounds on the competitive ratios of \textit{any} online algorithms adapted to this setting. Finally, we complement our theoretical results by conducting a systematic suite of experiments on MNIST and CIFAR-10 with both vanilla and robust classifiers, revealing that, by leveraging online secretary algorithms, like \algoname, we can get an online attack success rate close to the one achieved by the optimal offline solution.

[6]  arXiv:2103.02253 (cross-list from cs.GT) [pdf, other]
Title: Optimal Kidney Exchange with Immunosuppressants
Comments: AAAI 2021
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS)

Algorithms for exchange of kidneys is one of the key successful applications in market design, artificial intelligence, and operations research. Potent immunosuppressant drugs suppress the body's ability to reject a transplanted organ up to the point that a transplant across blood- or tissue-type incompatibility becomes possible. In contrast to the standard kidney exchange problem, we consider a setting that also involves the decision about which recipients receive from the limited supply of immunosuppressants that make them compatible with originally incompatible kidneys. We firstly present a general computational framework to model this problem. Our main contribution is a range of efficient algorithms that provide flexibility in terms of meeting meaningful objectives. Motivated by the current reality of kidney exchanges using sophisticated mathematical-programming-based clearing algorithms, we then present a general but scalable approach to optimal clearing with immunosuppression; we validate our approach on realistic data from a large fielded exchange.

Replacements for Thu, 4 Mar 21

[7]  arXiv:2006.15747 (replaced) [pdf, ps, other]
Title: Random Assignment Under Bi-Valued Utilities: Analyzing Hylland-Zeckhauser, Nash-Bargaining, and other Rules
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS)
[8]  arXiv:2008.00325 (replaced) [pdf, other]
Title: Bringing UMAP Closer to the Speed of Light with GPU Acceleration
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
[9]  arXiv:2008.11964 (replaced) [pdf, ps, other]
Title: On the High Accuracy Limitation of Adaptive Property Estimation
Authors: Yanjun Han
Journal-ref: Published in AISTATS 2021
Subjects: Statistics Theory (math.ST); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT)
[10]  arXiv:2102.04546 (replaced) [pdf, ps, other]
Title: Superfast Coloring in CONGEST via Efficient Color Sampling
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
[ total of 10 entries: 1-10 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2103, contact, help  (Access key information)