Data Structures and Algorithms
New submissions
New submissions for Thu, 4 Mar 21
- [1] arXiv:2103.02512 [pdf, ps, other]
Title: Approximation Algorithms for Socially Fair Clustering
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
We present an $(e^{O(p)} \frac{\log \ell}{\log\log\ell})$-approximation algorithm for socially fair clustering with the $\ell_p$-objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $\ell$ groups. The goal is to find a $k$-medians, $k$-means, or, more generally, $\ell_p$-clustering that is simultaneously good for all of the groups. More precisely, we need to find a set of $k$ centers $C$ so as to minimize the maximum over all groups $j$ of $\sum_{u \text{ in group }j} d(u,C)^p$. The socially fair clustering problem was independently proposed by Abbasi, Bhaskara, and Venkatasubramanian [2021] and Ghadiri, Samadi, and Vempala [2021]. Our algorithm improves and generalizes their $O(\ell)$-approximation algorithms for the problem. The natural LP relaxation for the problem has an integrality gap of $\Omega(\ell)$. In order to obtain our result, we introduce a strengthened LP relaxation and show that it has an integrality gap of $\Theta(\frac{\log \ell}{\log\log\ell})$ for a fixed $p$. Additionally, we present a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. [2021].
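To make the objective in the abstract above concrete, the following minimal sketch evaluates $\max_j \sum_{u \text{ in group } j} d(u,C)^p$ for a given set of centers. It only computes the cost and is not the paper's approximation algorithm. The function names, the dictionary layout for groups, and the choice of Euclidean distance as the metric are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; stands in for the abstract metric d (an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def socially_fair_cost(
    groups: Dict[str, List[Point]], centers: Sequence[Point], p: float
) -> float:
    """Return the maximum over groups of the sum of d(u, C)^p over group members."""

    def dist_to_centers(u: Point) -> float:
        # d(u, C) is the distance from u to its nearest center.
        return min(euclidean(u, c) for c in centers)

    return max(
        sum(dist_to_centers(u) ** p for u in members)
        for members in groups.values()
    )


# Hypothetical toy instance: two groups on a line.
groups = {"a": [(0.0, 0.0), (1.0, 0.0)], "b": [(4.0, 0.0)]}
print(socially_fair_cost(groups, [(0.0, 0.0)], p=1))
print(socially_fair_cost(groups, [(0.0, 0.0), (4.0, 0.0)], p=1))
```

Setting `p=1` corresponds to the $k$-medians variant and `p=2` to the $k$-means variant; a point that belongs to several groups would simply appear in each of those groups' lists.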
- [2] arXiv:2103.02515 [pdf, other]
Title: Ribbon filter: practically smaller than Bloom and Xor
Comments: 14 pages, 7 figures
Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)
Filter data structures over-approximate a set of hashable keys, i.e. set membership queries may incorrectly come out positive. A filter with false positive rate $f \in (0,1]$ is known to require $\ge \log_2(1/f)$ bits per key. At least for larger $f \ge 2^{-4}$, existing practical filters require a space overhead of at least 20% with respect to this information-theoretic bound.
We introduce the Ribbon filter: a new filter for static sets with a broad range of configurable space overheads and false positive rates with competitive speed over that range, especially for larger $f \ge 2^{-7}$. In many cases, Ribbon is faster than existing filters for the same space overhead, or can achieve space overhead below 10% with some additional CPU time. An experimental Ribbon design with load balancing can even achieve space overheads below 1%.
A Ribbon filter resembles an Xor filter modified to maximize locality and is constructed by solving a band-like linear system over Boolean variables. In previous work, Dietzfelbinger and Walzer describe this linear system and an efficient Gaussian solver. We present and analyze a faster, more adaptable solving process we call "Rapid Incremental Boolean Banding ON the fly," which resembles hash table construction. We also present and analyze an attractive Ribbon variant based on making the linear system homogeneous, and describe several more practical enhancements.
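The space figures quoted in this abstract are relative to the information-theoretic bound of $\log_2(1/f)$ bits per key. The short sketch below computes that bound and the bits per key implied by a given relative space overhead. The function names are hypothetical, and the code describes only the arithmetic, not the Ribbon construction itself.

```python
import math


def min_bits_per_key(f: float) -> float:
    """Information-theoretic lower bound log2(1/f) for false positive rate f."""
    if not 0.0 < f <= 1.0:
        raise ValueError("false positive rate must lie in (0, 1]")
    return math.log2(1.0 / f)


def bits_per_key_with_overhead(f: float, overhead: float) -> float:
    """Bits per key for a filter whose space exceeds the bound by `overhead`.

    `overhead` is a fraction, e.g. 0.20 for a 20% space overhead.
    """
    return (1.0 + overhead) * min_bits_per_key(f)


f = 2.0 ** -7
print(min_bits_per_key(f))                  # lower bound for f = 2^-7
print(bits_per_key_with_overhead(f, 0.20))  # a filter with 20% overhead
print(bits_per_key_with_overhead(f, 0.01))  # a filter with 1% overhead
```

For $f = 2^{-7}$ the bound is 7 bits per key, so the difference between 20% and 1% overhead is roughly 1.3 bits per key.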
Cross-lists for Thu, 4 Mar 21
- [3] arXiv:2103.01628 (cross-list from cs.GT) [pdf, other]
Title: Improving EFX Guarantees through Rainbow Cycle Number
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS)
We study the problem of fairly allocating a set of indivisible goods among $n$ agents with additive valuations. Envy-freeness up to any good (EFX) is arguably the most compelling fairness notion in this context. However, the existence of EFX allocations has not been settled and is one of the most important problems in fair division. Towards resolving this problem, many impressive results show the existence of its relaxations, e.g., the existence of $0.618$-EFX allocations, and the existence of EFX allocations with at most $n-1$ unallocated goods. The latter result was recently improved for three agents, in which the two unallocated goods are allocated through an involved procedure. Reducing the number of unallocated goods for an arbitrary number of agents is a systematic way to settle the big question. In this paper, we develop a new approach, and show that for every $\varepsilon \in (0,1/2]$, there always exists a $(1-\varepsilon)$-EFX allocation with a sublinear number of unallocated goods and high Nash welfare.
For this, we reduce the EFX problem to a novel problem in extremal graph theory. We introduce the notion of rainbow cycle number $R(\cdot)$. For all $d \in \mathbb{N}$, $R(d)$ is the largest $k$ such that there exists a $k$-partite digraph $G =(\cup_{i \in [k]} V_i, E)$, in which
1) each part has at most $d$ vertices, i.e., $\lvert V_i \rvert \leq d$ for all $i \in [k]$,
2) for any two parts $V_i$ and $V_j$, each vertex in $V_i$ has an incoming edge from some vertex in $V_j$ and vice-versa, and
3) there exists no cycle in $G$ that contains at most one vertex from each part.
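The three conditions above can be checked mechanically on a small candidate digraph. The sketch below is a brute-force verifier, exponential in the number of parts and meant only to make the definition concrete; it is not the paper's method and it does not compute $R(d)$. The function name and the input encoding (a list of vertex lists plus a list of directed edges) are assumptions for illustration.

```python
from typing import Hashable, List, Sequence, Tuple

Vertex = Hashable


def satisfies_rainbow_conditions(
    parts: Sequence[Sequence[Vertex]], edges: List[Tuple[Vertex, Vertex]], d: int
) -> bool:
    """Check conditions 1-3 for a k-partite digraph given as parts and edges."""
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    succ = {v: set() for v in part_of}
    pred_parts = {v: set() for v in part_of}
    for u, v in edges:
        if part_of[u] == part_of[v]:
            raise ValueError("a k-partite digraph has no edge inside a part")
        succ[u].add(v)
        pred_parts[v].add(part_of[u])

    # Condition 1: every part has at most d vertices.
    if any(len(part) > d for part in parts):
        return False

    # Condition 2: every vertex has an incoming edge from every other part.
    all_parts = set(range(len(parts)))
    for v, i in part_of.items():
        if pred_parts[v] != all_parts - {i}:
            return False

    # Condition 3: no cycle uses at most one vertex from each part.
    def closes_rainbow_cycle(start: Vertex) -> bool:
        # Explore paths from `start` that never revisit a part.
        stack = [(start, frozenset({part_of[start]}))]
        while stack:
            v, used = stack.pop()
            for w in succ[v]:
                if w == start:
                    return True
                if part_of[w] not in used:
                    stack.append((w, used | {part_of[w]}))
        return False

    return not any(closes_rainbow_cycle(v) for v in part_of)


# Hypothetical example with k = 2 parts of size 2: the only cycle is
# a1 -> b2 -> a2 -> b1 -> a1, which uses two vertices from each part.
parts = [["a1", "a2"], ["b1", "b2"]]
edges = [("b1", "a1"), ("a1", "b2"), ("b2", "a2"), ("a2", "b1")]
print(satisfies_rainbow_conditions(parts, edges, d=2))
```

In this encoding, a witness that passes the check with $k$ parts certifies only that $R(d) \ge k$; upper bounds, which are what the abstract relies on, require an argument over all such digraphs.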
We show that any upper bound on $R(d)$ directly translates to a sublinear bound on the number of unallocated goods. We establish a polynomial upper bound on $R(d)$, yielding our main result. Furthermore, our approach is constructive, which also gives a polynomial-time algorithm for finding such an allocation.
- [4] arXiv:2103.02013 (cross-list from cs.LG) [pdf, ps, other]
Title: Fairness, Semi-Supervised Learning, and More: A General Framework for Clustering with Stochastic Pairwise Constraints
Authors: Brian Brubach, Darshan Chakrabarti, John P. Dickerson, Aravind Srinivasan, Leonidas Tsepenekas
Comments: This paper appeared in AAAI 2021
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Metric clustering is fundamental in areas ranging from Combinatorial Optimization and Data Mining, to Machine Learning and Operations Research. However, in a variety of situations we may have additional requirements or knowledge, distinct from the underlying metric, regarding which pairs of points should be clustered together. To capture and analyze such scenarios, we introduce a novel family of \emph{stochastic pairwise constraints}, which we incorporate into several essential clustering objectives (radius/median/means). Moreover, we demonstrate that these constraints can succinctly model an intriguing collection of applications, including among others \emph{Individual Fairness} in clustering and \emph{Must-link} constraints in semi-supervised learning. Our main result consists of a general framework that yields approximation algorithms with provable guarantees for important clustering objectives, while at the same time producing solutions that respect the stochastic pairwise constraints. Furthermore, for certain objectives we devise improved results in the case of Must-link constraints, which are also the best possible from a theoretical perspective. Finally, we present experimental evidence that validates the effectiveness of our algorithms.
- [5] arXiv:2103.02014 (cross-list from cs.LG) [pdf, other]
Title: Online Adversarial Attacks
Authors: Andjela Mladenovic, Avishek Joey Bose, Hugo Berard, William L. Hamilton, Simon Lacoste-Julien, Pascal Vincent, Gauthier Gidel
Comments: Preprint
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS)
Adversarial attacks expose important vulnerabilities of deep learning models, yet little attention has been paid to settings where data arrives as a stream. In this paper, we formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases: attackers must operate under partial knowledge of the target model, and the decisions made by the attacker are irrevocable since they operate on a transient data stream. We first rigorously analyze a deterministic variant of the online threat model by drawing parallels to the well-studied $k$-\textit{secretary problem} and propose \algoname, a simple yet practical algorithm yielding a provably better competitive ratio for $k=2$ over the current best single threshold algorithm. We also introduce the \textit{stochastic $k$-secretary} -- effectively reducing online blackbox attacks to a $k$-secretary problem under noise -- and prove theoretical bounds on the competitive ratios of \textit{any} online algorithms adapted to this setting. Finally, we complement our theoretical results by conducting a systematic suite of experiments on MNIST and CIFAR-10 with both vanilla and robust classifiers, revealing that, by leveraging online secretary algorithms, like \algoname, we can get an online attack success rate close to the one achieved by the optimal offline solution.
- [6] arXiv:2103.02253 (cross-list from cs.GT) [pdf, other]
Title: Optimal Kidney Exchange with Immunosuppressants
Comments: AAAI 2021
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS)
Algorithms for kidney exchange are among the key successful applications in market design, artificial intelligence, and operations research. Potent immunosuppressant drugs suppress the body's ability to reject a transplanted organ up to the point that a transplant across blood- or tissue-type incompatibility becomes possible. In contrast to the standard kidney exchange problem, we consider a setting that also involves deciding which recipients receive, from a limited supply, the immunosuppressants that make them compatible with originally incompatible kidneys. We first present a general computational framework to model this problem. Our main contribution is a range of efficient algorithms that provide flexibility in terms of meeting meaningful objectives. Motivated by the current reality of kidney exchanges using sophisticated mathematical-programming-based clearing algorithms, we then present a general but scalable approach to optimal clearing with immunosuppression; we validate our approach on realistic data from a large fielded exchange.
Replacements for Thu, 4 Mar 21
- [7] arXiv:2006.15747 (replaced) [pdf, ps, other]
Title: Random Assignment Under Bi-Valued Utilities: Analyzing Hylland-Zeckhauser, Nash-Bargaining, and other Rules
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS)
- [8] arXiv:2008.00325 (replaced) [pdf, other]
Title: Bringing UMAP Closer to the Speed of Light with GPU Acceleration
Authors: Corey J. Nolet, Victor Lafargue, Edward Raff, Thejaswi Nanditale, Tim Oates, John Zedlewski, Joshua Patterson
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
- [9] arXiv:2008.11964 (replaced) [pdf, ps, other]
Title: On the High Accuracy Limitation of Adaptive Property Estimation
Authors: Yanjun Han
Journal-ref: Published in AISTATS 2021
Subjects: Statistics Theory (math.ST); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT)
- [10] arXiv:2102.04546 (replaced) [pdf, ps, other]
Title: Superfast Coloring in CONGEST via Efficient Color Sampling
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)