Data Structures and Algorithms

New submissions

[ total of 9 entries: 1-9 ]

New submissions for Wed, 14 Apr 21

[1]  arXiv:2104.05771 [pdf, ps, other]
Title: Online Weighted Bipartite Matching with a Sample
Comments: 20 pages, 1 figure
Subjects: Data Structures and Algorithms (cs.DS)

We study the classical online bipartite matching problem: one side of the graph is known and vertices of the other side arrive online. It is well known that when the graph is edge-weighted and vertices arrive in an adversarial order, no online algorithm has a nontrivial competitive ratio. To bypass this hurdle, we modify the rules so that the adversary still picks the graph but has to reveal a random part (say, half) of it to the player. The remaining part is given to the player in an adversarial order. This models practical scenarios in which the online algorithm has some history to learn from.
This way of modeling a history was formalized recently by the authors (SODA 20) and was called the AOS model (for Adversarial Online with a Sample). It allows developing online algorithms for the secretary problem that compete even when the secretaries arrive in an adversarial order. Here we use the same model to attack the much more challenging matching problem.
We analyze a natural algorithmic framework that decides how to match an arriving vertex $v$ by applying an offline matching algorithm to $v$ and the sample. We get roughly $1/4$ of the maximum weight by applying the offline greedy matching algorithm to the sample and $v$. Our analysis ties the performance of this algorithm to the performance of offline greedy matching on the online part, and we also prove that it is tight. Surprisingly, when greedy is replaced with an optimal algorithm for maximum matching, no constant competitive ratio can be guaranteed when the size of the sample is comparable to the size of the online part. However, when the sample is quadratic in the size of the online part, we do get a competitive ratio of $1/e$.
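A minimal Python sketch of the framework described above, with offline greedy as the subroutine. It assumes one concrete rule that the abstract does not spell out: the arriving vertex v is matched to the offline vertex that greedy assigns it when run on the sample plus v, provided that vertex is still free in the real matching; otherwise v stays unmatched. The function names and the edge-weight dictionary layout are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Hypothetical encoding: weights[(r, u)] = w > 0 for an edge between a
# sample/online vertex r and a known (offline) vertex u.
Weights = Dict[Tuple[Hashable, Hashable], float]
Edge = Tuple[float, Hashable, Hashable]


def greedy_matching(edges: List[Edge]) -> Dict[Hashable, Hashable]:
    """Offline greedy: scan edges by decreasing weight and keep an edge
    whenever both endpoints are still free. Returns {r: u}."""
    partner: Dict[Hashable, Hashable] = {}
    used: Set[Hashable] = set()
    for _, r, u in sorted(edges, key=lambda e: e[0], reverse=True):
        if r not in partner and u not in used:
            partner[r] = u
            used.add(u)
    return partner


def online_with_sample(sample: Iterable[Hashable],
                       arrivals: Iterable[Hashable],
                       weights: Weights) -> Dict[Hashable, Hashable]:
    """For each arriving v, run greedy on the sample plus v. If greedy pairs v
    with an offline vertex u that is still free in the real (irrevocable)
    matching, match v to u; otherwise leave v unmatched (assumed rule)."""
    sample_set = set(sample)
    matching: Dict[Hashable, Hashable] = {}
    taken: Set[Hashable] = set()
    for v in arrivals:
        active = sample_set | {v}
        edges = [(w, r, u) for (r, u), w in weights.items() if r in active]
        u = greedy_matching(edges).get(v)
        if u is not None and u not in taken:
            matching[v] = u
            taken.add(u)
    return matching
```

On the toy instance with edges s1-A (5), v1-A (3), v1-B (2), v2-B (4), sample ["s1"] and arrivals ["v1", "v2"], the sample vertex claims A inside each greedy run, which steers v1 to B; v2 is then also pointed at B, finds it taken, and stays unmatched.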

[2]  arXiv:2104.06133 [pdf, other]
Title: A New Coreset Framework for Clustering
Subjects: Data Structures and Algorithms (cs.DS)

Given a metric space, the $(k,z)$-clustering problem consists of finding $k$ centers such that the sum of the distances of every point to its closest center, each raised to the power $z$, is minimized. This encapsulates the famous $k$-median ($z=1$) and $k$-means ($z=2$) clustering problems. Designing small-space sketches of the data that approximately preserve the cost of the solutions, also known as \emph{coresets}, has been an important research direction over the last 15 years.
In this paper, we present a new, simple coreset framework that simultaneously improves upon the best known bounds for a large variety of settings, including Euclidean spaces, doubling metrics, minor-free metrics, and general metrics.
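To make the objective concrete, the sketch below evaluates the $(k,z)$-clustering cost in Euclidean space, with optional per-point weights of the kind a coreset carries. It does not implement the paper's coreset construction; the only "coreset" shown is the trivial lossless one that merges duplicate points into a single weighted point, which preserves the cost of every set of centers exactly. Function names are illustrative.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def clustering_cost(points: Sequence[Point], centers: Sequence[Point], z: float,
                    weights: Optional[Sequence[float]] = None) -> float:
    """(k,z)-clustering cost in Euclidean space: sum over points of
    weight * (distance to the closest center) ** z, unit weights by default."""
    total = 0.0
    for i, p in enumerate(points):
        d = min(math.dist(p, c) for c in centers)
        total += (1.0 if weights is None else weights[i]) * d ** z
    return total


def merge_duplicates(points: Sequence[Point]) -> Tuple[List[Point], List[float]]:
    """Trivial lossless 'coreset': one weighted copy per distinct point,
    weighted by its multiplicity (first-seen order is kept)."""
    counts = Counter(points)
    distinct = list(counts)
    return distinct, [float(counts[p]) for p in distinct]
```

A real coreset is much smaller than the input and only approximately preserves the cost (typically within a $1 \pm \varepsilon$ factor for every choice of $k$ centers); the weighted cost function is the quantity it must preserve.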

[3]  arXiv:2104.06210 [pdf, ps, other]
Title: A simple proof of the Moore-Hodgson Algorithm for minimizing the number of late jobs
Comments: 3 pages
Subjects: Data Structures and Algorithms (cs.DS); Performance (cs.PF); Combinatorics (math.CO)

The Moore-Hodgson Algorithm minimizes the number of late jobs on a single machine. That is, it finds an optimal schedule for the classical problem $1~|\;|~\sum{U_j}$. Several proofs of the correctness of this algorithm have been published. We present a new short proof.
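For reference, a Python sketch of the textbook Moore-Hodgson procedure itself (not the new proof): process jobs in earliest-due-date order and, whenever the current job would finish late, discard the longest job kept so far. Discarded jobs are the late ones and can be run in any order after the on-time jobs. Encoding jobs as (processing_time, due_date) pairs is an assumption of this sketch.

```python
import heapq
from typing import List, Tuple


def moore_hodgson(jobs: List[Tuple[float, float]]) -> List[int]:
    """Solve 1||sum U_j for jobs given as (processing_time, due_date).
    Returns the indices of on-time jobs in earliest-due-date order; every
    other job is late and can be appended to the schedule in any order."""
    order = sorted(range(len(jobs)), key=lambda i: (jobs[i][1], i))
    kept: List[Tuple[float, int]] = []  # max-heap on processing time via (-p, i)
    t = 0.0  # completion time of the kept jobs
    for i in order:
        p, d = jobs[i]
        heapq.heappush(kept, (-p, i))
        t += p
        if t > d:
            # The current job would be late: drop the longest kept job.
            neg_p, _ = heapq.heappop(kept)
            t += neg_p
    return sorted((i for _, i in kept), key=lambda i: (jobs[i][1], i))
```

With jobs [(2, 3), (3, 4), (1, 5), (4, 8)], the sketch keeps jobs 0, 2 and 3 on time and discards job 1. Since the total processing time 10 exceeds the largest due date 8, at least one job must be late, so a single late job is optimal here.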

Cross-lists for Wed, 14 Apr 21

[4]  arXiv:2011.11907 (cross-list from cs.DB) [pdf, other]
Title: Efficient Approximate Nearest Neighbor Search for Multiple Weighted $l_{p\leq2}$ Distance Functions
Authors: Huan Hu, Jianzhong Li
Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS)

Nearest neighbor search is fundamental to a wide range of applications. Since exact nearest neighbor search suffers from the "curse of dimensionality", approximate approaches, such as Locality-Sensitive Hashing (LSH), are widely used to trade a little query accuracy for much higher query efficiency. In many scenarios, it is necessary to perform nearest neighbor search under multiple weighted distance functions in high-dimensional spaces. This paper considers the important problem of supporting efficient approximate nearest neighbor search for multiple weighted distance functions in high-dimensional spaces. To the best of our knowledge, prior work can only solve the problem for the $l_2$ distance. However, numerous studies have shown that the $l_p$ distance with $p\in(0,2)$ can be more effective than the $l_2$ distance in high-dimensional spaces. We propose a novel method, WLSH, to address the problem for the $l_p$ distance with $p\in(0,2]$. WLSH takes the LSH approach and can theoretically guarantee both the efficiency of processing queries and the accuracy of query results while minimizing the required total number of hash tables. We conduct extensive experiments on synthetic and real data sets, and the results show that WLSH achieves high performance in terms of query efficiency, query accuracy, and space consumption.
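The abstract does not describe WLSH's construction, so the sketch below shows only the standard p-stable LSH building block (in the style of Datar et al.) for a single, fixed weight vector; it is a well-known starting point, not WLSH itself. It uses the fact that the weighted distance $(\sum_i w_i |x_i - y_i|^p)^{1/p}$ equals the plain $l_p$ distance after scaling coordinate $i$ by $w_i^{1/p}$, and it draws symmetric $p$-stable projections with the Chambers-Mallows-Stuck method. The class and parameter names are hypothetical, and serving many weight vectors with few hash tables, the paper's actual goal, is not attempted.

```python
import math
import random
from typing import Sequence


def symmetric_stable(alpha: float, rng: random.Random) -> float:
    """One draw from a standard symmetric alpha-stable law, 0 < alpha <= 2,
    via the Chambers-Mallows-Stuck method (alpha=1: Cauchy; alpha=2:
    Gaussian with variance 2)."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    while w == 0.0:  # guard against a vanishingly rare zero draw
        w = rng.expovariate(1.0)
    return (math.sin(alpha * v) / math.cos(v) ** (1 / alpha)
            * (math.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))


class WeightedLpHash:
    """One p-stable LSH function h(x) = floor((a . x' + b) / r), where
    x'_i = w_i^(1/p) * x_i turns the weighted l_p distance into a plain
    l_p distance. The weight vector is fixed at construction time."""

    def __init__(self, weights: Sequence[float], p: float, r: float,
                 rng: random.Random) -> None:
        self.scale = [w ** (1 / p) for w in weights]
        self.a = [symmetric_stable(p, rng) for _ in weights]
        self.b = rng.uniform(0.0, r)  # random offset in [0, r)
        self.r = r                    # bucket width

    def __call__(self, x: Sequence[float]) -> int:
        proj = sum(a * s * xi for a, s, xi in zip(self.a, self.scale, x))
        return math.floor((proj + self.b) / self.r)
```

Points that are close under the weighted distance land in the same bucket with high probability, while far points rarely do; a full index would concatenate several such functions per table and use several tables.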

[5]  arXiv:2104.05983 (cross-list from cs.CR) [pdf, ps, other]
Title: Towards Better Understanding of User Authorization Query Problem via Multi-variable Complexity Analysis
Comments: Accepted for publication in ACM Transactions on Privacy and Security (TOPS)
Subjects: Cryptography and Security (cs.CR); Databases (cs.DB); Data Structures and Algorithms (cs.DS)

User authorization queries in the context of role-based access control have attracted considerable interest in the last 15 years. Such queries are used to determine whether it is possible to allocate a set of roles to a user that enables the user to complete a task, in the sense that all the permissions required to complete the task are assigned to the roles in that set. Answering such a query, in general, must take into account a number of factors, including, but not limited to, the roles to which the user is assigned and constraints on the sets of roles that can be activated. This problem is known to be NP-hard. The presence of multiple parameters and the need to find efficient and exact solutions to the problem suggest that a multivariate approach will enable us to better understand the complexity of the user authorization query problem (UAQ). In this paper, we establish a number of complexity results for UAQ. Specifically, we show that the problem remains hard even when quite restrictive conditions are imposed on the structure of the problem. Our FPT results show that we have to use either a parameter with potentially quite large values or quite a restricted version of UAQ. Moreover, our second FPT algorithm is complex and requires sophisticated, state-of-the-art techniques. In short, our results show that it is unlikely that all variants of UAQ that arise in practice can be solved reasonably quickly in general.
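A brute-force sketch that makes the query concrete under a simplified, hypothetical model: each role grants a set of permissions, the user may activate only roles assigned to them, at most `max_roles` roles may be active, and listed pairs of roles are mutually exclusive. The search enumerates role subsets by increasing size, so it returns a smallest valid set but runs in time exponential in the number of roles, in line with the NP-hardness stated in the abstract. It is not one of the paper's FPT algorithms, and all names are illustrative.

```python
from itertools import combinations
from typing import Dict, Iterable, Optional, Set, Tuple


def uaq_brute_force(user_roles: Iterable[str],
                    role_perms: Dict[str, Set[str]],
                    required: Set[str],
                    max_roles: int,
                    exclusive: Iterable[Tuple[str, str]] = ()) -> Optional[Set[str]]:
    """Return a smallest set of the user's roles that grants every permission
    in `required`, activates at most `max_roles` roles and no mutually
    exclusive pair; return None if no such set exists. Exhaustive search."""
    roles = sorted(set(user_roles))
    excl = list(exclusive)
    for k in range(min(max_roles, len(roles)) + 1):
        for subset in combinations(roles, k):
            chosen = set(subset)
            if any(a in chosen and b in chosen for a, b in excl):
                continue  # violates a mutual-exclusion constraint
            granted = set().union(*(role_perms[r] for r in subset))
            if required <= granted:
                return chosen
    return None
```

For a user holding clerk {read}, manager {read, approve} and auditor {audit}, a task needing {read, approve, audit} is satisfiable with two roles ({auditor, manager}), but becomes unsatisfiable once manager and auditor are declared mutually exclusive.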

Replacements for Wed, 14 Apr 21

[6]  arXiv:1805.06869 (replaced) [pdf, ps, other]
Title: Revisiting the tree edit distance and its backtracing: A tutorial
Authors: Benjamin Paaßen
Comments: Supplementary material for the ICML 2018 paper: Tree Edit Distance Learning via Adaptive Symbol Embeddings
Subjects: Data Structures and Algorithms (cs.DS)

[7]  arXiv:2007.08204 (replaced) [pdf, other]
Title: A Faster Exponential Time Algorithm for Bin Packing With a Constant Number of Bins via Additive Combinatorics
Comments: Presented at SODA 2021; 42 pages; 4 figures
Subjects: Data Structures and Algorithms (cs.DS)

[8]  arXiv:1911.04415 (replaced) [pdf, other]
Title: Revisiting the Approximate Carathéodory Problem via the Frank-Wolfe Algorithm
Comments: 21 pages and 2 figures
Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

[9]  arXiv:2007.15306 (replaced) [pdf, other]
Title: Phase Transition of the k-Majority Dynamics in Biased Communication Models
Comments: Preliminary versions published in DISC 2020 (Brief Announcement) and ICDCN 2021
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Probability (math.PR)