Data Structures and Algorithms

New submissions

New submissions for Fri, 19 Apr 24

[1]  arXiv:2404.11673 [pdf, other]
Title: Hairpin Completion Distance Lower Bound
Comments: To be published in CPM 2024
Subjects: Data Structures and Algorithms (cs.DS)

Hairpin completion, derived from the hairpin formation observed in DNA biochemistry, is an operation applied to strings, particularly useful in DNA computing. Conceptually, a right hairpin completion operation transforms a string $S$ into $S\cdot S'$ where $S'$ is the reverse complement of a prefix of $S$. Similarly, a left hairpin completion operation transforms a string $S$ into $S'\cdot S$ where $S'$ is the reverse complement of a suffix of $S$. The hairpin completion distance from $S$ to $T$ is the minimum number of hairpin completion operations needed to transform $S$ into $T$. Recently Boneh et al. showed an $O(n^2)$ time algorithm for finding the hairpin completion distance between two strings of length at most $n$. In this paper we show that for any $\varepsilon>0$ there is no $O(n^{2-\varepsilon})$-time algorithm for the hairpin completion distance problem unless the Strong Exponential Time Hypothesis (SETH) is false. Thus, under SETH, the time complexity of the hairpin completion distance problem is quadratic, up to sub-polynomial factors.
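
The two operations described in this abstract can be sketched in a few lines of Python. This is only an illustration of the conceptual definition quoted above, not the paper's formal model or its lower-bound construction. The DNA alphabet with the pairing A-T, C-G and the function names are assumptions made for the example.

```python
# Illustrative sketch only: follows the conceptual definition in the abstract.
# Assumption: strings are over the DNA alphabet with complement A<->T, C<->G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(s: str) -> str:
    """Reverse the string and complement every symbol."""
    return "".join(COMPLEMENT[c] for c in reversed(s))


def right_hairpin_completion(s: str, k: int) -> str:
    """Return S . S', where S' is the reverse complement of the length-k prefix of S."""
    if not 0 < k <= len(s):
        raise ValueError("k must be between 1 and len(s)")
    return s + reverse_complement(s[:k])


def left_hairpin_completion(s: str, k: int) -> str:
    """Return S' . S, where S' is the reverse complement of the length-k suffix of S."""
    if not 0 < k <= len(s):
        raise ValueError("k must be between 1 and len(s)")
    return reverse_complement(s[-k:]) + s


if __name__ == "__main__":
    print(right_hairpin_completion("ACGG", 2))  # ACGGGT
    print(left_hairpin_completion("ACGG", 2))   # CCACGG
```

The hairpin completion distance would then be the fewest such steps leading from $S$ to $T$; computing it efficiently is the subject of the paper and is not attempted here.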

[2]  arXiv:2404.11879 [pdf, other]
Title: Public Event Scheduling with Busy Agents
Comments: To appear in IJCAI 2024
Subjects: Data Structures and Algorithms (cs.DS)

We study a public event scheduling problem, where multiple public events are scheduled to coordinate the availability of multiple agents. The availability of each agent is determined by solving a separate flexible interval job scheduling problem, where the jobs are required to be preemptively processed. The agents want to attend as many events as possible, and their agreements are considered to be the total length of time during which they can attend these events. The goal is to find a schedule for events as well as the job schedule for each agent such that the total agreement is maximized.
We first show that the problem is NP-hard, and then prove that a simple greedy algorithm achieves $\frac{1}{2}$-approximation when the whole timeline is polynomially bounded. Our method also implies a $(1-\frac{1}{e})$-approximate algorithm for this case. Subsequently, for the general timeline case, we present an algorithmic framework that extends a $\frac{1}{\alpha}$-approximate algorithm for the one-event instance to the general case that achieves $\frac{1}{\alpha+1}$-approximation. Finally, we give a polynomial time algorithm that solves the one-event instance, and this implies a $\frac{1}{2}$-approximate algorithm for the general case.

[3]  arXiv:2404.12080 [pdf, other]
Title: A Mathematical Formalisation of the γ-contraction Problem
Authors: Elia Onofri
Subjects: Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)

Networks play a ubiquitous role in computer science and real-world applications, offering multiple kinds of information that can be retrieved with adequate methods. With the continuous growth in the amount of data available, networks are becoming larger day by day. Consequently, tasks that were easily achievable on smaller networks often become impractical on huge amounts of data, either due to the high computational cost or due to the impracticality of visualising the corresponding data. Using distinctive node features to group large amounts of connected data into a limited number of clusters, each then represented by a single representative, proves to be a valuable approach. The resulting contracted graphs are more manageable in size and can reveal previously hidden characteristics of the original networks. Furthermore, in many real-world use cases, a definition of cluster is intrinsic to the data, possibly obtained by injecting some expert knowledge represented by a categorical function. Clusters then result in sets of connected vertices taking the same value in a finite set C. In the recent literature, Lombardi and Onofri proposed a novel, fast, and easily parallelisable approach under the name of $\gamma$-contraction to contract a graph given a categorical function. In this work, we formally define this approach by providing a rigorous mathematical definition of the problem, which, to the best of our knowledge, was missing in the existing literature. Specifically, we explore the variadic nature of the contraction operation and use it to introduce a weaker version of the colour contraction, under the name of $\beta$-contraction, that the algorithmic solution exploits. We finally dive into the details of the algorithm and provide a full assessment of its convergence complexity, relying on two constructive proofs that deeply unveil its mode of operation.
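
To make the notion of cluster in this abstract concrete, the sketch below merges every set of connected vertices that share a categorical value into one representative. It is a plain union-find illustration and should not be read as the $\gamma$-contraction or $\beta$-contraction algorithm of the paper. The function name, the edge-list input, and the choice of the smallest vertex index as representative are all assumptions made for the example.

```python
# Illustrative sketch only: a plain union-find contraction, NOT the
# gamma-contraction algorithm discussed in the abstract.
def contract_by_colour(n, edges, colour):
    """Merge connected vertices of equal colour.

    n      -- number of vertices, labelled 0..n-1 (assumed input format)
    edges  -- iterable of undirected edges (u, v)
    colour -- colour[v] is the categorical value of vertex v
    Returns (rep, contracted_edges), where rep[v] is the representative of
    the cluster containing v (assumption: the smallest vertex index in it).
    """
    parent = list(range(n))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Only edges whose endpoints share a colour join two clusters.
    for u, v in edges:
        if colour[u] == colour[v]:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[max(ru, rv)] = min(ru, rv)

    rep = [find(v) for v in range(n)]
    # Edges between different clusters survive, deduplicated.
    contracted = {
        (min(rep[u], rep[v]), max(rep[u], rep[v]))
        for u, v in edges
        if rep[u] != rep[v]
    }
    return rep, sorted(contracted)


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(contract_by_colour(5, path, ["r", "r", "b", "b", "r"]))
    # ([0, 0, 2, 2, 4], [(0, 2), (2, 4)])
```

Vertex 4 has the same colour as vertices 0 and 1 but is not connected to them through vertices of that colour, so it stays in its own cluster.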

Cross-lists for Fri, 19 Apr 24

[4]  arXiv:2404.11862 (cross-list from cs.SI) [pdf, ps, other]
Title: A Fast Maximum Clique Algorithm Based on Network Decomposition for Large Sparse Networks
Comments: 12 pages, 2 figures, 1 table
Subjects: Social and Information Networks (cs.SI); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR); Data Analysis, Statistics and Probability (physics.data-an)

Finding maximum cliques in large networks is a challenging combinatorial problem with many real-world applications. We present a fast algorithm that computes the exact solution to the maximum clique problem in large sparse networks based on efficient graph decomposition. Several effective techniques are used to prune the graph substantially, and a novel concept called Complete-Upper-Bound-Induced Subgraph (CUBIS) is proposed to ensure that the structures with the potential to form the maximum clique are retained in the process of graph decomposition. Our algorithm first pre-prunes peripheral nodes; subsequently, one or two small-scale CUBISs are constructed, guided by the core number and the current maximum clique size. Bron-Kerbosch search is performed on each CUBIS to find the maximum clique. Experiments on 50 empirical networks with a scale of up to 20 million show that the CUBIS scales are largely independent of the original network scale. This enables an approximately linear runtime, making our algorithm amenable to large networks. Our work provides a new framework for effectively solving maximum clique problems on massive sparse graphs, which not only means the graph scale is no longer the bottleneck but also sheds some light on solving other clique-related problems.
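
The Bron-Kerbosch search mentioned in this abstract can be sketched as follows. This is the textbook version with pivoting plus a simple size bound, run on the whole graph. The CUBIS construction and the core-number pre-pruning from the paper are not reproduced here. The function name and the adjacency-dictionary input format are assumptions made for the example.

```python
# Illustrative sketch only: textbook Bron-Kerbosch with pivoting, adapted to
# keep the largest clique found. The CUBIS decomposition is NOT implemented.
def max_clique(adj):
    """adj maps each vertex to the set of its neighbours (assumed format)."""
    best = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: excluded vertices
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest so far.
            if len(r) > len(best):
                best = list(r)
            return
        if len(r) + len(p) <= len(best):
            return  # even taking every candidate cannot beat the best clique
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return sorted(best)


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(max_clique(graph))  # [0, 1, 2]
```

On large sparse graphs, a search like this would typically be run on much smaller subgraphs, which is the role the abstract assigns to the CUBISs.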

[5]  arXiv:2404.12107 (cross-list from cs.DB) [pdf, other]
Title: Effective Individual Fairest Community Search over Heterogeneous Information Networks
Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS); Social and Information Networks (cs.SI)

Community search over heterogeneous information networks (HINs) has been applied in a wide range of domains, such as activity organization and team formation. In these scenarios, the members of a group that receive the same treatment often have different levels of activity and workloads, which causes unfairness in the treatment between active members and inactive members (called individual unfairness). However, existing works do not pay attention to individual fairness and do not sufficiently consider the rich semantics of HINs (e.g., high-order structure), which prevents complex queries. To fill the gap, we formally define the problem of individual fairest community search over HINs (denoted as IFCS), which aims to find a set of vertices from the HIN that share the same type, have close relationships, and differ little in activity level, and which has been shown to be NP-hard. To solve it, we first develop an exploration-based filter that effectively reduces the search space of the community. Further, to avoid repeated computation and to prune unfair communities in advance, we propose a message-based scheme and a lower bound-based scheme. Finally, we conduct extensive experiments on four real-world datasets to demonstrate the effectiveness and efficiency of our proposed algorithms, which are at least 3 times faster than the baseline solution.

Replacements for Fri, 19 Apr 24

[6]  arXiv:1704.04370 (replaced) [pdf, other]
Title: Fast Similarity Sketching
Comments: The original version was directly based on a conference paper of the same title from FOCS'17. This new version is substantially revised with some cleaner and stronger theorems, particularly concerning the high probability domain. Moreover, there is one more author, Jakob Houen. In addition, one of the old authors, Mathias, has changed surname from Knudsen to Langhede
Subjects: Data Structures and Algorithms (cs.DS)
[7]  arXiv:2001.05053 (replaced) [src]
Title: Tight Static Lower Bounds for Non-Adaptive Data Structures
Comments: This paper has been superseded and merged with arXiv:2308.16042
Subjects: Data Structures and Algorithms (cs.DS)
[8]  arXiv:2303.06090 (replaced) [pdf, ps, other]
Title: Simple and efficient four-cycle counting on sparse graphs
Subjects: Data Structures and Algorithms (cs.DS)
[9]  arXiv:2303.14467 (replaced) [pdf, other]
Title: A Survey on the Densest Subgraph Problem and Its Variants
Comments: Accepted to ACM Computing Surveys
Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI)
[10]  arXiv:2308.07476 (replaced) [pdf, ps, other]
Title: Dependent rounding with strong negative-correlation, and scheduling on unrelated machines to minimize completion time
Authors: David G. Harris
Journal-ref: SODA 2024
Subjects: Data Structures and Algorithms (cs.DS)
[11]  arXiv:2310.02670 (replaced) [pdf, other]
Title: Searching 2D-Strings for Matching Frames
Subjects: Data Structures and Algorithms (cs.DS)
[12]  arXiv:2311.13590 (replaced) [pdf, other]
Title: Triangle-free $2$-matchings
Authors: Katarzyna Paluch
Comments: A more polished version and a clearer explanation of $\Delta$-diminishing hinges and 1-vulnerable triangles
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
[13]  arXiv:2403.07760 (replaced) [pdf, other]
Title: Simplified Tight Bounds for Monotone Minimal Perfect Hashing
Authors: Dmitry Kosolobov
Comments: 13 pages, 4 figures
Subjects: Data Structures and Algorithms (cs.DS)
[14]  arXiv:2403.12213 (replaced) [pdf, ps, other]
Title: Private graphon estimation via sum-of-squares
Comments: 71 pages, accepted to STOC 2024
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[15]  arXiv:2404.11607 (replaced) [pdf, other]
Title: Private federated discovery of out-of-vocabulary words for Gboard
Subjects: Data Structures and Algorithms (cs.DS)
[16]  arXiv:2207.08015 (replaced) [pdf, ps, other]
Title: Parallel Best Arm Identification in Heterogeneous Environments
Comments: 15 pages (published in SPAA 2024)
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)