# Data Structures and Algorithms

## New submissions

[ total of 25 entries: 1-25 ]

### New submissions for Tue, 19 Oct 21


Title: Terminal Embeddings in Sublinear Time
Subjects: Data Structures and Algorithms (cs.DS); Computational Geometry (cs.CG); Machine Learning (cs.LG); Machine Learning (stat.ML)

Elkin, Filtser, and Neiman (2017) recently introduced the concept of a {\it terminal embedding} from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with a set of designated terminals $T\subset X$. Such an embedding $f$ is said to have distortion $\rho\ge 1$ if $\rho$ is the smallest value such that there exists a constant $C>0$ satisfying
\begin{equation*}
\forall x\in T\ \forall q\in X,\ C d_X(x, q) \le d_Y(f(x), f(q)) \le C \rho d_X(x, q) .
\end{equation*}
When $X$ and $Y$ are both Euclidean metrics and $Y$ is $m$-dimensional, Narayanan and Nelson (2019), building on work of Mahabadi, Makarychev, Makarychev, and Razenshteyn (2018), showed that distortion $1+\epsilon$ is achievable by such a terminal embedding with $m = O(\epsilon^{-2}\log n)$, where $n := |T|$. This generalizes the Johnson-Lindenstrauss lemma, which preserves distances only within $T$, not between $T$ and the rest of the space. The downside is that evaluating the embedding at a point $q\in \mathbb{R}^d$ required solving a semidefinite program with $\Theta(n)$ constraints in $m$ variables, and thus took superlinear $\mathrm{poly}(n)$ time. Our main contribution is a new data structure for computing terminal embeddings. We show how to preprocess $T$ into an almost linear-space data structure that computes the terminal embedding image of any $q\in\mathbb{R}^d$ in sublinear time $n^{1-\Theta(\epsilon^2)+o(1)} + dn^{o(1)}$. To accomplish this, we leverage tools developed for approximate nearest neighbor search.
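To make the distortion definition concrete, here is a small Python sketch (not the data structure from the paper) that measures the distortion of a candidate map `f` on a finite sample of query points. For the ratios $r = d_Y(f(x),f(q))/d_X(x,q)$, the best constant is $C=\min r$, so the smallest feasible value is $\rho=\max r/\min r$. The function name `empirical_distortion` is hypothetical, and since the definition quantifies over all $q\in X$, a finite sample only yields a lower bound on the true distortion.

```python
import math
from itertools import product


def empirical_distortion(f, terminals, queries):
    """Distortion of f restricted to (terminal, query) pairs from the given samples.

    For every pair with d_X(x, q) > 0 we form r = d_Y(f(x), f(q)) / d_X(x, q).
    The best constant is C = min r, so the smallest feasible rho is max r / min r.
    """
    ratios = []
    for x, q in product(terminals, queries):
        d_x = math.dist(x, q)
        if d_x == 0:
            continue  # x == q imposes no constraint
        ratios.append(math.dist(f(x), f(q)) / d_x)
    if not ratios:
        raise ValueError("need at least one pair of distinct points")
    if min(ratios) == 0:
        return math.inf  # f maps two distinct points to the same image
    return max(ratios) / min(ratios)
```

A uniform scaling such as `lambda p: tuple(2 * c for c in p)` has distortion 1 under this definition, because the constant $C$ absorbs the scale factor.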


Title: Faster Algorithms for Bounded-Difference Min-Plus Product
Subjects: Data Structures and Algorithms (cs.DS)

The min-plus product of two $n\times n$ matrices is a fundamental problem in algorithms research. It is known to be equivalent to APSP, and no truly subcubic algorithm is known for it in general. In this paper, we focus on the min-plus product of a special class of matrices, called $\delta$-bounded-difference matrices, in which the difference between any two adjacent entries is bounded by $\delta=O(1)$. Using the fast rectangular matrix multiplication algorithm of [Le Gall \& Urrutia 18], our algorithm runs in randomized time $O(n^{2.779})$, which is better than $\tilde{O}(n^{2+\omega/3})=O(n^{2.791})$ (with $\omega<2.373$ [Alman \& V.V.Williams 20]). This improves the previous result of $\tilde{O}(n^{2.824})$ [Bringmann et al. 16]. In the ideal case $\omega=2$, our complexity is $\tilde{O}(n^{2+2/3})$, improving Bringmann et al.'s result of $\tilde{O}(n^{2.755})$.
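For reference, the following Python sketch spells out the two definitions used above: the cubic-time min-plus product $C_{ij}=\min_k (A_{ik}+B_{kj})$ and a check for the $\delta$-bounded-difference property. It assumes "adjacent" means horizontally or vertically neighboring entries. It is a naive baseline, not the paper's $O(n^{2.779})$ algorithm, and the function names are illustrative.

```python
def min_plus(A, B):
    """Naive min-plus product: C[i][j] = min over t of A[i][t] + B[t][j]."""
    inner, cols = len(B), len(B[0])
    return [[min(A[i][t] + B[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(A))]


def is_bounded_difference(A, delta):
    """True if every pair of horizontally or vertically adjacent entries differs by <= delta."""
    rows, cols = len(A), len(A[0])
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols and abs(A[i][j] - A[i][j + 1]) > delta:
                return False
            if i + 1 < rows and abs(A[i][j] - A[i + 1][j]) > delta:
                return False
    return True
```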


Title: Online Facility Location with Predictions
Subjects: Data Structures and Algorithms (cs.DS)

We provide nearly optimal algorithms for online facility location (OFL) with predictions. In OFL, $n$ demand points arrive in order, and the algorithm must irrevocably assign each demand point to an open facility upon its arrival. The objective is to minimize the total connection cost from demand points to their assigned facilities plus the facility opening cost. We further assume that, for each demand point $x_i$, the algorithm is given a natural prediction $f_{x_i}^{\mathrm{pred}}$, which is intended to be the facility $f_{x_i}^{\mathrm{opt}}$ that serves $x_i$ in the offline optimal solution.
Our main result is an $O(\min\{\log {\frac{n\eta_\infty}{\mathrm{OPT}}}, \log{n} \})$-competitive algorithm, where $\eta_\infty$ is the maximum prediction error (i.e., the maximum over $i$ of the distance between $f_{x_i}^{\mathrm{pred}}$ and $f_{x_i}^{\mathrm{opt}}$). Our algorithm overcomes the fundamental $\Omega(\frac{\log n}{\log \log n})$ lower bound for OFL (without predictions) when $\eta_\infty$ is small, and it still maintains an $O(\log n)$ ratio even when $\eta_\infty$ is unbounded. Furthermore, our theoretical analysis is supported by empirical evaluations of the tradeoff between $\eta_\infty$ and the competitive ratio on various real datasets of different types.
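As an illustration of the objective and of $\eta_\infty$, the Python sketch below evaluates the cost of a finished assignment and the maximum prediction error. It assumes Euclidean points and a uniform opening cost per facility (the abstract fixes neither), and the helper names are hypothetical; it does not implement the paper's online algorithm.

```python
import math


def ofl_cost(demands, assignment, opening_cost):
    """Opening cost of every facility used plus total connection cost.

    assignment[i] is the facility location (a tuple) serving demands[i];
    a uniform opening cost per facility is assumed.
    """
    opened = set(assignment)
    connection = sum(math.dist(x, f) for x, f in zip(demands, assignment))
    return opening_cost * len(opened) + connection


def max_prediction_error(predicted, optimal):
    """eta_infinity: the largest distance between a predicted and an optimal facility."""
    return max(math.dist(p, o) for p, o in zip(predicted, optimal))
```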


Title: On Monotonicity of Number-Partitioning Algorithms
Comments: I needed this lemma for something else, and could not find it anywhere, so I wrote a short proof. Is it already known?
Subjects: Data Structures and Algorithms (cs.DS)

An algorithm for number partitioning is called value-monotone if, whenever one of the input numbers increases, the objective function (the largest sum or the smallest sum of a subset in the output) weakly increases. This note proves that the List Scheduling algorithm and the Longest Processing Time algorithm are both value-monotone. In contrast, another algorithm, MultiFit, is not value-monotone.
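The two algorithms named above can be sketched in Python as follows, assuming the number of parts `k` is fixed in advance: List Scheduling puts each number, in input order, into the part with the currently smallest sum, and Longest Processing Time (LPT) runs List Scheduling on the numbers sorted in decreasing order. Function names are illustrative.

```python
import heapq


def list_scheduling(numbers, k):
    """Put each number, in input order, into the part with the smallest current sum."""
    heap = [(0, part) for part in range(k)]  # (current sum, part index); ties go to lower index
    parts = [[] for _ in range(k)]
    for x in numbers:
        total, part = heapq.heappop(heap)
        parts[part].append(x)
        heapq.heappush(heap, (total + x, part))
    return parts


def lpt(numbers, k):
    """Longest Processing Time: List Scheduling on the numbers in decreasing order."""
    return list_scheduling(sorted(numbers, reverse=True), k)


def largest_sum(parts):
    return max(sum(p) for p in parts)


def smallest_sum(parts):
    return min(sum(p) for p in parts)
```

For example, `lpt([3, 3, 2, 2, 2], 2)` yields part sums 7 and 5; raising one 2 to a 3 yields sums 6 and 7, so neither the largest nor the smallest sum decreases. This only illustrates value-monotonicity on one instance; it is not a proof.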


Title: Algorithms Using Local Graph Features to Predict Epidemics
Subjects: Data Structures and Algorithms (cs.DS); Combinatorics (math.CO); Probability (math.PR)

We study a simple model of epidemics in which an infected node transmits the infection to each of its neighbors independently with probability $p$. This is also known as the independent cascade model or the Susceptible-Infected-Recovered (SIR) model with fixed recovery time. The size of an outbreak in this model is closely related to that of the giant connected component in "edge percolation", where each edge of the graph is kept independently with probability $p$; edge percolation has been studied for a large class of networks, including the configuration model \cite{molloy2011critical} and preferential attachment \cite{bollobas2003,Riordan2005}. Even though these models capture the effects of degree inhomogeneity and the role of super-spreaders in the spread of an epidemic, they only consider graphs that are locally tree-like, i.e., have few or no short cycles. Some generalizations of the configuration model have been suggested to capture local communities, such as household models \cite{ball2009threshold} and the hierarchical configuration model \cite{Hofstad2015hierarchical}.
Here, we ask a different question: what information is needed for general networks to predict the size of an outbreak? Is it possible to make predictions by accessing the distribution of small subgraphs (or motifs)? We answer the question in the affirmative for large-set expanders with local weak limits (also known as Benjamini-Schramm limits). In particular, we show that there is an algorithm which gives a $(1-\epsilon)$ approximation of the probability and the final size of an outbreak by accessing a constant-size neighborhood of a constant number of nodes chosen uniformly at random. We also present corollaries of the theorem for the preferential attachment model, and study generalizations with household (or motif) structure. The latter was only known for the configuration model.
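The epidemic model described above can be simulated directly. The Python sketch below is a plain Monte Carlo simulation, not the local sampling algorithm from the paper: it samples each edge lazily with probability $p$ and returns the connected component of the seed among the kept edges, which is the final infected set in the edge-percolation view. The adjacency-dict input format and function names are assumptions.

```python
import random
from collections import deque


def outbreak_size(adj, seed, p, rng):
    """Final number of infected nodes in one run of the independent cascade.

    Each edge is kept with probability p, sampled lazily the first time it is
    examined; the outbreak is the seed's connected component in the kept graph.
    """
    kept = {}
    infected = {seed}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in infected:
                continue
            edge = frozenset((u, v))
            if edge not in kept:
                kept[edge] = rng.random() < p
            if kept[edge]:
                infected.add(v)
                queue.append(v)
    return len(infected)


def mean_outbreak_fraction(adj, p, trials, seed=0):
    """Monte Carlo estimate of the expected outbreak size, as a fraction of all
    nodes, when the initially infected node is chosen uniformly at random."""
    rng = random.Random(seed)
    nodes = list(adj)
    total = sum(outbreak_size(adj, rng.choice(nodes), p, rng) for _ in range(trials))
    return total / (trials * len(nodes))
```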


Title: Dimensionality Reduction for Wasserstein Barycenter
Comments: Published as a conference paper in NeurIPS 2021
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Probability (math.PR)

The Wasserstein barycenter is a geometric construct which captures the notion of centrality among probability distributions, and which has found many applications in machine learning. However, most algorithms for finding even an approximate barycenter suffer an exponential dependence on the dimension $d$ of the underlying space of the distributions. In order to cope with this "curse of dimensionality," we study dimensionality reduction techniques for the Wasserstein barycenter problem. When the barycenter is restricted to support of size $n$, we show that randomized dimensionality reduction can be used to map the problem to a space of dimension $O(\log n)$, independent of both $d$ and $k$, and that \emph{any} solution found in the reduced dimension will have its cost preserved up to arbitrarily small error in the original space. We provide matching upper and lower bounds on the size of the reduced dimension, showing that our methods are optimal up to constant factors. We also provide a coreset construction for the Wasserstein barycenter problem that significantly decreases the number of input distributions. The coresets can be used in conjunction with random projections to further improve computation time. Lastly, our experimental results validate the speedup provided by dimensionality reduction while maintaining solution quality.
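A minimal Python sketch of one standard randomized dimensionality reduction, a Gaussian Johnson-Lindenstrauss projection, is shown below. It assumes the target dimension (of order $\log n$ in the result above) is chosen by the caller, since the abstract gives no constants, and it is not necessarily the exact map used in the paper. The same matrix must be applied to every support point of every input distribution, which is why all points are projected in one call with a fixed seed; the function name is hypothetical.

```python
import math
import random


def gaussian_projection(points, target_dim, seed=0):
    """Map d-dimensional points to target_dim dimensions with one shared random
    Gaussian matrix, scaled by 1/sqrt(target_dim) so that squared lengths are
    preserved in expectation."""
    d = len(points[0])
    rng = random.Random(seed)
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(target_dim) for _ in range(d)]
              for _ in range(target_dim)]
    return [tuple(sum(g * x for g, x in zip(row, p)) for row in matrix)
            for p in points]
```

Because the map is linear, weighted averages of support points (and hence candidate barycenter locations) are mapped consistently, which is one reason linear random projections are a natural choice here.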


Title: Diameter constrained Steiner tree and related problems
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)

We give a dynamic programming solution for finding the minimum cost of a diameter-constrained Steiner tree in directed graphs. We then show a simple reduction from the undirected version to the directed version, which yields an algorithm of similar complexity, i.e., FPT in the number of terminal vertices. Other natural variants of constrained Steiner trees are defined by imposing constraints on the min-degree and size of the Steiner tree, and some polynomial-time reductions among these problems are proven. To the best of our knowledge, these fairly simple reductions do not appear in the literature prior to our work.


Title: Data structure for node connectivity queries
Authors: Zeev Nutov
Subjects: Data Structures and Algorithms (cs.DS)

Let $\kappa(s,t)$ denote the maximum number of internally disjoint $st$-paths in an undirected graph $G=(V,E)$ with $n$ vertices. We consider designing a data structure that stores a list of cuts and answers the following query in $O(1)$ time: given $s,t \in V$, determine whether $\kappa(s,t) \leq k$, and if so, return a pointer to an $st$-cut of size $\leq k$ in the list. A trivial data structure stores a list of $n(n-1)/2$ cuts and requires $\Theta(kn^2)$ space. We show that $O(kn)$ cuts suffice, thus reducing the space to $O(k^2 n+n^2)$. When $G$ is $k$-connected, we show that $O(n)$ cuts suffice and that these cuts can be partitioned into $O(k)$ laminar families; this reduces the space to $O(kn)$. The latter result slightly improves and substantially simplifies a recent result of Pettie and Yin [ICALP 2021].
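The query primitive behind the trivial data structure, deciding whether $\kappa(s,t)\le k$, can be answered via Menger's theorem. The Python sketch below (not the paper's construction) splits each vertex into an `in` and an `out` copy joined by a unit-capacity arc, so that the maximum flow from $s$ to $t$ equals the number of internally disjoint $st$-paths, and stops once the flow exceeds an optional `limit`. The adjacency-dict input and all names are illustrative assumptions.

```python
from collections import defaultdict, deque


def local_connectivity(adj, s, t, limit=None):
    """Maximum number of internally disjoint s-t paths in an undirected graph.

    adj maps each vertex to its neighbours (both directions listed). Every vertex v
    becomes (v, "in") -> (v, "out") with capacity 1, every edge {u, v} becomes
    (u, "out") -> (v, "in") and (v, "out") -> (u, "in") with capacity 1, and we
    augment along BFS paths (Edmonds-Karp) from (s, "out") to (t, "in").
    If limit is given, we stop as soon as the flow exceeds it.
    """
    cap = defaultdict(int)
    nbrs = defaultdict(set)

    def add_arc(u, v):
        cap[(u, v)] += 1
        nbrs[u].add(v)
        nbrs[v].add(u)  # reverse direction is needed for residual arcs

    for v in adj:
        add_arc((v, "in"), (v, "out"))
    for u in adj:
        for v in adj[u]:
            add_arc((u, "out"), (v, "in"))

    source, sink = (s, "out"), (t, "in")
    flow = 0
    while limit is None or flow <= limit:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path: flow is maximum
        v = sink
        while parent[v] is not None:  # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1
    return flow
```

With this helper, `local_connectivity(adj, s, t, limit=k) <= k` answers the query; extracting the $st$-cut itself from the final residual graph is omitted from this sketch.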


Title: Machine Covering in the Random-Order Model
Subjects: Data Structures and Algorithms (cs.DS)

In the Online Machine Covering problem, jobs, defined by their sizes, arrive one by one and have to be assigned to $m$ parallel and identical machines, with the goal of maximizing the load of the least-loaded machine. In this work, we study the Machine Covering problem in the recently popular random-order model. Here no extra resources are present; instead, the adversary is weakened in that it can only choose the input set, while the jobs are revealed in a uniformly random order. This model is particularly relevant to Machine Covering, where lower bounds are usually associated with highly structured input sequences.
We first analyze Graham's Greedy-strategy in this context and establish that its competitive ratio decreases slightly to $\Theta\left(\frac{m}{\log(m)}\right)$ which is asymptotically tight. Then, as our main result, we present an improved $\tilde{O}(\sqrt{m})$-competitive algorithm for the problem. This result is achieved by exploiting the extra information coming from the random order of the jobs, using sampling techniques to devise an improved mechanism to distinguish jobs that are relatively large from small ones. We complement this result with a first lower bound showing that no algorithm can have a competitive ratio of $O\left(\frac{\log(m)}{\log\log(m)}\right)$ in the random-order model. This lower bound is achieved by studying a novel variant of the Secretary problem, which could be of independent interest.
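Graham's Greedy strategy assigns each arriving job to a currently least-loaded machine. The Python sketch below, with hypothetical function names, computes the resulting objective (the minimum load) and models the random-order setting by shuffling a fixed input set before feeding it to Greedy; it does not implement the paper's $\tilde{O}(\sqrt{m})$-competitive algorithm.

```python
import random


def greedy_cover(jobs, m):
    """Assign each job, in the given order, to a least-loaded machine; return the minimum load."""
    loads = [0] * m
    for size in jobs:
        i = min(range(m), key=loads.__getitem__)  # ties broken by lowest machine index
        loads[i] += size
    return min(loads)


def random_order_greedy(jobs, m, seed=0):
    """The adversary fixes the job set; the arrival order is a uniformly random permutation."""
    order = list(jobs)
    random.Random(seed).shuffle(order)
    return greedy_cover(order, m)
```

For instance, with jobs $\{1,1,5\}$ on $m=2$ machines, the order $(1,1,5)$ gives minimum load 1, whereas every order that does not place the 5 last gives the optimum 2; a random order avoids the bad case with probability $2/3$.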

### Cross-lists for Tue, 19 Oct 21

arXiv:2110.08483 (cross-list from cs.LG)
Title: Streaming Decision Trees and Forests
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)

Machine learning has successfully leveraged modern data and provided computational solutions to innumerable real-world problems, including physical and biomedical discoveries. Current estimators can handle both settings in which all samples are available and settings that require continuous updates. However, there is still room for improvement in streaming algorithms based on batch decision trees and random forests, which are the leading methods for batch data tasks. In this paper, we explore the simplest partial fitting algorithm for extending batch trees, and we test the resulting models, stream decision tree (SDT) and stream decision forest (SDF), on three classification tasks of varying complexity. For reference, both existing streaming trees (Hoeffding trees and Mondrian forests) and batch estimators are included in the experiments. In all three tasks, SDF consistently achieves high accuracy, whereas existing estimators encounter space constraints and accuracy fluctuations. Thus, our streaming trees and forests show great potential for further improvement and are good candidates for solving problems such as distribution drift and transfer learning.

arXiv:2110.08669 (cross-list from cs.CG)
Title: Constructing Many Faces in Arrangements of Lines and Segments
Authors: Haitao Wang
Comments: To be presented at SODA 2022
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)

We present new algorithms for computing many faces in arrangements of lines and segments. Given a set $S$ of $n$ lines (resp., segments) and a set $P$ of $m$ points in the plane, the problem is to compute the faces of the arrangements of $S$ that contain at least one point of $P$. For the line case, we give a deterministic algorithm of $O(m^{2/3}n^{2/3}\log^{2/3} (n/\sqrt{m})+(m+n)\log n)$ time. This improves the previously best deterministic algorithm [Agarwal, 1990] by a factor of $\log^{2.22}n$ and improves the previously best randomized algorithm [Agarwal, Matou\v{s}ek, and Schwarzkopf, 1998] by a factor of $\log^{1/3}n$ in certain cases (e.g., when $m=\Theta(n)$). For the segment case, we present a deterministic algorithm of $O(n^{2/3}m^{2/3}\log n+\tau(n\alpha^2(n)+n\log m+m)\log n)$ time, where $\tau=\min\{\log m,\log (n/\sqrt{m})\}$ and $\alpha(n)$ is the inverse Ackermann function. This improves the previously best deterministic algorithm [Agarwal, 1990] by a factor of $\log^{2.11}n$ and improves the previously best randomized algorithm [Agarwal, Matou\v{s}ek, and Schwarzkopf, 1998] by a factor of $\log n$ in certain cases (e.g., when $m=\Theta(n)$). We also give a randomized algorithm of $O(m^{2/3}K^{1/3}\log n+\tau(n\alpha(n)+n\log m+m)\log n\log K)$ expected time, where $K$ is the number of intersections of all segments of $S$. In addition, we consider the query version of the problem, that is, preprocess $S$ to compute the face of the arrangement of $S$ that contains any query point. We present new results that improve the previous work for both the line and the segment cases.
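For intuition about the line case: each face of an arrangement of lines is exactly the set of points with one fixed sign pattern with respect to all lines (an intersection of open half-planes, hence convex and connected), so points can be grouped by face by comparing sign vectors. The brute-force $O(mn)$ Python sketch below does this; it assumes no point lies on a line and represents each line as a hypothetical triple $(a,b,c)$ for $ax+by=c$. It only identifies which faces are occupied, does not compute face boundaries, and is not the paper's algorithm.

```python
from collections import defaultdict


def faces_containing_points(lines, points):
    """Group points by the face of the line arrangement that contains them.

    Each line is a triple (a, b, c) for a*x + b*y = c. Two points lie in the same
    face exactly when they have the same sign vector with respect to all lines.
    """
    groups = defaultdict(list)
    for x, y in points:
        signs = []
        for a, b, c in lines:
            value = a * x + b * y - c
            if value == 0:
                raise ValueError("point lies on a line; general position is assumed")
            signs.append(value > 0)
        groups[tuple(signs)].append((x, y))
    return dict(groups)
```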

arXiv:2110.08677 (cross-list from cs.CC)
Title: Algorithmic Thresholds for Refuting Random Polynomial Systems
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)

Consider a system of $m$ polynomial equations $\{p_i(x) = b_i\}_{i \leq m}$ of degree $D\geq 2$ in an $n$-dimensional variable $x \in \mathbb{R}^n$, such that each coefficient of every $p_i$ and each $b_i$ is chosen at random and independently from some continuous distribution. We study the basic question of determining the smallest $m$ (the algorithmic threshold) for which efficient algorithms can find refutations (i.e., certificates of unsatisfiability) for such systems. This setting generalizes problems such as refuting random SAT instances, low-rank matrix sensing, and certifying pseudo-randomness of Goldreich's candidate generators and their generalizations.
We show that for every $d \in \mathbb{N}$, the $(n+m)^{O(d)}$-time canonical sum-of-squares (SoS) relaxation refutes such a system with high probability whenever $m \geq O(n) \cdot (\frac{n}{d})^{D-1}$. We prove a lower bound in the restricted low-degree polynomial model of computation which suggests that this trade-off between SoS degree and the number of equations is nearly tight for all $d$. We also confirm the predictions of this lower bound in a limited setting by showing a lower bound on the canonical degree-$4$ sum-of-squares relaxation for refuting random quadratic polynomials. Together, our results provide evidence for an algorithmic threshold for the problem at $m \gtrsim \widetilde{O}(n) \cdot n^{(1-\delta)(D-1)}$ for $2^{n^{\delta}}$-time algorithms for all $\delta$.

arXiv:2110.09068 (cross-list from cs.DM)
Title: Approximate Sampling and Counting of Graphs with Near-Regular Degree Intervals
Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)

The approximate uniform sampling of graphs with a given degree sequence is a well-known, extensively studied problem in theoretical computer science and has significant applications, e.g., in the analysis of social networks. In this work we study an extension of the problem, where degree intervals are specified rather than a single degree sequence. We are interested in sampling and counting graphs whose degree sequences satisfy the degree interval constraints. A natural scenario where this problem arises is in hypothesis testing on social networks that are only partially observed.
In this work, we provide the first fully polynomial almost uniform sampler (FPAUS) as well as the first fully polynomial randomized approximation scheme (FPRAS) for sampling and counting, respectively, graphs with near-regular degree intervals, in which every node $i$ has a degree from an interval not too far away from a given $d \in \mathbb{N}$. In order to design our FPAUS, we rely on various state-of-the-art tools from Markov chain theory and combinatorics. In particular, we provide the first non-trivial algorithmic application of a breakthrough result of Liebenau and Wormald (2017) regarding an asymptotic formula for the number of graphs with a given near-regular degree sequence. Furthermore, we also make use of the recent breakthrough of Anari et al. (2019) on sampling a base of a matroid under a strongly log-concave probability distribution.
As a more direct approach, we also study a natural Markov chain recently introduced by Rechner, Strowick and M\"uller-Hannemann (2018), based on three simple local operations: Switches, hinge flips, and additions/deletions of a single edge. We obtain the first theoretical results for this Markov chain by showing it is rapidly mixing for the case of near-regular degree intervals of size at most one.

arXiv:2110.09369 (cross-list from cs.CC)
Title: Anti-Factor is FPT Parameterized by Treewidth and List Size (but Counting is Hard)
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)

In the general AntiFactor problem, a graph $G$ is given with a set $X_v\subseteq \mathbb{N}$ of forbidden degrees for every vertex $v$, and the task is to find a set $S$ of edges such that, for every vertex $v$, the degree of $v$ in $S$ is not in $X_v$. Standard techniques (dynamic programming + fast convolution) can be used to show that if $M$ is the largest forbidden degree, then the problem can be solved in time $(M+2)^k\cdot n^{O(1)}$ if a tree decomposition of width $k$ is given. However, significantly faster algorithms are possible if the sets $X_v$ are sparse: our main algorithmic result shows that if every vertex has at most $x$ forbidden degrees (we call this special case AntiFactor$_x$), then the problem can be solved in time $(x+1)^{O(k)}\cdot n^{O(1)}$. That is, AntiFactor$_x$ is fixed-parameter tractable parameterized by treewidth $k$ and the maximum number $x$ of excluded degrees.
Our algorithm uses the technique of representative sets, which can be generalized to the optimization version, but (as expected) not to the counting version of the problem. In fact, we show that #AntiFactor$_1$ is already #W[1]-hard parameterized by the width of the given decomposition. Moreover, we show that, unlike for the decision version, the standard dynamic programming algorithm is essentially optimal for the counting version. Formally, for a fixed nonempty set $X$, we denote by $X$-AntiFactor the special case where every vertex $v$ has the same set $X_v=X$ of forbidden degrees. We show the following lower bound for every fixed set $X$: if there is an $\epsilon>0$ such that #$X$-AntiFactor can be solved in time $(\max X+2-\epsilon)^k\cdot n^{O(1)}$ on a tree decomposition of width $k$, then the Counting Strong Exponential-Time Hypothesis (#SETH) fails.
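To pin down the definition, the exhaustive Python sketch below searches all $2^{|E|}$ edge subsets for one in which no vertex has a forbidden degree, and also counts such subsets (the #AntiFactor version). It is exponential and meant only for tiny instances; it is not the treewidth-based algorithm from the paper, and the input format (vertex list, edge list, dict of forbidden degree sets, with missing vertices treated as having no forbidden degrees) is an assumption.

```python
from itertools import combinations


def _respects(vertices, S, forbidden):
    """True if no vertex has a forbidden degree in the edge set S."""
    deg = {v: 0 for v in vertices}
    for u, v in S:
        deg[u] += 1
        deg[v] += 1
    return all(deg[v] not in forbidden.get(v, set()) for v in vertices)


def _edge_subsets(edges):
    """All subsets of the edge list, smallest first."""
    for r in range(len(edges) + 1):
        yield from combinations(edges, r)


def anti_factor(vertices, edges, forbidden):
    """Return some edge set S with deg_S(v) not in forbidden[v] for all v, or None."""
    for S in _edge_subsets(edges):
        if _respects(vertices, S, forbidden):
            return list(S)
    return None


def count_anti_factors(vertices, edges, forbidden):
    """Number of valid edge sets (the counting version)."""
    return sum(1 for S in _edge_subsets(edges) if _respects(vertices, S, forbidden))
```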

### Replacements for Tue, 19 Oct 21

arXiv:2002.07336 (replaced)
Title: The power of adaptivity in source location on the path
Subjects: Data Structures and Algorithms (cs.DS)

arXiv:2002.07955 (replaced)
Title: Improved Classical and Quantum Algorithms for the Shortest Vector Problem via Bounded Distance Decoding
Comments: Faster Quantum Algorithm for SVP in QRAM, 43 pages, 4 figures
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR)

arXiv:2104.13362 (replaced)
Title: There is no APTAS for 2-dimensional vector bin packing: Revisited
Authors: Arka Ray
Comments: 15 pages, LIPIcs format, changes: fixed typos, added vector bin covering result
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)

arXiv:2108.08987 (replaced)
Title: Uniformity Testing in the Shuffle Model: Simpler, Better, Faster
Comments: Accepted to the SIAM Symposium on Simplicity in Algorithms (SOSA 2022). Added some details and discussions
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)

arXiv:2110.04933 (replaced)
Title: A Faster Algorithm for Maximum Independent Set on Interval Filament Graphs
Authors: Darcy Best, Max Ward
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)

arXiv:1912.09552 (replaced)
Title: Robust Product-line Pricing under Generalized Extreme Value Models
Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Econometrics (econ.EM)

arXiv:2011.07097 (replaced)
Title: Some remarks on hypergraph matching and the Füredi-Kahn-Seymour conjecture
Subjects: Combinatorics (math.CO); Data Structures and Algorithms (cs.DS)

arXiv:2108.08050 (replaced)
Title: Worst-Case Efficient Dynamic Geometric Independent Set
Comments: Full version of ESA 2021 paper. Correction on the update time bounds for squares, hypercubes and unions of fat hyperrectangles (in the initial version, polylogarithmic update time was erroneously claimed, which is replaced here by polynomial sublinear update time)
Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)

arXiv:2110.01773 (replaced)
Title: Differentiable Equilibrium Computation with Decision Diagrams for Stackelberg Models of Combinatorial Congestion Games
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Optimization and Control (math.OC)

arXiv:2110.06172 (replaced)
Title: Complexity of optimizing over the integers
Authors: Amitabh Basu
Subjects: Optimization and Control (math.OC); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)

arXiv:2110.06559 (replaced)
Title: Infinitely Divisible Noise in the Low Privacy Regime