Data Structures and Algorithms
New submissions
New submissions for Thu, 9 Jul 20
 [1] arXiv:2007.03829 [pdf, ps, other]

Title: An Improved Upper Bound for SAT
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
We show that the CNF satisfiability problem can be solved in $O^*(1.2226^m)$ time, where $m$ is the number of clauses in the formula, improving the known upper bounds $O^*(1.234^m)$ given by Yamamoto 15 years ago and $O^*(1.239^m)$ given by Hirsch 22 years ago. By using an amortized technique and careful case analysis, we avoid the bottlenecks in previous algorithms and obtain the improvement.
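For orientation, here is a minimal sketch of a plain DPLL-style branching solver (unit propagation, then branching on a literal of a shortest clause). It is not the paper's algorithm and makes no $O^*(1.2226^m)$ guarantee; it only illustrates the branch-and-simplify framework whose worst-case branching such bounds measure. The function names and the DIMACS-style integer encoding of literals are assumptions made for this example.

```python
from typing import Dict, List, Optional

Clause = List[int]  # literal +v means "variable v is true", -v means "v is false"


def simplify(clauses: List[Clause], lit: int) -> Optional[List[Clause]]:
    """Set literal `lit` to true: drop satisfied clauses, delete -lit from the rest.
    Returns None if some clause becomes empty (a conflict)."""
    out = []
    for c in clauses:
        if lit in c:
            continue  # clause already satisfied
        reduced = [x for x in c if x != -lit]
        if not reduced:
            return None  # empty clause: this branch is unsatisfiable
        out.append(reduced)
    return out


def dpll(clauses: List[Clause],
         assignment: Optional[Dict[int, bool]] = None) -> Optional[Dict[int, bool]]:
    """Return a satisfying (partial) assignment, or None if the formula is unsatisfiable."""
    assignment = dict(assignment or {})
    # Unit propagation: a one-literal clause forces that literal.
    while True:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0
        clauses = simplify(clauses, unit)
        if clauses is None:
            return None
    if not clauses:
        return assignment  # every clause satisfied
    # Branch on a literal from a shortest clause, trying both truth values.
    lit = min(clauses, key=len)[0]
    for choice in (lit, -lit):
        reduced = simplify(clauses, choice)
        if reduced is not None:
            result = dpll(reduced, {**assignment, abs(choice): choice > 0})
            if result is not None:
                return result
    return None
```

For example, `dpll([[1, 2], [-1, 3], [-2, -3], [-3]])` returns an assignment with variable 2 true and variables 1 and 3 false, while `dpll([[1], [-1]])` returns `None`.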
 [2] arXiv:2007.03859 [pdf, other]

Title: Fixed-Treewidth-Efficient Algorithms for Edge-Deletion to Intersection Graph Classes
Subjects: Data Structures and Algorithms (cs.DS)
For a graph class $\mathcal{C}$, the $\mathcal{C}$-Edge-Deletion problem asks, for a given graph $G$, for the minimum number of edges to delete from $G$ in order to obtain a graph in $\mathcal{C}$. We study the $\mathcal{C}$-Edge-Deletion problem where $\mathcal{C}$ is the class of permutation graphs, interval graphs, or other related graph classes. It follows from Courcelle's Theorem that these problems are fixed-parameter tractable when parameterized by treewidth. In this paper, we present concrete FPT algorithms for these problems. By giving explicit algorithms and analyzing them in detail, we obtain algorithms that are significantly faster than those obtained by using Courcelle's Theorem.
 [3] arXiv:2007.03927 [pdf, ps, other]

Title: Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling
Subjects: Data Structures and Algorithms (cs.DS)
To accelerate kernel methods, we propose a near input sparsity time algorithm for sampling the high-dimensional feature space implicitly defined by a kernel transformation. Our main contribution is an importance sampling method for subsampling the feature space of a degree-$q$ tensoring of data points in almost input sparsity time, improving the recent oblivious sketching method of (Ahle et al., 2020) by a factor of $q^{5/2}/\epsilon^2$. This leads to a subspace embedding for the polynomial kernel, as well as the Gaussian kernel, with a target dimension that depends only linearly on the statistical dimension of the kernel, computed in time that depends only linearly on the sparsity of the input dataset. We show how our subspace embedding bounds imply new statistical guarantees for kernel ridge regression. Furthermore, we empirically show that in large-scale regression tasks, our algorithm outperforms state-of-the-art kernel approximation methods.
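To make the "degree-$q$ tensoring" concrete, the sketch below materializes $x^{\otimes q}$ explicitly and checks the identity $\langle x^{\otimes q}, y^{\otimes q}\rangle = \langle x, y\rangle^q$, i.e., that the tensored feature space realizes the (homogeneous) polynomial kernel. The explicit vector has $d^q$ coordinates, which is why methods like the one above sample or sketch this space instead of building it. The helper names are illustrative assumptions, not the paper's code.

```python
from itertools import product
from math import isclose, prod


def tensor_power(x, q):
    """Explicit degree-q tensoring x^{(x)q}: one coordinate per index tuple (i1, ..., iq)."""
    return [prod(x[i] for i in idx) for idx in product(range(len(x)), repeat=q)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x = [1.0, -2.0, 0.5]
y = [3.0, 0.25, -1.0]
q = 3

lhs = dot(tensor_power(x, q), tensor_power(y, q))  # inner product in the 3^3 = 27-dim feature space
rhs = dot(x, y) ** q                               # polynomial kernel evaluated directly
print(len(tensor_power(x, q)), lhs, rhs)
```

Here $\langle x, y\rangle = 2$, so both sides equal $2^3 = 8$, while the explicit feature vector already has 27 entries for $d = 3$.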
 [4] arXiv:2007.03933 [pdf, other]

Title: Linear-Time Algorithms for Computing Twinless Strong Articulation Points and Related Problems
Subjects: Data Structures and Algorithms (cs.DS)
A directed graph $G=(V,E)$ is twinless strongly connected if it contains a strongly connected subgraph without any pair of antiparallel (or twin) edges. The twinless strongly connected components (TSCCs) of a directed graph $G$ are its maximal twinless strongly connected subgraphs. These concepts have several diverse applications, such as the design of telecommunication networks and the structural stability of buildings. A vertex $v \in V$ is a twinless strong articulation point of $G$ if the deletion of $v$ increases the number of TSCCs of $G$. Here, we present the first linear-time algorithm that finds all the twinless strong articulation points of a directed graph. We show that the computation of twinless strong articulation points reduces to the following problem in undirected graphs, which may be of independent interest: given a $2$-vertex-connected (biconnected) undirected graph $H$, find all vertices $v$ that belong to a vertex-edge cut pair, i.e., for which there exists an edge $e$ such that $H \setminus \{v,e\}$ is not connected. We develop a linear-time algorithm that not only finds all such vertices $v$, but also computes the number of edges $e$ such that $H \setminus \{v,e\}$ is not connected. This also implies that for each twinless strong articulation point $v$ that is not a strong articulation point in a strongly connected digraph $G$, we can compute the number of TSCCs in $G \setminus v$. We note that the problem of computing all vertices that belong to a vertex-edge cut pair can be solved in linear time by exploiting the structure of the $3$-vertex-connected (triconnected) components of $H$, represented by an SPQR tree of $H$. Our approach, however, is conceptually simple and thus likely to be more amenable to practical implementations.
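As a reference point for the undirected subproblem, here is a brute-force sketch (roughly $O(n \cdot m \cdot (n+m))$ time, not the linear-time algorithm described above) that counts, for each vertex $v$, the edges $e$ for which $H \setminus \{v,e\}$ is disconnected. It assumes $e$ ranges over edges not incident to $v$, since incident edges disappear with $v$ anyway; all names are hypothetical. A vertex belongs to a vertex-edge cut pair exactly when its count is positive.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def is_connected(vertices: Set[int], edges: List[Edge]) -> bool:
    """Iterative DFS connectivity test on the given vertex and edge sets."""
    if not vertices:
        return True
    adj = {v: [] for v in vertices}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


def vertex_edge_cut_counts(vertices: Set[int], edges: List[Edge]) -> Dict[int, int]:
    """For each v, count edges e (not incident to v) such that H minus {v, e} is disconnected."""
    counts = {}
    for v in vertices:
        rest_v = vertices - {v}
        rest_e = [e for e in edges if v not in e]  # deleting v deletes its incident edges
        counts[v] = sum(
            1
            for i in range(len(rest_e))
            if not is_connected(rest_v, rest_e[:i] + rest_e[i + 1:])
        )
    return counts
```

On the 4-cycle every vertex gets count 2 (deleting a vertex leaves a path, and deleting either remaining edge splits it), whereas on $K_4$ every count is 0.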
 [5] arXiv:2007.03946 [pdf, ps, other]

Title: A Technique for Obtaining True Approximations for $k$-Center with Covering Constraints
Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
There has been a recent surge of interest in incorporating fairness aspects into classical clustering problems. Two recently introduced variants of the $k$-Center problem in this spirit are Colorful $k$-Center, introduced by Bandyapadhyay, Inamdar, Pai, and Varadarajan, and lottery models, such as the Fair Robust $k$-Center problem introduced by Harris, Pensyl, Srinivasan, and Trinh. To address fairness aspects, these models, compared to traditional $k$-Center, include additional covering constraints. Prior approximation results for these models require relaxing some of the normally hard constraints, like the number of centers to be opened or the involved covering constraints, and therefore only obtain constant-factor pseudo-approximations. In this paper, we introduce a new approach to deal with such covering constraints that leads to (true) approximations, including a $4$-approximation for Colorful $k$-Center with constantly many colors (settling an open question raised by Bandyapadhyay, Inamdar, Pai, and Varadarajan) and a $4$-approximation for Fair Robust $k$-Center, for which the existence of a (true) constant-factor approximation was also open. We complement our results by showing that if one allows an unbounded number of colors, then Colorful $k$-Center admits no approximation algorithm with finite approximation guarantee, assuming that $\mathrm{P} \neq \mathrm{NP}$. Moreover, under the Exponential Time Hypothesis, the problem is inapproximable if the number of colors grows faster than logarithmic in the size of the ground set.
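To pin down what a Colorful $k$-Center solution must satisfy, the following exponential-time sketch finds the exact optimum on tiny instances: it tries candidate radii in increasing order and all $k$-subsets of input points as centers, and accepts the first radius at which every color meets its coverage requirement. It assumes points in the Euclidean plane and centers chosen among the input points; it is a checker for small cases, not the paper's approximation technique, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist


def covered_counts(points, colors, centers, r):
    """Number of points of each color within distance r of some chosen center."""
    counts = {}
    for p, c in zip(points, colors):
        if any(dist(p, points[j]) <= r for j in centers):
            counts[c] = counts.get(c, 0) + 1
    return counts


def colorful_k_center_exact(points, colors, k, req):
    """Smallest radius (and a center set) meeting every per-color requirement req[color]."""
    # With centers drawn from the input, the optimal radius is some pairwise distance.
    radii = sorted({dist(p, q) for p in points for q in points})
    for r in radii:
        for centers in combinations(range(len(points)), k):
            got = covered_counts(points, colors, centers, r)
            if all(got.get(c, 0) >= need for c, need in req.items()):
                return r, centers
    return None  # requirements cannot be met with k centers


points = [(0, 0), (1, 0), (10, 0), (11, 0)]
colors = ["red", "red", "blue", "blue"]
print(colorful_k_center_exact(points, colors, 1, {"red": 1, "blue": 1}))
```

With a single center that must cover one red and one blue point, the best choice is the point $(1,0)$ with radius 9; with two centers covering all four points, radius 1 suffices.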
 [6] arXiv:2007.03950 [pdf, other]

Title: Mining Dense Subgraphs with Similar Edges
Subjects: Data Structures and Algorithms (cs.DS)
When searching for interesting structures in graphs, it is often important to take into account not only the graph connectivity but also the available metadata, such as node and edge labels or temporal information. In this paper, we are interested in settings where such metadata is used to define a similarity between edges. We consider the problem of finding subgraphs that are dense and whose edges are similar to each other with respect to a given similarity function. Depending on the application, this function can be, for example, the Jaccard similarity between the edge label sets, or the temporal correlation of the edge occurrences in a temporal graph. We formulate a Lagrangian-relaxation-based optimization problem to search for dense subgraphs with high pairwise edge similarity. We design a novel algorithm to solve the problem through parametric min-cut, and provide an efficient search scheme to iterate through the values of the Lagrangian multipliers. Our study is complemented by an evaluation on real-world datasets, which demonstrates the usefulness and efficiency of the proposed approach.
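The two ingredients the objective trades off can be sketched directly: the density of an induced subgraph and the pairwise Jaccard similarity of its edges' label sets. This sketch assumes density means $|E(S)|/|S|$ (the common densest-subgraph convention) and averages similarity over all edge pairs in $S$; it only evaluates a given node set and does not implement the Lagrangian relaxation or the parametric min-cut search. The names are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b| of two label sets."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets (an assumption)
    return len(a & b) / len(a | b)


def subgraph_scores(nodes: set, edge_labels: dict) -> tuple:
    """Return (edge density |E(S)|/|S|, mean pairwise Jaccard similarity over E(S))."""
    inside = [labels for (u, v), labels in edge_labels.items() if u in nodes and v in nodes]
    density = len(inside) / len(nodes) if nodes else 0.0
    pairs = list(combinations(inside, 2))
    similarity = sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return density, similarity


labels = {(0, 1): {"a", "b"}, (1, 2): {"a"}, (0, 2): {"a", "b"}, (2, 3): {"z"}}
print(subgraph_scores({0, 1, 2}, labels))     # triangle with mostly overlapping labels
print(subgraph_scores({0, 1, 2, 3}, labels))  # same density, but the 'z' edge lowers similarity
```

Adding node 3 keeps the density at 1 but drops the mean similarity from 2/3 to 1/3, which is the kind of trade-off such an objective has to balance.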
 [7] arXiv:2007.04008 [pdf, ps, other]

Title: Waypoint Routing on Bounded Treewidth Graphs
Subjects: Data Structures and Algorithms (cs.DS)
In the Waypoint Routing Problem, one is given an undirected capacitated and weighted graph $G$, a source-destination pair $s,t\in V(G)$, and a set $W\subseteq V(G)$ of waypoints. The task is to find a walk that starts at the source vertex $s$, visits all waypoints in any order, ends at the destination vertex $t$, respects edge capacities (that is, traverses each edge at most as many times as its capacity), and minimizes the cost, computed as the sum of the costs of the traversed edges counted with multiplicity. We study the problem for graphs of bounded treewidth and present a new algorithm for the problem running in $2^{O(\mathrm{tw})}\cdot n$ time, significantly improving upon the previously known algorithms. We also show that this running time is optimal for the problem under the Exponential Time Hypothesis.
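The feasibility conditions and cost in this definition can be checked mechanically for a single candidate walk. The sketch below assumes capacities and costs are stored in dictionaries keyed by `frozenset({u, v})` for each undirected edge; it only validates and prices one given walk and does not search for an optimal one. The name `walk_cost` is hypothetical.

```python
from collections import Counter


def walk_cost(walk, s, t, waypoints, cap, cost):
    """Cost of `walk` if it is a feasible waypoint route, else None.

    cap, cost: dicts keyed by frozenset({u, v}) for each undirected edge."""
    if not walk or walk[0] != s or walk[-1] != t:
        return None  # must start at s and end at t
    if not set(waypoints) <= set(walk):
        return None  # every waypoint must be visited (in any order)
    used = Counter()
    for u, v in zip(walk, walk[1:]):
        e = frozenset((u, v))
        if e not in cap:
            return None  # consecutive vertices must be joined by an edge
        used[e] += 1
    if any(used[e] > cap[e] for e in used):
        return None  # some edge traversed more often than its capacity allows
    return sum(cost[e] * n for e, n in used.items())  # costs counted with multiplicity


cap = {frozenset((0, 1)): 2, frozenset((1, 2)): 2}
cost = {frozenset((0, 1)): 1, frozenset((1, 2)): 3}
print(walk_cost([0, 1, 2, 1, 0], 0, 0, {2}, cap, cost))
```

The walk $0,1,2,1,0$ uses each edge twice, so it costs $2\cdot 1 + 2\cdot 3 = 8$; lowering the capacity of edge $\{1,2\}$ to 1 would make it infeasible.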
 [8] arXiv:2007.04059 [pdf, ps, other]

Title: Fair Colorful k-Center Clustering
Comments: 19 pages, 5 figures. A preliminary version of this work was presented at the 21st Conference on Integer Programming and Combinatorial Optimization (IPCO 2020)
Journal-ref: In: Integer Programming and Combinatorial Optimization. IPCO 2020. LNCS, vol 12125. pp 209-222
Subjects: Data Structures and Algorithms (cs.DS)
An instance of colorful k-center consists of points in a metric space that are colored red or blue, along with an integer k and a coverage requirement for each color. The goal is to find the smallest radius $\rho$ such that there exist balls of radius $\rho$ around k of the points that meet the coverage requirements. The motivation behind this problem is twofold: first, fairness considerations (each color/group should receive a similar service guarantee), and second, the algorithmic challenges it poses: this problem combines the difficulties of clustering with those of the subset-sum problem. In particular, we show that this combination results in strong integrality gap lower bounds for several natural linear programming relaxations. Our main result is an efficient approximation algorithm that overcomes these difficulties to achieve an approximation guarantee of 3, nearly matching the tight approximation guarantee of 2 for the classical k-center problem, which this problem generalizes.
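For comparison with the classical problem mentioned above, this sketch implements Gonzalez's farthest-first traversal, a well-known 2-approximation for classical (uncolored) k-center: repeatedly open the point farthest from the current centers. It is not the paper's 3-approximation for the colorful variant, and it ignores colors entirely. Euclidean points and the function name are assumptions for the example.

```python
from math import dist


def gonzalez(points, k, first=0):
    """Farthest-first traversal; returns (center indices, covering radius)."""
    centers = [first]
    d = [dist(p, points[first]) for p in points]  # distance to nearest chosen center
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=d.__getitem__)  # farthest point so far
        centers.append(far)
        d = [min(d[i], dist(points[i], points[far])) for i in range(len(points))]
    return centers, max(d)


points = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(gonzalez(points, 2))
```

Starting from $(0,0)$, the traversal opens $(11,0)$ next and ends with radius 1, which here happens to be optimal; in general the returned radius is at most twice the optimum.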
 [9] arXiv:2007.04128 [pdf, other]

Title: String Indexing for Top-$k$ Close Consecutive Occurrences
Subjects: Data Structures and Algorithms (cs.DS)
The classic string indexing problem is to preprocess a string $S$ into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string $P$, report all occurrences of $P$ within $S$. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-$k$ close consecutive occurrences problem (SITCCO). Here, a consecutive occurrence is a pair $(i,j)$, $i < j$, such that $P$ occurs at positions $i$ and $j$ in $S$ and there is no occurrence of $P$ between $i$ and $j$; its distance is defined as $j-i$. Given a pattern $P$ and a parameter $k$, the goal is to report the top-$k$ consecutive occurrences of $P$ in $S$ of minimal distance. The challenge is to compactly represent $S$ while supporting queries in time close to the length of $P$ plus $k$. We give two new time-space trade-offs for the problem. Our first result achieves near-linear space and optimal query time, and our second result achieves linear space and near-optimal query time. Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees.
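The query semantics can be illustrated with an index-free baseline: scan $S$ for all (possibly overlapping) occurrences of $P$, pair up neighbouring occurrences, and return the $k$ pairs with the smallest distance $j-i$. This takes $O(|S|\cdot|P|)$ time per query, which is exactly what the compact structures above avoid; breaking distance ties by position is an assumption of this sketch, and the names are hypothetical.

```python
def occurrences(S: str, P: str) -> list:
    """All start positions of (possibly overlapping) occurrences of P in S."""
    return [i for i in range(len(S) - len(P) + 1) if S.startswith(P, i)]


def top_k_close_consecutive(S: str, P: str, k: int) -> list:
    """The k consecutive-occurrence pairs (i, j) of smallest distance j - i."""
    occ = occurrences(S, P)
    pairs = list(zip(occ, occ[1:]))  # neighbouring occurrences: nothing in between
    pairs.sort(key=lambda ij: (ij[1] - ij[0], ij[0]))  # by distance, then by position
    return pairs[:k]


print(top_k_close_consecutive("aXaYYaZZZa", "a", 2))
```

In `"aXaYYaZZZa"` the pattern `"a"` occurs at positions 0, 2, 5, 9, so the consecutive pairs have distances 2, 3, 4 and the top-2 answer is `[(0, 2), (2, 5)]`.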
Cross-lists for Thu, 9 Jul 20
 [10] arXiv:2007.03909 (cross-list from cs.CL) [pdf, other]

Title: Best-First Beam Search
Comments: TACL 2020
Subjects: Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS)
Decoding for many NLP tasks requires a heuristic algorithm for approximating exact search, since the full search space is often intractable, if not simply too large to traverse efficiently. The default algorithm for this job is beam search, a pruned version of breadth-first search, which in practice returns better results than exact inference due to beneficial search bias. In this work, we show that standard beam search is a computationally inefficient choice for many decoding tasks; specifically, when the scoring function is a monotonic function in sequence length, other search algorithms can be used to reduce the number of calls to the scoring function (e.g., a neural network), which is often the bottleneck computation. We propose best-first beam search, an algorithm that provably returns the same set of results as standard beam search, albeit in the minimum number of scoring function calls to guarantee optimality (modulo beam size). We show that best-first beam search can be used with length normalization and mutual information decoding, among other rescoring functions. Lastly, we propose a memory-reduced variant of best-first beam search, which has a similar search bias in terms of downstream performance but runs in a fraction of the time.
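As a baseline for the discussion above, here is a minimal sketch of standard (breadth-first) beam search, not the best-first variant the paper proposes. At every step each hypothesis in the beam is extended by every token, so the scoring function is called beam size × vocabulary size times per step, which is the cost best-first beam search aims to cut. The toy bigram-style model, the end token `"</s>"`, and all names are assumptions made for illustration.

```python
import heapq
from math import log


def beam_search(log_prob, vocab, beam_size, max_len, eos):
    """Standard beam search; returns the best (score, sequence) found."""
    beam = [(0.0, ())]  # (cumulative log-probability, token sequence)
    finished = []
    for _ in range(max_len):
        # Score every one-token extension of every hypothesis in the beam.
        candidates = [(score + log_prob(seq, tok), seq + (tok,))
                      for score, seq in beam for tok in vocab]
        beam = []
        for score, seq in heapq.nlargest(beam_size, candidates):
            if seq[-1] == eos:
                finished.append((score, seq))  # complete hypothesis: stop extending it
            else:
                beam.append((score, seq))
        if not beam:
            break
    finished.extend(beam)  # hypotheses still open when max_len is reached
    return max(finished)


# Hypothetical toy model: next-token probabilities depend only on the previous token.
TABLE = {
    None: {"a": 0.5, "b": 0.4, "</s>": 0.1},
    "a": {"a": 0.2, "b": 0.2, "</s>": 0.6},
    "b": {"a": 0.05, "b": 0.05, "</s>": 0.9},
}


def toy_log_prob(seq, tok):
    prev = seq[-1] if seq else None
    return log(TABLE[prev][tok])


VOCAB = ["a", "b", "</s>"]
print(beam_search(toy_log_prob, VOCAB, 1, 2, "</s>"))  # greedy decoding
print(beam_search(toy_log_prob, VOCAB, 2, 2, "</s>"))  # wider beam
```

With beam size 1 (greedy) the search commits to `a` and returns `('a', '</s>')` with probability 0.3; with beam size 2 it keeps `b` alive and finds `('b', '</s>')` with probability 0.36.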
Replacements for Thu, 9 Jul 20
 [11] arXiv:1808.06705 (replaced) [pdf, ps, other]

Title: Graph connectivity in log steps using label propagation
Authors: Paul Burkhardt
Comments: Retracting claim of log-diameter convergence. Updated algorithm to version designed in October 2018. Added experiments
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
 [12] arXiv:1811.09045 (replaced) [pdf, ps, other]

Title: Tight Approximation for Unconstrained XOS Maximization
Comments: 18 pages
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
 [13] arXiv:1907.01241 (replaced) [pdf, other]

Title: On the VC-dimension of halfspaces with respect to convex sets
Subjects: Computational Geometry (cs.CG); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
 [14] arXiv:1907.07889 (replaced) [pdf, other]

Title: The simultaneous conjugacy problem in the symmetric group
Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)