Data Structures and Algorithms

New submissions

New submissions for Fri, 9 Apr 21

[1]  arXiv:2104.03461 [pdf, other]
Title: Linear and Sublinear Time Spectral Density Estimation
Subjects: Data Structures and Algorithms (cs.DS); Numerical Analysis (math.NA)

We analyze the popular kernel polynomial method (KPM) for approximating the spectral density (eigenvalue distribution) of a real symmetric (or Hermitian) matrix $A \in \mathbb{R}^{n\times n}$. We prove that a simple and practical variant of the KPM algorithm can approximate the spectral density to $\epsilon$ accuracy in the Wasserstein-1 distance with roughly $O({1}/{\epsilon})$ matrix-vector multiplications with $A$. This yields a provable linear time result for the problem.
The KPM variant we study is based on damped Chebyshev polynomial expansions. We show that it is stable, meaning that it can be combined with any approximate matrix-vector multiplication algorithm for $A$. As an application, we develop an $O(n/\text{poly}(\epsilon))$ time algorithm for computing the spectral density of any $n\times n$ normalized graph adjacency or Laplacian matrix. This runtime is sublinear in the size of the matrix, and assumes sample access to the graph.
Our approach leverages several tools from approximation theory, including Jackson's seminal work on approximation with positive kernels [Jackson, 1912], and stability properties of three-term recurrence relations for orthogonal polynomials.
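
The kernel polynomial method described in this abstract can be illustrated with a minimal pure-Python sketch. This is the textbook form of KPM with Jackson damping, not necessarily the exact variant the paper analyzes. It assumes the matrix is symmetric with eigenvalues in [-1, 1] (rescale first otherwise). All function names are hypothetical. The Jackson coefficient formula is the one commonly quoted in the KPM literature, taken here as an assumption.

```python
import math
import random


def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def chebyshev_moments(A, num_moments, probes):
    """Estimate tau_k = tr(T_k(A)) / n for k = 0 .. num_moments - 1.

    Uses the three-term recurrence T_{k+1}(A) g = 2 A T_k(A) g - T_{k-1}(A) g,
    so each probe vector costs about num_moments matrix-vector products.
    """
    totals = [0.0] * num_moments
    norm = 0.0
    for g in probes:
        norm += dot(g, g)
        prev, curr = g, matvec(A, g)  # T_0(A) g and T_1(A) g
        totals[0] += dot(g, prev)
        if num_moments > 1:
            totals[1] += dot(g, curr)
        for k in range(2, num_moments):
            Av = matvec(A, curr)
            nxt = [2.0 * a - p for a, p in zip(Av, prev)]
            totals[k] += dot(g, nxt)
            prev, curr = curr, nxt
    return [t / norm for t in totals]


def jackson_coefficients(num_moments):
    """Jackson damping factors (assumed standard formula); the first equals 1."""
    N = num_moments
    c = math.pi / (N + 1)
    return [
        ((N - k + 1) * math.cos(c * k) + math.sin(c * k) / math.tan(c)) / (N + 1)
        for k in range(N)
    ]


def kpm_density(x, moments):
    """Damped Chebyshev series for the spectral density at a point -1 < x < 1."""
    damp = jackson_coefficients(len(moments))
    theta = math.acos(x)
    total = damp[0] * moments[0]
    for k in range(1, len(moments)):
        total += 2.0 * damp[k] * moments[k] * math.cos(k * theta)
    return total / (math.pi * math.sqrt(1.0 - x * x))


if __name__ == "__main__":
    random.seed(0)
    A = [[0.0, 0.5], [0.5, 0.0]]  # eigenvalues are -0.5 and +0.5
    # Random sign (Rademacher) probes give a stochastic trace estimate.
    probes = [[random.choice((-1.0, 1.0)) for _ in A] for _ in range(20)]
    moments = chebyshev_moments(A, 8, probes)
    print([round(kpm_density(x, moments), 3) for x in (-0.5, 0.0, 0.5)])
```

Passing the standard basis vectors as probes makes the trace computation exact, which is convenient for checking the recurrence on small matrices. Random sign vectors are the usual choice when the matrix is large.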

[2]  arXiv:2104.03484 [pdf, ps, other]
Title: Advances in Metric Ramsey Theory and its Applications
Authors: Yair Bartal
Comments: This paper is still in preparation; this version is not intended for distribution. A preliminary version of this article was written by the author in 2006 and was presented at the 2007 ICMS Workshop on Geometry and Algorithms. The basic result on constructive metric Ramsey decomposition and the metric Ramsey theorem has also appeared in the author's lecture notes.
Subjects: Data Structures and Algorithms (cs.DS); Computational Geometry (cs.CG); Metric Geometry (math.MG)

Metric Ramsey theory is concerned with finding large well-structured subsets of more complex metric spaces. For finite metric spaces this problem was first studied by Bourgain, Figiel and Milman \cite{bfm}, and studied further in depth by Bartal et al. \cite{BLMN03}. In this paper we provide deterministic constructions for this problem via a novel notion of metric Ramsey decomposition. This method yields several more applications, reflecting on some basic results in metric embedding theory.
The applications include various results in metric Ramsey theory, including the first deterministic construction yielding Ramsey theorems with tight bounds, as well as stronger theorems and properties, implying appropriate distance oracle applications.
In addition, this decomposition provides the first deterministic Bourgain-type embedding of finite metric spaces into Euclidean space, and an optimal multi-embedding into ultrametrics, thus improving its applications in approximation and online algorithms.
The decomposition presented here, along with its techniques and consequences, has already been used in recent research in the field of metric embedding for various applications.

[3]  arXiv:2104.03932 [pdf, ps, other]
Title: Universally-Optimal Distributed Algorithms for Known Topologies
Comments: Full version of extended abstract in STOC 2021
Subjects: Data Structures and Algorithms (cs.DS)

Many distributed optimization algorithms achieve existentially-optimal running times, meaning that there exists some pathological worst-case topology on which no algorithm can do better. Still, most networks of interest allow for exponentially faster algorithms. This motivates two questions: (1) What network topology parameters determine the complexity of distributed optimization? (2) Are there universally-optimal algorithms that are as fast as possible on every topology?
We resolve these 25-year-old open problems in the known-topology setting (i.e., supported CONGEST) for a wide class of global network optimization problems including MST, $(1+\varepsilon)$-min cut, various approximate shortest paths problems, sub-graph connectivity, etc.
In particular, we provide several (equivalent) graph parameters and show they are tight universal lower bounds for the above problems, fully characterizing their inherent complexity. Our results also imply that algorithms based on the low-congestion shortcut framework match the above lower bound, making them universally optimal if shortcuts are efficiently approximable. We leverage a recent result in hop-constrained oblivious routing to show this is the case if the topology is known -- giving universally-optimal algorithms for all above problems.

Cross-lists for Fri, 9 Apr 21

[4]  arXiv:2104.03353 (cross-list from cs.DB) [pdf, other]
Title: Correlation Sketches for Approximate Join-Correlation Queries
Comments: Proceedings of the 2021 International Conference on Management of Data (SIGMOD '21)
Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR)

The increasing availability of structured datasets, from Web tables and open-data portals to enterprise data, opens up opportunities to enrich analytics and improve machine learning models through relational data augmentation. In this paper, we introduce a new class of data augmentation queries: join-correlation queries. Given a column $Q$ and a join column $K_Q$ from a query table $\mathcal{T}_Q$, retrieve tables $\mathcal{T}_X$ in a dataset collection such that $\mathcal{T}_X$ is joinable with $\mathcal{T}_Q$ on $K_Q$ and there is a column $C \in \mathcal{T}_X$ such that $Q$ is correlated with $C$. A naïve approach to evaluate these queries, which first finds joinable tables and then explicitly joins and computes correlations between $Q$ and all columns of the discovered tables, is prohibitively expensive. To efficiently support correlated column discovery, we 1) propose a sketching method that enables the construction of an index for a large number of tables and that provides accurate estimates for join-correlation queries, and 2) explore different scoring strategies that effectively rank the query results based on how well the columns are correlated with the query. We carry out a detailed experimental evaluation, using both synthetic and real data, which shows that our sketches attain high accuracy and the scoring strategies lead to high-quality rankings.
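
The abstract does not spell out how the sketches are built, so the following is only a generic illustration of how a join-correlation estimate can be obtained without a full join. It assumes a coordinated bottom-k sample: every table keeps the rows whose join keys have the smallest values under one shared hash function, so the samples of two tables overlap on common keys. It also assumes each join key occurs once per table. All names are hypothetical.

```python
import hashlib
import math


def key_hash(key):
    """Deterministic 64-bit hash shared by all tables (coordinated sampling)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_sketch(rows, size):
    """Keep the `size` (key, value) pairs whose keys have the smallest hashes.

    `rows` maps each join key to the numeric value of one column.
    """
    smallest = sorted(rows.items(), key=lambda kv: key_hash(kv[0]))[:size]
    return dict(smallest)


def pearson(xs, ys):
    """Pearson correlation; returns None when either side has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return None
    return cov / math.sqrt(vx * vy)


def estimate_join_correlation(sketch_q, sketch_c):
    """Join two sketches on their keys and correlate the paired values."""
    shared = sorted(set(sketch_q) & set(sketch_c))
    if len(shared) < 2:
        return None  # too few joined rows to estimate anything
    return pearson([sketch_q[k] for k in shared], [sketch_c[k] for k in shared])


if __name__ == "__main__":
    query = {f"k{i}": float(i) for i in range(100)}
    candidate = {f"k{i}": 2.0 * i + 1.0 for i in range(100)}
    sq, sc = build_sketch(query, 16), build_sketch(candidate, 16)
    print(estimate_join_correlation(sq, sc))
```

Because both sketches use the same hash, two tables with the same key set retain the same keys, so the sketch join behaves like a sample of the true join. The naïve approach in the abstract would instead join and scan every candidate table in full.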

[5]  arXiv:2104.03673 (cross-list from cs.DC) [pdf, other]
Title: Practical Byzantine Reliable Broadcast on Partially Connected Networks
Comments: This is a preprint of a paper that will appear at the IEEE ICDCS 2021 conference
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Networking and Internet Architecture (cs.NI)

In this paper, we consider the Byzantine reliable broadcast problem on authenticated and partially connected networks. The state-of-the-art method to solve this problem consists in combining two algorithms from the literature. Handling asynchrony and faulty senders is typically done thanks to Gabriel Bracha's authenticated double-echo broadcast protocol, which assumes an asynchronous fully connected network. Danny Dolev's algorithm can then be used to provide reliable communications between processes in the global fault model, where up to f processes among N can be faulty in a communication network that is at least 2f+1-connected. Following recent works that showed that Dolev's protocol can be made more practical thanks to several optimizations, we show that the state-of-the-art methods to solve our problem can be optimized thanks to layer-specific and cross-layer optimizations. Our simulations with the Omnet++ network simulator show that these optimizations can be efficiently combined to decrease the total amount of information transmitted or the protocol's latency (e.g., respectively, -25% and -50% with a 16B payload, N=31 and f=4) compared to the state-of-the-art combination of Bracha's and Dolev's protocols.
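
The parameters quoted in this abstract (N processes, up to f faulty, a 2f+1-connected network) can be made concrete with a small helper. The quorum sizes below follow common textbook presentations of Bracha's double-echo broadcast and Dolev's disjoint-path rule. They are stated here as assumptions, since the abstract itself gives only the connectivity requirement. The function name and dictionary keys are hypothetical.

```python
def broadcast_thresholds(n, f):
    """Return the message counts each layer waits for, given N = n and f faults.

    Assumed textbook thresholds:
      - Bracha: send READY after more than (N + f) / 2 ECHOs, or after more
        than f READYs; deliver after more than 2f READYs. Requires N > 3f.
      - Dolev: the network must be at least (2f + 1)-connected, and a message
        is accepted once it arrives over f + 1 node-disjoint paths.
    """
    if f < 0 or n <= 3 * f:
        raise ValueError("Bracha's protocol needs N > 3f")
    return {
        "echo_quorum": (n + f) // 2 + 1,
        "ready_amplify": f + 1,
        "ready_deliver": 2 * f + 1,
        "min_connectivity": 2 * f + 1,
        "disjoint_paths_to_accept": f + 1,
    }


if __name__ == "__main__":
    # The configuration used in the abstract's example: N = 31, f = 4.
    for name, value in broadcast_thresholds(31, 4).items():
        print(name, value)
```

For N=31 and f=4 this gives an echo quorum of 18 and a required connectivity of 9, which shows why every broadcast on a partially connected network multiplies into many relayed messages and why reducing that traffic matters.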

Replacements for Fri, 9 Apr 21

[6]  arXiv:1901.03627 (replaced) [pdf, ps, other]
Title: Destroying Bicolored $P_3$s by Deleting Few Edges
Comments: 30 pages
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
[7]  arXiv:2006.08668 (replaced) [pdf, ps, other]
Title: Algorithmic Aspects of Temporal Betweenness
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
[8]  arXiv:2011.03622 (replaced) [pdf, ps, other]
Title: Settling the Robust Learnability of Mixtures of Gaussians
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[9]  arXiv:1602.07570 (replaced) [pdf, ps, other]
Title: Bayesian Exploration: Incentivizing Exploration in Bayesian Games
Comments: All revisions focused on presentation; all results (except Appendix C) have been present since the initial version
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
[10]  arXiv:1801.07029 (replaced) [pdf, other]
Title: Deterministic Scheduling of Periodic Messages for Low Latency in Cloud RAN
Comments: 40 pages, 23 Figures
Subjects: Networking and Internet Architecture (cs.NI); Data Structures and Algorithms (cs.DS)
[11]  arXiv:2012.03879 (replaced) [pdf, other]
Title: Sequential Stratified Regeneration: MCMC for Large State Spaces with an Application to Subgraph Count Estimation
Comments: Markov Chain Monte Carlo, Random Walk, Regenerative Sampling, Motif Analysis, Subgraph Counting, Graph Mining
Subjects: Social and Information Networks (cs.SI); Data Structures and Algorithms (cs.DS)