Contiguous Graph Partitioning For Optimal Total Or Bottleneck Communication

Ahrens, Willow

Authors: Willow Ahrens

(Submitted on 31 Jul 2020 (v1), last revised 21 Jun 2021 (this version, v4))

Abstract: Graph partitioning schedules parallel calculations like sparse matrix-vector multiply (SpMV). We consider contiguous partitions, where the $m$ rows (or columns) of a sparse matrix with $N$ nonzeros are split into $K$ parts without reordering. We propose the first near-linear time algorithms for several graph partitioning problems in the contiguous regime.
Traditional objectives such as the simple edge cut, hyperedge cut, or hypergraph connectivity minimize the total cost of all parts under a balance constraint. Our total partitioners use $O(Km + N)$ space. They run in $O((Km\log(m) + N)\log(N))$ time, a significant improvement over prior $O(K(m^2 + N))$ time algorithms due to Kernighan and Grandjean et al.
Bottleneck partitioning minimizes the maximum cost of any part. We propose a new bottleneck cost which reflects the sum of communication and computation on each part. Our bottleneck partitioners use linear space. The exact algorithm runs in linear time when $K^2$ is $O(N^C)$ for $C < 1$. Our $(1 + \epsilon)$-approximate algorithm runs in linear time when $K\log(c_{high}/(c_{low}\epsilon))$ is $O(N^C)$ for $C < 1$, where $c_{high}$ and $c_{low}$ are upper and lower bounds on the optimal cost. We also propose a simpler $(1 + \epsilon)$-approximate algorithm which runs within a factor of $\log(c_{high}/(c_{low}\epsilon))$ of linear time.
We empirically demonstrate that our algorithms efficiently produce high-quality contiguous partitions on a suite of 42 test matrices. When $K = 8$, our hypergraph connectivity partitioner achieved a speedup of up to $53\times$ (mean $15.1\times$) over prior algorithms. The mean runtime of our bottleneck partitioner was 5.15 SpMVs.
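
To make the bottleneck objective concrete, here is a minimal Python sketch of the classical "probe and bisect" approach to contiguous partitioning. It is not the paper's algorithm. It assumes a computation-only cost (the number of nonzeros in each row, passed as the hypothetical list `work`) and ignores communication entirely. The names `probe` and `approx_bottleneck` are illustrative. The bisection narrows an interval between a lower bound `c_low` and an upper bound `c_high` on the optimal cost until they are within a factor of $(1 + \epsilon)$, which is where a $\log(c_{high}/(c_{low}\epsilon))$ term of the kind quoted in the abstract comes from.

```python
from typing import List, Optional, Tuple


def probe(work: List[float], K: int, limit: float) -> Optional[List[int]]:
    """Greedily pack rows left to right into parts of cost <= limit.

    Returns K + 1 split points (part k holds rows splits[k]:splits[k + 1]),
    or None if more than K parts would be needed.
    """
    splits = [0]
    load = 0.0
    for i, w in enumerate(work):
        if w > limit:
            return None  # a single row already exceeds the limit
        if load + w > limit:
            splits.append(i)  # start a new part at row i
            load = 0.0
            if len(splits) > K:
                return None
        load += w
    # Close the last part, then pad with empty parts up to exactly K parts.
    while len(splits) < K + 1:
        splits.append(len(work))
    return splits


def approx_bottleneck(
    work: List[float], K: int, eps: float
) -> Tuple[List[int], float]:
    """Bisect on the bottleneck cost until it is within (1 + eps) of optimal."""
    c_low = max(max(work), sum(work) / K)  # no partition can beat this
    c_high = float(sum(work))              # a single part always achieves this
    best = probe(work, K, c_high)
    while c_high > (1 + eps) * c_low:
        mid = (c_low + c_high) / 2
        splits = probe(work, K, mid)
        if splits is not None:
            c_high, best = mid, splits  # mid is achievable: tighten from above
        else:
            c_low = mid                 # mid is not achievable: raise the floor
    return best, c_high


if __name__ == "__main__":
    nonzeros_per_row = [3, 1, 4, 1, 5, 9, 2, 6]  # made-up example data
    splits, bound = approx_bottleneck(nonzeros_per_row, K=3, eps=0.1)
    print(splits, bound)
```

The greedy probe is only valid because this simplified cost is monotone: adding a row to a part never makes the part cheaper. A cost that also counts communication, as the abstract describes, needs more machinery to evaluate per part, which this sketch does not attempt.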

Comments:	20 pages; added total partitioning algorithm, added total costs, added experimental results, added lazy near-linear bisection algorithm, simplified presentation. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:2007.16192 [cs.DS]
	(or arXiv:2007.16192v4 [cs.DS] for this version)

Submission history

From: Willow Ahrens [view email]
[v1] Fri, 31 Jul 2020 17:41:31 GMT (35kb,D)
[v2] Mon, 9 Nov 2020 17:46:28 GMT (1160kb,D)
[v3] Sun, 29 Nov 2020 00:11:21 GMT (972kb,D)
[v4] Mon, 21 Jun 2021 17:02:26 GMT (2903kb,D)
