GSmart: An Efficient SPARQL Query Engine Using Sparse Matrix Algebra -- Full Version

Chen, Yuedan; Özsu, M. Tamer; Xiao, Guoqing; Tang, Zhuo; Li, Kenli

(Submitted on 26 Jun 2021)

Abstract: Efficient execution of SPARQL queries over large RDF datasets is a topic of considerable interest due to the increased use of RDF to encode data. Most prior work has followed either relational or graph-based approaches. In this paper, we propose an alternative query engine, called gSmart, based on matrix algebra. This approach can potentially better exploit the computing power of the high-performance heterogeneous architectures that we target. gSmart incorporates: (1) grouped incident edge-based SPARQL query evaluation, in which all unevaluated edges of a vertex are evaluated together using a series of matrix operations to fully utilize query constraints and narrow down the solution space; (2) a graph query planner that determines the order in which vertices in query graphs should be evaluated; (3) memory- and computation-efficient data structures, including the light-weight sparse matrix (LSpM) storage for RDF data and the tree-based representation for evaluation results; (4) a multi-stage data partitioner to map the incident edge-based query evaluation onto heterogeneous HPC architectures and develop multi-level parallelism; and (5) a parallel executor that uses a fine-grained processing scheme, a pre-pruning technique, and a tree-pruning technique to lower inter-node communication and enable high throughput. Evaluations of gSmart on a CPU+GPU HPC architecture show execution time speedups of up to 46920.00x compared to existing SPARQL query engines on a single-node machine. Additionally, gSmart on the Tianhe-1A supercomputer achieves a maximum speedup of 6.90x when scaling from 2 to 16 CPU+GPU nodes.
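
To make item (1) of the abstract more concrete, the following is a minimal illustrative sketch of how all edges incident to one query vertex might be evaluated together with boolean sparse matrix operations. It is not gSmart's implementation: the `{row: set(cols)}` layout is a stand-in for the LSpM storage (whose actual format is not described here), and the function names `build_matrices`, `spmv_bool`, and `evaluate_vertex`, as well as the sample triples, are hypothetical.

```python
from collections import defaultdict

def build_matrices(triples):
    """Store one sparse boolean matrix per predicate as {subject: set(objects)}.
    Assumption: a toy stand-in for a real sparse format, not the paper's LSpM."""
    mats = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        mats[p][s].add(o)
    return mats

def spmv_bool(matrix, vector):
    """Boolean sparse matrix-vector product: return the rows that have at
    least one nonzero in a column selected by `vector` (a set of columns)."""
    return {row for row, cols in matrix.items() if cols & vector}

def evaluate_vertex(mats, incident_edges, universe):
    """Evaluate every outgoing edge of one query vertex in a single pass.
    incident_edges: list of (predicate, object_set); object_set is None
    when the object is an unbound variable."""
    candidates = set(universe)
    for pred, objs in incident_edges:
        selector = universe if objs is None else objs
        # Intersecting the per-edge results applies all constraints at once.
        candidates &= spmv_bool(mats.get(pred, {}), selector)
    return candidates

# Hypothetical sample data.
triples = [
    ("alice", "worksAt", "acme"),
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
    ("carol", "knows", "alice"),
]
mats = build_matrices(triples)
universe = {t[0] for t in triples} | {t[2] for t in triples}

# Query: SELECT ?x WHERE { ?x worksAt acme . ?x knows ?y }
result = evaluate_vertex(mats, [("worksAt", {"acme"}), ("knows", None)], universe)
print(result)
```

In this sketch each incident edge contributes one matrix-vector product, and intersecting the products narrows the candidate set for `?x` before any neighbouring vertex is visited, which is the intuition behind using every constraint on a vertex at once.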

Subjects:	Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2106.14038 [cs.DB]
	(or arXiv:2106.14038v1 [cs.DB] for this version)

Submission history

From: Yuedan Chen
[v1] Sat, 26 Jun 2021 15:03:34 GMT (2399kb)
