We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.DC

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Distributed, Parallel, and Cluster Computing

Title: Real-Time Computation of Parameter Fitting and Image Reconstruction Using Graphical Processing Units

Abstract: In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task.
In this paper we examined the potential of GPUs for two different applications. The first application, created at Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of muSR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission Tomography) image reconstruction and analysis. Applications currently in use were examined to identify parts of the algorithms in need of optimization. Efficient GPU kernels were created in order to allow applications to use a GPU, to speed up the previously identified parts. Benchmarking tests were performed in order to measure the achieved speedup
During this work, we focused on single GPU systems to show that real time data analysis of these problems can be achieved without the need for large computing clusters. The results show that the currently used application for parameter fitting, which uses OpenMP to parallelize calculations over multiple CPU cores, can be accelerated around 40 times through the use of a GPU. The speedup may vary depending on the size and complexity of the problem. For PET image analysis, the obtained speedups of the GPU version was more than x40 larger compared to a single core CPU implementation. The achieved results show that it is possible to improve the execution time by orders of magnitude.
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computational Physics (physics.comp-ph)
DOI: 10.1016/j.cpc.2017.02.007
Cite as: arXiv:1604.02334 [cs.DC]
  (or arXiv:1604.02334v2 [cs.DC] for this version)

Submission history

From: Uldis Locans [view email]
[v1] Fri, 8 Apr 2016 12:47:47 GMT (140kb,D)
[v2] Tue, 22 Nov 2016 14:49:44 GMT (140kb,D)

Link back to: arXiv, form interface, contact.