
Title: Fault Awareness in the MPI 4.0 Session Model

Abstract: The latest version of MPI (4.0) introduces new functionality such as the Session model, but it still lacks fault-management mechanisms. Past efforts produced tools and MPI standard extensions to manage the presence of faults, including ULFM (User-Level Failure Mitigation). These measures are effective against faults but do not fully support the features newly added to the standard. In this paper, we combine the fault-management capabilities of ULFM with the Session model introduced in version 4.0 of the standard. We focus on the communicator creation procedure, highlighting its critical issues and proposing a method to circumvent them. The experimental campaign shows that the proposed solution does not significantly affect applications' execution time or scalability, while handling the occurrence of faults better.
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
DOI: 10.1145/3587135.3592189
Cite as: arXiv:2303.02956 [cs.DC]
  (or arXiv:2303.02956v1 [cs.DC] for this version)

Submission history

From: Roberto Rocco
[v1] Mon, 6 Mar 2023 08:01:31 GMT (1656kb,D)
