The Fortran 2018 standard introduced syntax and semantics that allow a parallel application to recover from failed images (fail-stop processes) during execution. Teams are a key new language feature that facilitates this capability for applications that use collective subroutines: when a team of images is partitioned into one or more sets of new teams, only active images comprise the new teams; failed images are excluded.
This paper summarizes the language facilities for handling failed images specified in the Fortran 2018 standard and subsequent interpretations by the US Fortran Programming Language Standards Technical Committee. We propose standardizing some semantics that have been left processor (implementation) dependent to enable the development of portable fault-tolerant parallel Fortran applications.
Finally, we present a prototype implementation (based on GFortran, OpenCoarrays, and ULFM2) of a substantial subset of the Fortran 2018 failed images functionality, including semantic changes proposed herein.