Sandia National Laboratories, United States of America
We present an effort to port the nonhydrostatic atmosphere dynamical core (dycore) of the Energy Exascale Earth System Model (E3SM) to run efficiently on GPU architectures, specifically targeting cloud-resolving resolutions of 3 km and 1 km. To express on-node parallelism, we use the C++ library Kokkos, which allows us to achieve performance-portable code in a largely architecture-independent way. Our implementation achieves 0.97 simulated years per day when running on the GPUs of the full Summit supercomputer. To the best of our knowledge, this is the highest throughput achieved to date by any global atmosphere dycore running at such resolutions. Moreover, our C++ implementation is on par with (or slightly better than) the original Fortran implementation on CPUs, demonstrating that the GPU port did not compromise efficiency on conventional CPUs.