Lawrence Livermore National Laboratory, United States of America
The simulation environment of any HPC platform is key to the performance, portability and productivity of scientific applications. This environment has traditionally been provided by platform vendors, which presents challenges for HPC centers and users, including platform-specific software that tends to stagnate over the lifetime of the system. In this paper, we present the Tri-Laboratory Operating System Stack (TOSS), a production simulation environment based on Linux and open source software, with proprietary software components integrated as needed. TOSS, focused on mid- to large-scale commodity HPC systems, provides a common simulation environment across system architectures, reduces the learning curve on new systems and benefits from a lineage of past experience and bug fixes. To extend the scope and applicability of TOSS, we demonstrate its feasibility and effectiveness on a leadership-class supercomputer architecture. Our evaluation, relative to the vendor stack, includes an analysis of resource manager complexity, system noise, networking and application performance.