Oak Ridge National Laboratory (ORNL), United States of America
As we move toward exascale, managing vast volumes of data and extracting knowledge from it in a timely way has become a challenge for science. In situ analysis of data is a viable alternative to processing large data volumes post hoc. However, composing and orchestrating in situ workflows remains challenging due to variability in system behavior at runtime and the need to respond to events that occur dynamically in the science application. Thus, dynamic control of a running workflow is an important step for the next generation of high-performance computing.
This paper presents initial work toward a policy-driven framework for dynamically controllable in situ workflows. We present example policies that can be used to dynamically tune a running workflow, and show that enacting these policies requires support in both the data management library and the workflow tools layer. We also introduce a keyword-based specification for expressing simple policies. Through experiments on Summit, a pre-exascale supercomputer at Oak Ridge National Laboratory, we demonstrate the impact of a dynamically adaptive HPC workflow.