Lawrence Berkeley National Laboratory, United States of America
Scientific advances depend on the ability to use high performance computing (HPC) systems effectively and efficiently to manage and run large, complex scientific workflows. Toward understanding the characteristics of these large scientific workflows, we propose two methods to identify workflows with temporal connections and data dependencies from the batch queue and I/O logs available on HPC systems. We use the two methods to characterize workflow runtime and correlate it with node requests, I/O patterns, and resource usage over three months of log data from Cori, a supercomputer at NERSC. A key result of our analyses is that single-job workflows often do not use all of their allocated CPUs, which suggests opportunities to allocate resources at a finer granularity.