Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, Amdahl's law, and resource contention. Performance analysis tools for finding such scaling bottlenecks are based on either profiling or tracing. Profiling incurs low overhead but does not capture the detailed dependencies needed for root-cause analysis. Tracing captures this information, but at prohibitive overhead.
In this work, we present ScalAna, which uses static analysis to provide the analyzability of tracing at a cost similar to profiling. ScalAna first combines static compiler techniques with lightweight runtime techniques to build a Program Performance Graph. On this graph, we propose a novel backtracking algorithm that automatically detects the root causes of scaling bottlenecks. We evaluate ScalAna with real applications. Results show that ScalAna effectively locates the root causes and incurs 1.73% overhead on average for up to 2048 processes. By fixing the identified root causes, we achieve up to 11.11% performance improvement on 2048 processes.
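To make the idea of backtracking over a performance graph concrete, the following is a minimal sketch and not ScalAna's actual algorithm or data structures. It assumes a hypothetical graph in which each vertex is a code region with a measured time at a small and a large process count, and each edge points from a region to the regions it depends on. The names `scaling_loss`, `backtrack_root_cause`, `TIMES`, `DEPS`, the threshold value, and all timing numbers are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical per-vertex timings: (seconds at small scale, seconds at large scale).
TIMES: Dict[str, Tuple[float, float]] = {
    "MPI_Allreduce": (1.0, 4.0),
    "loop_compute": (2.0, 5.0),
    "MPI_Recv": (1.0, 1.1),
    "init": (0.5, 0.5),
}

# Hypothetical dependency edges: vertex -> vertices it waits on.
DEPS: Dict[str, List[str]] = {
    "MPI_Allreduce": ["loop_compute", "MPI_Recv"],
    "loop_compute": ["init"],
}


def scaling_loss(t_small: float, t_large: float) -> float:
    """Ratio of time at the larger scale to time at the smaller scale."""
    return t_large / t_small


def backtrack_root_cause(
    start: str,
    deps: Dict[str, List[str]],
    times: Dict[str, Tuple[float, float]],
    threshold: float = 1.5,
) -> List[str]:
    """Walk backwards from a poorly scaling vertex along dependency edges.

    At each step, move to the unvisited predecessor with the largest
    scaling loss, provided it meets the threshold. The walk stops when no
    predecessor qualifies; the last vertex on the returned path is the
    root-cause candidate.
    """
    path = [start]
    visited = {start}
    current = start
    while True:
        candidates = [
            p
            for p in deps.get(current, [])
            if p not in visited and scaling_loss(*times[p]) >= threshold
        ]
        if not candidates:
            return path
        current = max(candidates, key=lambda p: scaling_loss(*times[p]))
        visited.add(current)
        path.append(current)


if __name__ == "__main__":
    print(backtrack_root_cause("MPI_Allreduce", DEPS, TIMES))
```

In this toy input, the walk starts at the slow collective and ends at the compute loop it waits on, since that loop also scales poorly while its own predecessor does not. A real tool would need to distinguish intra-process from inter-process dependencies and handle noisy measurements, which this sketch omits.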