Texas Advanced Computing Center (TACC), University of Texas, United States of America
Modern processors, such as Intel's Xeon Scalable line, AMD's EPYC architecture, Cavium's Arm-based ThunderX2 design and IBM's Power9 architecture, are scaling out rather than up: they gain performance by adding more cores per chip rather than by raising clock frequencies, and they are growing in complexity. To achieve good application performance on these processors, developers must write code amenable to vectorization, be aware of memory access patterns to optimize cache usage and understand how to balance multi-process programming (MPI) with multi-threaded programming (OpenMP).
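As a small illustration of why memory access patterns matter, the sketch below sums a matrix stored in row-major order in two ways. It is a minimal example under stated assumptions (row-major storage, hypothetical function names chosen for illustration), not material taken from the tutorial itself. Both functions compute the same sum, but only the first walks memory with stride 1, which uses every fetched cache line fully and gives the compiler a simple loop to vectorize.

```c
#include <stddef.h>

/* Hypothetical helpers: sum all elements of an n x m matrix stored in
   row-major order, where element (i, j) lives at a[i*m + j]. */

/* Cache-friendly: the inner loop touches consecutive addresses (stride 1),
   so each cache line brought in is fully used, and the loop is a good
   candidate for SIMD vectorization. The pragma is ignored when the code
   is compiled without OpenMP support. */
double sum_row_order(size_t n, size_t m, const double *a)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        #pragma omp simd reduction(+:s)
        for (size_t j = 0; j < m; ++j)
            s += a[i * m + j];
    }
    return s;
}

/* Cache-unfriendly: the inner loop jumps m elements at a time (stride m),
   so for large m nearly every access may land on a different cache line. */
double sum_column_order(size_t n, size_t m, const double *a)
{
    double s = 0.0;
    for (size_t j = 0; j < m; ++j)
        for (size_t i = 0; i < n; ++i)
            s += a[i * m + j];
    return s;
}
```

Because a vectorized reduction may reorder floating-point additions, the two versions can differ in the last bits for general inputs; the performance gap only becomes visible for matrices much larger than the cache.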
This tutorial will cover serial and thread-parallel optimization, including introductory and intermediate concepts in vectorization and multi-threaded programming. We will also address profiling techniques and tools and give a brief overview of modern HPC architectures.
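To show how thread-level and vector-level parallelism combine in OpenMP, here is a minimal sketch of a dot product (the function name `dot` is hypothetical, chosen for illustration). The combined `parallel for simd` construct splits the iterations across threads and asks each thread to vectorize its share, while the `reduction` clause gives each thread a private partial sum so there is no data race on the accumulator.

```c
#include <stddef.h>

/* Dot product of two length-n vectors.
   "parallel for": distribute the loop iterations across OpenMP threads.
   "simd":         ask each thread to vectorize its chunk of iterations.
   "reduction":    each thread (and SIMD lane) accumulates a private partial
                   sum; OpenMP combines them into s after the loop. */
double dot(size_t n, const double *x, const double *y)
{
    double s = 0.0;
    #pragma omp parallel for simd reduction(+:s)
    for (size_t i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}
```

With GCC, for example, this would be built with `gcc -O2 -fopenmp`, and the thread count set through the `OMP_NUM_THREADS` environment variable; without `-fopenmp` the pragma is ignored and the loop runs serially.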
The tutorial will include hands-on exercises, and we will demonstrate the use of profiling tools on TACC systems. It is designed for intermediate programmers familiar with OpenMP and MPI who wish to learn how to program for performance on modern architectures.