This tutorial is a practical guide to running distributed deep learning effectively across multiple compute nodes. Domain scientists are embracing DL both as a standalone data science method and as an effective approach to reducing dimensionality in traditional simulations. We have seen the fusion of DL and high-performance computing (HPC): supercomputers show an unparalleled capacity to reduce DL training time, and HPC techniques have been used to speed up parallel DL training. Distributed deep learning thus has great potential to augment DL applications by leveraging existing HPC clusters. In this tutorial, we will give an overview of state-of-the-art approaches to enabling deep learning at scale, followed by an interactive hands-on session to help attendees run distributed deep learning on Frontera at the Texas Advanced Computing Center. Lastly, we will discuss best practices for scaling, evaluating, and tuning performance.
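To make the core idea concrete, the most common strategy covered in such tutorials is data-parallel training: each worker computes gradients on its own data shard, an all-reduce averages them, and every worker applies the same update. The following is a minimal stdlib-only sketch of that averaging step; the function names and numbers are illustrative, not from any framework (real systems such as Horovod or PyTorch DistributedDataParallel perform the all-reduce over MPI or NCCL).

```python
def allreduce_mean(worker_grads):
    """Average per-worker gradient vectors element-wise (simulated all-reduce)."""
    n_workers = len(worker_grads)
    return [sum(g[i] for g in worker_grads) / n_workers
            for i in range(len(worker_grads[0]))]

def sgd_step(weights, grad, lr=0.1):
    """Apply one SGD update using the averaged gradient."""
    return [w - lr * g for w, g in zip(weights, grad)]

# Two hypothetical workers, each with gradients from its own data shard.
grads = [[1.0, 2.0], [3.0, 4.0]]
avg = allreduce_mean(grads)          # [2.0, 3.0]
weights = sgd_step([0.5, 0.5], avg)  # approximately [0.3, 0.2]
```

Because every worker sees the same averaged gradient, model replicas stay synchronized after each step; scaling to more nodes mainly changes how the all-reduce is implemented, not this logic.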