With the slowdown of Moore’s law and the end of Dennard scaling, the energy efficiency of compute hardware translates into compute power. High-performance computing (HPC) systems therefore rely increasingly on accelerators such as field-programmable gate arrays (FPGAs) to run highly demanding workloads, like Big Data applications or deep neural networks. These FPGAs are reconfigurable and, in some systems, are no longer bus-attached to a CPU but connected directly to the data center network fabric as standalone nodes. This mix of CPUs and FPGAs gives rise to Reconfigurable Heterogeneous HPC (ReH2PC) clusters, for which no established programming model exists despite many past proposals. In contrast, the Message Passing Interface (MPI) has become the de facto standard for programming classical HPC clusters, owing to its high reusability and the fast application development it enables. This paper revisits the programming model of ReH2PC clusters and argues that MPI is suitable for programming heterogeneous clusters of FPGAs and CPUs.
We demonstrate a one-click solution for compiling and deploying a standard MPI application on ReH2PC clusters. Our framework provides a High-Level Synthesis (HLS) library, a dedicated run-time environment for FPGAs and CPUs, and a transpiler that closes the semantic gap between the MPI API and FPGA designs. Our experiments with 31 FPGAs show an average speedup of 4x and a 90% reduction in power consumption compared to a cluster of CPUs.