Data Motion Architecture and Consulting GmbH, Switzerland
Ronald spent a one-year sabbatical at ETH CSCS in 2019, working with the team that implemented COSMO in GridTools for production Swiss weather prediction. There, he architected a non-von Neumann accelerator for stencils on regular grids, targeting weather and climate simulations. This presentation shows how a co-design approach to a data-centric design achieved the maximum attainable performance, limited only by the bandwidth of the accelerator's HBM DRAM bus. This is achieved with a novel CGRA (Coarse-Grained Reconfigurable Array) layout that reflects the grid points in the physical world (a patent has been filed). The approach avoids the automatic caching mechanisms of CPUs and GPUs, which prevent them from reaching maximum performance. Three design rules (pragmas) were applied: grid point data may be read from HBM only once for an entire 3D grid sweep (stencil calculation); all data transferred over the HBM bus must be used in a calculation (i.e., it cannot be evicted from cache before use); and the HBM bus must run continuously at sustained peak bandwidth.
The achieved performance for two stencil benchmarks and the CGRA results will be shown, along with the next steps for this project.
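
To make the bandwidth limit concrete, the following is a minimal sketch (not the presenter's model) of the lower bound on sweep time that the three rules imply: if every grid value crosses the HBM bus exactly once and the bus runs at peak bandwidth, the sweep time is simply the bytes moved divided by the bandwidth. The function names, the grid size, the 8-byte value size, the assumption that each output value is written once, and the bandwidth figure are all hypothetical and chosen only for illustration.

```python
def sweep_bytes(nx, ny, nz, bytes_per_value=8, input_fields=1,
                output_fields=1, reads_per_input_point=1):
    """Bytes moved over the memory bus for one 3D stencil sweep.

    reads_per_input_point=1 models the 'read once' rule; a larger value
    models a design that re-fetches grid points (e.g. after cache eviction).
    Assumption: every output value is written exactly once.
    """
    points = nx * ny * nz
    read = points * input_fields * reads_per_input_point * bytes_per_value
    written = points * output_fields * bytes_per_value
    return read + written


def bandwidth_bound_time(total_bytes, peak_bandwidth_bytes_per_s):
    """Lower bound on sweep time when the bus runs at sustained peak bandwidth."""
    return total_bytes / peak_bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical grid and bandwidth, for illustration only.
    grid = (256, 256, 80)
    peak_bw = 400e9  # bytes per second, assumed value

    ideal = sweep_bytes(*grid)
    refetch = sweep_bytes(*grid, reads_per_input_point=3)

    print("read-once sweep time (s):", bandwidth_bound_time(ideal, peak_bw))
    print("3x re-fetch sweep time (s):", bandwidth_bound_time(refetch, peak_bw))
```

Under these assumptions, any re-fetching of grid points raises the byte count and therefore the minimum sweep time, which is why the read-once rule matters for a bandwidth-bound stencil.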