We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel architectures. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications include threading parallelism optimisation, change of the data layout into Structure of Arrays (SoA), auto-vectorisation and algorithmic improvements in the particle sorting. We measure lower execution time and improved threading scalability both on Intel Xeon ($2.6 \times$ on Ivy Bridge) and Xeon Phi ($13.7 \times$ on Knights Corner) systems. First tests on second generation Xeon Phi (Knights Landing) demonstrate the portability of the devised optimisation solutions to upcoming architectures.
F. Baruffa, L. Iapichino, N. Hammer, et. al.
Tue, 20 Dec 16
Comments: 18 pages, 5 figures, submitted