SC22 full-day tutorial: Hands-on Practical Hybrid Parallel Application Performance Engineering (Dallas, TX, USA)
Monday 14th November 2022
- Sameer Shende, University of Oregon
- Anke Visser, Jülich Supercomputing Centre
- Bert Wesarg, Technische Universität Dresden
- Brian Wylie, Jülich Supercomputing Centre
- Marc Schlütter, Jülich Supercomputing Centre [remote]
- Bill Williams, Technische Universität Dresden [remote]
- Frank Winkler, Technische Universität Dresden
This page will be updated as information becomes available, so check back before traveling to attend the tutorial. Tutorials are planned to be live-streamed as part of the SC22 Digital Experience; however, remote participants will not receive assistance with the hands-on parts. The currently available software and exercises are being updated in preparation for the tutorial.
The full-day hands-on tutorial takes place as part of the SC22 conference and is scheduled in room D171 of the Kay Bailey Hutchison Convention Center, Dallas, Texas, USA. Registration for the tutorial is via the conference website, with or without the conference technical program, exhibition and workshops.
Hands-on exercises will use temporary accounts provided by Jülich Supercomputing Centre (JSC) on the JUWELS-Booster modular supercomputer. Participants will build and run an MPI+CUDA example code on two compute nodes, each with dual AMD EPYC 7402 24-core 'Rome' CPUs and quad Nvidia A100 'Ampere' GPUs, measuring and analysing intra-node and inter-node performance with VI-HPS tools. Access will be via the Jupyter-JSC service, which allows an Xpra remote graphical shell environment to run within common web browsers. Tutorial participants are expected to use their own notebook computers, connecting via the SC22 conference wireless network; no additional software needs to be installed.
Tutorial participants are strongly encouraged to register for a JUDOOR account to access the training project and its allocation on JUWELS-Booster. (Note that the other SC22 tutorial on Distributed GPU Programming, which will also use this system, is scheduled to run concurrently and will use a different training project.)
This tutorial presents state-of-the-art performance tools for leading-edge HPC systems founded on the community-developed Score-P instrumentation and measurement infrastructure, demonstrating how they can be used for performance engineering of effective scientific applications based on standard MPI, OpenMP, hybrid MPI+OpenMP, and increasingly common usage of accelerators. Parallel performance evaluation tools from the VI-HPS (Virtual Institute - High Productivity Supercomputing) are introduced and featured in hands-on exercises with Scalasca, Vampir and TAU. We present the complete workflow of performance engineering, including instrumentation, measurement (profiling and tracing, timing and PAPI hardware counters), data storage, analysis, and visualization. Emphasis is placed on how tools are used in combination for identifying performance problems and investigating optimization alternatives. Using their own notebook computers participants will conduct exercises on quad-A100 GPU nodes of the
Introduction & basic measurement