Virtual Institute — High Productivity Supercomputing

ISC-HPC'16 tutorial 01: Hands-on Practical Hybrid Parallel Application Performance Engineering (Frankfurt am Main, Germany)


Sunday 19th June 2016


  • Markus Geimer, Jülich Supercomputing Centre
  • Michael Gerndt, Technische Universität München
  • Allen Malony, University of Oregon
  • Ronny Tschüter, Technische Universität Dresden


This page will be updated as additional information becomes available, so check back at least a few days before traveling to attend the tutorial. In particular, the currently available software and exercises are being updated in preparation for the tutorial.

The full-day hands-on tutorial takes place as part of the ISC-HPC'16 conference in the Analog 1 room of the Frankfurt Messe, Frankfurt am Main, Germany. Registration for the tutorial is possible via the conference website (or on-site), with or without the conference technical program, exhibition and workshops.

Hands-on exercises will use temporary accounts provided by TACC on the Stampede supercomputer to build and run an MPI+OpenMP example code on two compute nodes with Intel Xeon Phi coprocessors, measuring and analysing intra-node and inter-node performance with VI-HPS tools.

Before arriving for the tutorial, participants are strongly encouraged to download the latest VI-HPS Linux Live ISO/OVA for execution within VirtualBox on their notebook computer. Connection to Stampede is expected to use the wireless network and will require SSH and X11. Since network latency will impact the responsiveness of GUIs, Vampir and other graphical tools will be provided for native installation, or alternatively can be used from the VI-HPS Linux ISO/OVA. Downloading the 12 GB ISO/OVA via the wireless network is not expected to work!


This tutorial presents state-of-the-art performance tools for leading-edge HPC systems founded on the community-developed Score-P instrumentation and measurement infrastructure, demonstrating how they can be used for performance engineering of effective scientific applications based on standard MPI, OpenMP, hybrid MPI+OpenMP, and the increasingly common use of accelerators. Parallel performance evaluation tools from the VI-HPS (Virtual Institute - High Productivity Supercomputing) are introduced and featured in hands-on exercises with Scalasca, Vampir, Periscope Tuning Framework and TAU. We present the complete workflow of performance engineering, including instrumentation, measurement (profiling and tracing, timing and PAPI hardware counters), data storage, analysis, and visualization. Emphasis is placed on how tools are used in combination for identifying performance problems and investigating optimization alternatives. Using their own notebook computers with a provided HPC Linux ISO/OVA image containing all of the necessary tools (running within a virtual machine), participants will conduct exercises on the Stampede system at TACC, where remote access to Intel Xeon Phi accelerator-based nodes will be provided for the hands-on sessions. This will help to prepare participants to locate and diagnose performance bottlenecks in their own parallel programs.


09:00 Introduction & basic profile measurement
  • [45] Introduction to VI-HPS & parallel application engineering [Gerndt]
  • [15] Setup for hands-on exercises with Live-ISO/OVA & Stampede [all]
  • [30] Instrumentation & measurement of applications with Score-P [Tschüter]
  • [30] Exploration & visualization of call-path profiles with CUBE [Geimer]
11:00 (break)
11:30 Advanced profiling
  • [50] Configuration & customization of Score-P measurements [Tschüter/Geimer]
  • [40] Examination & visualization of profiles with TAU [Malony]
13:00 (break)
14:00 Advanced analyses
  • [40] Automated analysis of traces for inefficiencies with Scalasca [Geimer]
  • [40] Interactive visualization and time-interval statistics with Vampir [Tschüter]
  • [40] Online performance tuning with the Periscope Tuning Framework [Gerndt]
16:00 (break)
16:30 Case studies & conclusion
  • [60] Finding typical parallel performance bottlenecks [all]
  • [30] Review & conclusion [Geimer]
18:00 (adjourn)