Virtual Institute – High Productivity Supercomputing

SC17 full-day tutorial: Hands-on Practical Hybrid Parallel Application Performance Engineering (Denver, CO, USA)


Sunday 12th November 2017


  • Christian Feld, Jülich Supercomputing Centre
  • Markus Geimer, Jülich Supercomputing Centre
  • Sameer Shende, University of Oregon
  • Ronny Tschüter, Technische Universität Dresden
  • Brian Wylie, Jülich Supercomputing Centre


This page will be updated as additional information becomes available, so check back at least a few days before traveling to attend the tutorial. In particular, the currently available software and exercises are being updated in preparation for the tutorial.

The full-day hands-on tutorial is part of the SC17 conference and is scheduled for room 405 of the Colorado Convention Center, Denver, Colorado, USA. Registration for the tutorial is possible via the conference website (or on-site), with or without the conference technical program, exhibition and workshops.

Hands-on exercises will use temporary accounts provided by TACC on the Stampede supercomputer to build and run an MPI+OpenMP example code on two compute nodes with Intel Xeon Phi coprocessors, measuring and analysing intra-node and inter-node performance with VI-HPS tools.

Before arriving for the tutorial, participants are strongly encouraged to download the latest VI-HPS Linux Live ISO/OVA for execution within VirtualBox on their notebook computer. Connection to Stampede is expected to use the wireless network and will require SSH and X11. Since network latency will impact the responsiveness of GUIs, Vampir and other graphical tools will be provided for native installation, or alternatively can be used from the VI-HPS Linux ISO/OVA. Downloading the 12 GB ISO/OVA via the wireless network is not expected to work!


This tutorial presents state-of-the-art performance tools for leading-edge HPC systems founded on the community-developed Score-P instrumentation and measurement infrastructure, demonstrating how they can be used for performance engineering of effective scientific applications based on standard MPI, OpenMP, hybrid MPI+OpenMP, and increasingly common usage of accelerators. Parallel performance evaluation tools from the VI-HPS (Virtual Institute – High Productivity Supercomputing) are introduced and featured in hands-on exercises with Scalasca, Vampir and TAU. We present the complete workflow of performance engineering, including instrumentation, measurement (profiling and tracing, timing and PAPI hardware counters), data storage, analysis, and visualization. Emphasis is placed on how tools are used in combination for identifying performance problems and investigating optimization alternatives. Using their own notebook computers with a provided HPC Linux ISO/OVA image containing all of the necessary tools (running within a virtual machine), participants will conduct exercises on the Stampede system at TACC, where remote access to Intel Xeon Phi accelerator-based nodes will be provided for the hands-on sessions. This will help to prepare participants to locate and diagnose performance bottlenecks in their own parallel programs.


08:30 Introduction & basic measurement
  • [45] Introduction to VI-HPS & parallel application engineering [Wylie]
  • [15] Setup for hands-on exercises with Live-ISO/OVA & Stampede [all]
  • [30] Instrumentation & measurement of applications with Score-P [Feld]
10:00 (break)
10:30 Profile analyses
  • [30] Exploration & visualization of call-path profiles with CUBE [Geimer]
  • [30] Configuration & customization of Score-P measurements [Feld/Tschüter]
  • [30] Examination & visualization of profiles with TAU [Shende]
12:00 (lunch)
13:30 Trace analyses
  • [45] Automated analysis of traces for inefficiencies with Scalasca [Geimer]
  • [45] Interactive visualization and time-interval statistics with Vampir [Tschüter]
15:00 (break)
15:30 Further steps
  • [30] Specialized Score-P measurements and analyses [Feld]
  • [15] Performance data management with TAU PerfExplorer [Shende]
  • [30] Parallel application performance analysis case studies [all]
  • [15] Review & conclusion [Wylie]
17:00 (adjourn)