ISC-HPC'15 tutorial 06: Hands-on Practical Hybrid Parallel Application Performance Engineering (Frankfurt am Main, Germany)

Date

Sunday 12th July 2015

Presenters

Markus Geimer, Jülich Supercomputing Centre
Michael Gerndt, Technische Universität München
Sameer Shende, University of Oregon
Ronny Tschüter, Technische Universität Dresden

Logistics

The full-day hands-on tutorial takes place as part of the ISC-HPC'15 conference in the Kolleg room of the Frankfurt Messe, Frankfurt am Main, Germany. Registration for the tutorial is possible via the conference website (or on-site), with or without the conference technical program, exhibition and workshops.

Hands-on exercises will use temporary accounts provided by TACC on the Stampede supercomputer to build and run an MPI+OpenMP example code on two compute nodes with Intel Xeon Phi coprocessors, measuring and analysing intra-node and inter-node performance with VI-HPS tools.

Before arriving for the tutorial, participants are strongly encouraged to download the latest VI-HPS Linux Live ISO/OVA and run it within VirtualBox on their notebook computer. Connection to Stampede is expected to use the wireless network and will require SSH and X11. Because network latency affects the responsiveness of GUIs, Vampir and other graphical tools will be provided for native installation; alternatively, they can be used from the VI-HPS Linux ISO/OVA. Downloading the 12 GB ISO/OVA over the wireless network during the tutorial is not expected to work, so it should be downloaded beforehand.

Abstract

This tutorial presents state-of-the-art performance tools for leading-edge HPC systems, founded on the community-developed Score-P instrumentation and measurement infrastructure. It demonstrates how these tools can be used for performance engineering of effective scientific applications based on standard MPI, OpenMP, hybrid MPI+OpenMP, and the increasingly common use of accelerators. Parallel performance evaluation tools from the VI-HPS (Virtual Institute - High Productivity Supercomputing) are introduced and featured in hands-on exercises with Scalasca, Vampir, Periscope and TAU. We present the complete workflow of performance engineering, including instrumentation, measurement (profiling and tracing, timing and PAPI hardware counters), data storage, analysis, and visualization. Emphasis is placed on how tools are used in combination for identifying performance problems and investigating optimization alternatives. Participants will use their own notebook computers with a provided HPC Linux [http://www.hpclinux.org] ISO/OVA image, which contains all of the necessary tools and runs within a virtual machine. They will conduct exercises on the Stampede system at TACC, where remote access to Intel Xeon Phi accelerator-based nodes will be provided for the hands-on sessions. This will help to prepare participants to locate and diagnose performance bottlenecks in their own parallel programs.

Programme

09:00	Introduction & basic profile measurement
	[40] Introduction to VI-HPS & parallel application engineering [Gerndt]
	[20] Setup for hands-on exercises with Live-ISO/OVA & Stampede [Shende]
	[30] Instrumentation & measurement of applications with Score-P [Tschüter]
	[30] Exploration & visualization of call-path profiles with CUBE [Geimer]
11:00	(break)
11:30	Advanced profiling
	[50] Configuration & customization of Score-P measurements [Tschüter/Geimer]
	[40] Examination & visualization of profiles with TAU [Shende]
13:00	(lunch)
14:00	Advanced analyses
	[40] Automated analysis of traces for inefficiencies with Scalasca [Geimer]
	[40] Interactive visualization and time-interval statistics with Vampir [Tschüter]
	[40] Online distributed bottleneck search with Periscope [Gerndt]
16:00	(break)
16:30	Case studies & conclusion
	[60] Finding typical parallel performance bottlenecks [Tschüter/Geimer/Shende]
	[30] Review & conclusion [Geimer]
18:00	(adjourn)