Virtual Institute — High Productivity Supercomputing

VI-HPS Tuning Workshop (JSC/RWTH Aachen, Germany) - POSTPONED

Date

To be determined (originally planned for Monday 8 - Friday 12 February 2021). Registration remains open for those interested in the workshop.

Location

The workshop will be held online, using the Zoom videoconference platform.

Organising Institutions

FZJ/JSC      JARA RWTH      hpc.nrw

Goals

This workshop organised by VI-HPS, Jülich Supercomputing Centre & RWTH Aachen ITC will:

  • give an overview of the VI-HPS programming tools suite
  • explain the functionality of individual tools, and how to use them effectively
  • offer hands-on experience and expert coaching assistance using the tools
    with your own application running on multi-GPU compute nodes

On completion, participants should be familiar with common performance analysis and diagnosis techniques and how they can be employed in practice (on a range of HPC systems). Those who prepared their own application test cases will have been coached in the tuning of their measurement and analysis, and given optimization suggestions.

Programme Overview

Presentations and hands-on sessions are planned on the following topics:

  • Setting up, welcome and introduction
  • ARM Performance Reports (APR)
  • MUST runtime error detection for MPI
  • Caliper event annotation, logging & profiling
  • Nsight system-wide and CUDA kernel profiling
  • PAPI hardware performance counters
  • Score-P instrumentation and measurement
  • CUBE profile processing and exploration
  • TAU performance system
  • Scalasca automated trace analysis
  • Vampir interactive trace analysis
  • Paraver/Extrae/Dimemas trace analysis and performance prediction
  • JUBE script-based workflow batch execution environment
  • ... and potentially others to be added

A brief overview of the capabilities of these and associated tools is provided in the VI-HPS Tools Guide.

The workshop will be held in English and run from 09:00 to not later than 18:00 each day, with breaks.

Classroom capacity is limited, so priority will be given to applicants with parallel codes already running on the workshop computer system (JUWELS), and to those bringing codes to work on from similar GPU-accelerated Linux cluster systems. Participants are encouraged to prepare their own GPU-accelerated MPI, OpenMP and hybrid MPI+OpenMP heterogeneous parallel application codes for analysis. Codes using multiple GPUs via OpenACC, OpenCL or CUDA will be given priority.

Programme in Detail (provisional) - all times given as CET (UTC+1)

Day 1: Monday
09:00 Welcome [Brian Wylie, JSC & Christian Terboven, RWTH]
  • Introduction to Zoom
  • Introduction to VI-HPS & overview of tools [Cédric Valensi, UVSQ]
  • Introduction to parallel performance engineering
  • JUWELS Booster module [JSC]
  • Building and running TeaLeaf_CUDA on JUWELS Booster [JSC]
  • ARM Performance Reports [JSC/ARM]
10:30 (break)
11:00 MUST runtime error detection for MPI [Joachim Protze, RWTH]
  • MUST hands-on exercises
12:30 (lunch)
14:00 Hands-on coaching to apply APR & MUST to analyze participants' own code(s).
15:30 (break)
16:00 Caliper annotation, logging & profiling [David Böhme, LLNL]
  • Caliper hands-on exercises
17:30 Schedule for remainder of workshop
18:00 (adjourn)

Day 2: Tuesday
09:00 Nsight Systems system-wide GPU performance analysis [Robert Dietrich, Nvidia]
  • Nsight Systems hands-on exercises
10:30 (break)
11:00 Nsight Compute CUDA kernel profiler [Felix Schmitt, Nvidia]
  • Nsight Compute hands-on exercises
12:30 (lunch)
14:00 Hands-on coaching to apply Nsight tools to analyze participants' own code(s).
15:30 (break)
16:00 PAPI for CUDA & NVML [Tony Castaldo, UTK-ICL]
  • PAPI hands-on exercises
17:30 Schedule for remainder of workshop
18:00 (adjourn)

Day 3: Wednesday
09:00 Score-P instrumentation & measurement toolset [JSC & Radita Liem, RWTH]
  • Score-P analysis scoring & measurement filtering
  • Score-P specialized instrumentation and measurement
  • Score-P hands-on exercises
  • CUBE profile explorer hands-on exercises [Anke Visser, JSC]
10:30 (break)
11:00 Hands-on coaching to apply Score-P/CUBE to analyze participants' own code(s).
12:30 (lunch)
14:00 Hands-on coaching to apply Score-P/CUBE to analyze participants' own code(s).
15:30 (break)
16:00 TAU performance system [Sameer Shende, UOregon]
  • TAU hands-on exercises
17:30 Review of day and schedule for remainder of workshop
18:00 (adjourn)

Day 4: Thursday
09:00 Scalasca automated trace analysis [Markus Geimer, JSC]
  • Scalasca hands-on exercises
  • Vampir interactive trace analysis [JSC/TUDresden]
  • Vampir hands-on exercises
10:30 (break)
11:00 Paraver tracing tools suite [Judit Giménez & Lau Mercadal, BSC]
  • Paraver hands-on exercises
12:30 (lunch)
14:00 Hands-on coaching to apply Scalasca/Vampir & Paraver to analyze participants' own code(s).
15:30 (break)
16:00 Hands-on coaching to apply tools to analyze participants' own code(s).
17:30 Review of day and schedule for remainder of workshop
18:00 (adjourn)

Day 5: Friday
09:00 JUBE batch execution automation [Sebastian Lührs, JSC]
  • JUBE hands-on exercises
10:30 (break)
11:00 Hands-on coaching to apply tools to analyze participants' own code(s).
Review
12:30 (lunch)
14:00 Hands-on coaching to apply tools to analyze participants' own code(s).
16:00 (adjourn)

Hardware and Software Platforms

JUWELS Booster Module: 936 BullSequana XH2000 compute nodes, each with two 24-core AMD EPYC Rome 7402 CPUs, 512 GB DDR memory and four NVIDIA Ampere A100 GPUs (each with 40 GB HBM2e), connected by a 200 Gb/s HDR InfiniBand interconnect.
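For participants sizing their test cases, the node counts quoted above can be turned into aggregate figures. The following is only a minimal sketch of that arithmetic: the function name `booster_totals` is hypothetical, the inputs are the numbers stated in the description, and the whole-system totals are theoretical maxima rather than resources available to any single job.

```python
# Figures taken from the JUWELS Booster Module description above.
NODES = 936
CPUS_PER_NODE = 2
CORES_PER_CPU = 24
GPUS_PER_NODE = 4
HBM_GB_PER_GPU = 40


def booster_totals(nodes=NODES):
    """Return aggregate CPU cores, GPUs and GPU memory (GB) for `nodes` nodes.

    Hypothetical helper for illustration only; it multiplies out the
    per-node figures and says nothing about what a batch job may request.
    """
    cores = nodes * CPUS_PER_NODE * CORES_PER_CPU
    gpus = nodes * GPUS_PER_NODE
    hbm_gb = gpus * HBM_GB_PER_GPU
    return {"cpu_cores": cores, "gpus": gpus, "gpu_memory_gb": hbm_gb}


if __name__ == "__main__":
    # Totals for the full module, then for a single node.
    for label, value in booster_totals().items():
        print(f"{label}: {value}")
    print(booster_totals(nodes=1))
```

Calling `booster_totals(nodes=1)` gives the resources of one node (48 CPU cores, 4 GPUs, 160 GB of GPU memory), which is the more relevant figure when preparing a small multi-GPU test case.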

The local GPU-accelerated HPC system JUWELS Booster Module is the primary platform for the workshop and will be used for the hands-on exercises. Course accounts will be provided during the workshop to participants without existing accounts. Other systems where up-to-date versions of the tools are installed can also be used if preferred, though support may be limited and participants are expected to already have user accounts on non-local systems. Whichever systems they intend to use, participants should be familiar with the relevant procedures for compiling and running their parallel applications (via batch queues where appropriate).

Registration

Registration form: the number of participants is limited and early registration is recommended. Selection will be made based on the information provided when registering.

NB: The JUWELS Booster Porting Workshop (25-26 January 2021), which requires separate registration, provides an opportunity for assistance with porting existing GPU-enabled codes to the new system.

Contact

Tuning Workshop Series

Cédric Valensi
Université de Versailles Saint-Quentin-en-Yvelines
Phone: +33 1 77 57 59 36
Email: cedric.valensi[at]uvsq.fr

Local Arrangements

Brian Wylie
Jülich Supercomputing Centre
Phone: +49 (0)2461 61-6589
Email: b.wylie[at]fz-juelich.de