Virtual Institute — High Productivity Supercomputing

43rd VI-HPS Tuning Workshop (CALMIP, Toulouse, France)


Monday 29th January - Thursday 1st February 2024


The workshop will take place at the CALMIP Mesocentre, Espace Clément Ader, 3, rue Caroline Aigle, 31400 Toulouse, France.

This workshop, organised by VI-HPS and the CALMIP Mesocentre, will:

  • give an overview of the VI-HPS programming tools suite
  • explain the functionality of individual tools, and how to use them effectively
  • offer hands-on experience and expert assistance using the tools

On completion, participants should be familiar with common performance analysis and diagnosis techniques and how they can be employed in practice (on a range of HPC systems). Those who prepared their own application test cases will have been coached in tuning their measurement and analysis, and provided with optimisation suggestions.

Programme Overview

Presentations and hands-on sessions are planned on the following topics:

  • Setting up, welcome and introduction
  • TAU performance system
  • MAQAO performance analysis & optimisation
  • Score-P instrumentation and measurement
  • CUBE profile processing and exploration
  • Scalasca automated trace analysis
  • Verificarlo numerical accuracy analysis
  • ... and potentially others to be added

A brief overview of the capabilities of these and associated tools is provided in the VI-HPS Tools Guide.

The workshop will be held in English and run from 09:00 to not later than 18:00 each day, with breaks for lunch and refreshments.

Classroom capacity is limited, so priority may be given to applicants with parallel codes already running on the workshop computer system (TURPAN), and to those bringing codes from similar ARM Linux cluster systems to work on. Participants are encouraged to prepare their own MPI, OpenMP and hybrid MPI+OpenMP parallel application codes for analysis. Codes using multiple GPUs via OpenACC, OpenCL or CUDA may also be analysed.

Programme in Detail (provisional) - all times given as CET (UTC+1)

Day 1: Monday 29 January
13:30 (registration)
14:00 Welcome [Nicolas Renon, CALMIP]
  • Workshop agenda [Cédric Valensi, UVSQ]
  • Introduction to CALMIP [Nicolas Renon, CALMIP]
  • The TURPAN system
  • Introduction to VI-HPS & overview of tools [Cédric Valensi, UVSQ]
  • Introduction to parallel performance engineering
  • Building and running an MPI+OpenACC code on TURPAN [Brian Wylie, JSC]
15:30 (break)
16:00 TAU performance system [Sameer Shende, UOregon]
  • TAU hands-on exercises
17:30 Schedule for remainder of workshop
18:00 (adjourn)

Day 2: Tuesday 30 January
09:00 MAQAO performance analysis tools [Cédric Valensi, Hugo Bolloré & Emmanuel Oseret, UVSQ]
  • MAQAO hands-on exercises (MAQAO quick reference)
10:30 (break)
11:00 Verificarlo numerical accuracy analysis [Pablo de Oliveira, UVSQ]
  • Verificarlo hands-on exercises
12:30 (lunch)
14:00 Hands-on coaching to apply MAQAO & Verificarlo to analyze participants' own code(s)
15:30 (break)
16:00 Hands-on coaching to apply MAQAO & Verificarlo to analyze participants' own code(s)
17:30 (adjourn)

Day 3: Wednesday 31 January
09:00 Score-P instrumentation & measurement toolset [Brian Wylie, JSC]
  • CUBE profile explorer hands-on exercises
10:30 (break)
11:00 Scalasca automated trace analysis toolset [Brian Wylie, JSC]
  • Score-P measurement scoring & customisation
12:30 (lunch)
14:00 Hands-on coaching to apply Score-P/CUBE to analyze participants' own code(s)
15:30 (break)
16:00 Hands-on coaching to apply Score-P/CUBE to analyze participants' own code(s)
17:30 (adjourn)

Day 4: Thursday 01 February
09:00 Review [Cédric Valensi, UVSQ]
  • Hands-on coaching to apply tools to analyze participants' own code(s)
10:30 (break)
11:00 Hands-on coaching to apply tools to analyze participants' own code(s)
12:30 (adjourn)

Hardware and Software Platforms


  • Each Turpan node is built around a single CPU with 512 GB of RAM, divided into 8 x 64 GB DIMMs on independent channels. It has two Nvidia A100 GPU cards with 80 GB of memory each, connected via PCI Express x16, and two InfiniBand network cards at 200 Gb/s each, also connected via PCI Express x16. Each node also has 6 TB of local storage and standard connectivity (USB, Ethernet, etc.).
  • The processor in a Turpan node is an Ampere Altra Q80-30, with 80 cores at 3 GHz, implementing the Armv8.2 architecture, with a memory data transfer rate of 3200 MT/s. Its computing power is 1.9 Tflops per socket. Each of the node's two Nvidia A100-80 GPU accelerators has 6912 CUDA cores across 108 Streaming Multiprocessors (SMs) and a peak performance of 19.5 Tflops. At maximum load, using all 80 CPU cores and both GPU accelerators, the peak performance of a node is therefore 40.9 Tflops. With 15 nodes, Turpan has a theoretical peak performance of 613.5 Tflops.
  • In terms of storage, Turpan has 360 TB on mechanical disks for scratch and project storage, and 17 TB of SSDs used as a cache to accelerate output. Physically, there are 60 x 8 TB mechanical disks and 11 x 3.8 TB SSDs.
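The peak performance figures quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python using only the numbers given in the node description. The one exception is the value of 8 floating-point operations per core per cycle, which is an assumption made here to illustrate where the 1.9 Tflops per socket could come from; it is not stated in the text.

```python
# Figures taken from the node description above.
CPU_CORES = 80
CPU_CLOCK_HZ = 3.0e9
CPU_PEAK_TFLOPS = 1.9    # per socket, as quoted
GPU_PEAK_TFLOPS = 19.5   # per A100, as quoted
GPUS_PER_NODE = 2
NODES = 15

# ASSUMPTION (not in the text): 8 floating-point operations per core per cycle.
ASSUMED_FLOPS_PER_CYCLE = 8


def cpu_peak_estimate_tflops():
    """Estimate the CPU peak from cores x clock x flops per cycle."""
    return CPU_CORES * CPU_CLOCK_HZ * ASSUMED_FLOPS_PER_CYCLE / 1e12


def node_peak_tflops():
    """Peak of one node: one CPU socket plus two GPUs."""
    return CPU_PEAK_TFLOPS + GPUS_PER_NODE * GPU_PEAK_TFLOPS


def cluster_peak_tflops():
    """Theoretical peak of the whole machine."""
    return NODES * node_peak_tflops()


if __name__ == "__main__":
    print(f"CPU estimate: {cpu_peak_estimate_tflops():.2f} Tflops")  # 1.92
    print(f"Node peak:    {node_peak_tflops():.1f} Tflops")          # 40.9
    print(f"Cluster peak: {cluster_peak_tflops():.1f} Tflops")       # 613.5
```

These are theoretical peaks obtained by simple addition; sustained performance of a real application, as measured with the workshop tools, will normally be lower.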

The local HPC system TURPAN is the primary platform for the workshop and will be used for the hands-on exercises. Course accounts will be provided during the workshop to participants without existing accounts. Other systems where up-to-date versions of the tools are installed can also be used when preferred, though support may be limited and participants are expected to already have user accounts on non-local systems. Whichever systems they intend to use, participants should be familiar with the relevant procedures for compiling and running their parallel applications (via batch queues where appropriate).


Registration via the course website.


Local Arrangements

Nicolas Renon
Email: nicolas.renon[at]

Tuning Workshop Series

Cédric Valensi
Université de Versailles Paris Saclay
Email: cedric.valensi[at]