Virtual Institute — High Productivity Supercomputing

7th Workshop on Extreme-Scale Programming Tools



Friday, November 16, 2018.
08:30 a.m. - 12:00 p.m.


Held in conjunction with SC18: The International Conference for High Performance Computing, Networking, Storage and Analysis
Dallas, TX, USA


This workshop is supported by SPPEXA, the DFG Priority Program 1648 Software for Exascale Computing.


The path to extreme computing keeps broadening: large-scale systems towards exascale and beyond, growing many-core systems with deep memory hierarchies, and massively parallel accelerators are just a few of the platforms we can expect. This trend will challenge HPC application developers in their quest to achieve the maximum potential that their systems have to offer, both on and across nodes. Factors such as limited power budgets, heterogeneity, hierarchical memories, shrinking I/O bandwidths, and performance variability will make it increasingly difficult to create productive applications on future platforms. To address these challenges, we need tools for debugging, performance measurement and analysis, and tuning to overcome the architectural, system, and programming complexities expected in these environments.

At the same time, research and development progress for HPC tools themselves faces equally difficult challenges: adaptive systems with an increased emphasis on autotuning, dynamic monitoring and adaptation, heterogeneous analysis, and new metrics such as power, energy, and temperature require new methodologies, techniques, and engagement with application teams. This workshop will serve as a forum for HPC application developers, system designers, and tool researchers to discuss the requirements for tools that assist developers in identifying, investigating, and handling the challenges in future extreme-scale environments, both for highly parallel nodes and in large-scale HPC systems.

The workshop is the seventh in a series of SC conference workshops organized by the Virtual Institute - High Productivity Supercomputing (VI-HPS), an international initiative of HPC researchers and developers focused on programming and performance tools for parallel systems.

Workshop topics

  • Performance tools for scalable parallel platforms
  • Debugging and correctness tools for parallel programming paradigms
  • Program development tool chains (incl. IDEs) for parallel systems
  • Methodologies for performance engineering
  • Tool technologies for extreme-scale challenges (e.g., scalability, resilience, power)
  • Tool support for accelerated architectures and large-scale multi-cores
  • Measurement and optimization tools for networks and I/O
  • Tool infrastructures and environments
  • Application developer experiences with programming and performance tools

Workshop Proceedings

The ESPT'18 workshop proceedings are published by Springer as volume 11027 of the LNCS series.

Workshop Program

08:30 – 08:35 Welcome and introduction
08:35 – 09:20 Keynote
"Understanding software sustainability: Learning from Parsl and other projects"

by Daniel S. Katz
This talk will introduce Parsl, a Python-based parallel scripting library that can enable highly productive extreme-scale programming. It will use Parsl as an example to discuss software sustainability, including how different groups think about this concept, what "equations" we might use to measure it, and why it cannot be measured looking forward but only predicted.
09:20 – 09:40 "Understanding the Scalability of Molecular Simulation using Empirical Performance Modeling"
by Sergei Shudler
Molecular dynamics (MD) simulation allows for the study of static and dynamic properties of molecular ensembles at various molecular scales, from monatomics to macromolecules such as proteins and nucleic acids. It has applications in biology, materials science, biochemistry, and biophysics. Recent developments in simulation techniques spurred the emergence of the computational molecular engineering (CME) field, which focuses specifically on the needs of industrial users in engineering. Within CME, the simulation code ms2 allows users to calculate thermodynamic properties of bulk fluids. It is a parallel code that aims to scale the temporal range of the simulation while keeping the execution time minimal. In this paper, we use empirical performance modeling to study the impact of simulation parameters on the execution time. Our approach is a systematic workflow that can be used as a blueprint in other fields that aim to scale their simulation codes. We show that the generated models can help users better understand how to scale the simulation with minimal increase in execution time.
09:40 – 10:00 "Advanced Event Sampling Support for PAPI"
by Forrest Smith
The PAPI performance library is a widely used tool for gathering performance data from running applications. Modern processors support advanced sampling interfaces, such as Intel's Precise Event Based Sampling (PEBS) and AMD's Instruction Based Sampling (IBS). The current PAPI sampling interface predates the existence of these interfaces and only provides simple instruction-pointer-based samples. We propose a new, improved sampling interface that provides support for the extended sampling information available on modern hardware. We extend the PAPI interface to add a new PAPI_sample_init call that uses the Linux perf_event interface to access the extra sampled information. A pointer to these samples is returned to the user, who can either decode them on the fly or write them to disk for later analysis. By providing extended sampling information, this new PAPI interface allows advanced performance analysis and optimization that was previously not possible. This will enhance the ability to optimize software in modern extreme-scale programming environments.
10:00 – 10:30 Coffee break
10:30 – 10:50 "ParLoT: Efficient Whole-Program Call Tracing for HPC Applications"
by Saeed Taheri
The complexity of HPC software and hardware is quickly increasing. As a consequence, the need for efficient execution tracing to gain insight into HPC application behavior is steadily growing. Unfortunately, available tools either do not produce traces with enough detail or incur large overheads. An efficient tracing method that overcomes the tradeoff between maximum information and minimum overhead is therefore urgently needed. This paper presents such a method and tool, called ParLoT, with the following key features. (1) It describes a technique that makes low-overhead on-the-fly compression of whole-program call traces feasible. (2) It presents a new, highly efficient, incremental trace-compression approach that reduces the trace volume dynamically, which lowers not only the needed bandwidth but also the tracing overhead. (3) It collects all caller/callee relations, call frequencies, call stacks, as well as the full trace of all calls and returns executed by each thread, including in library code. (4) It works on top of existing dynamic binary instrumentation tools, thus requiring neither source-code modifications nor recompilation. (5) It supports program analysis and debugging at the thread, thread-group, and program level. This paper establishes that comparable capabilities are currently unavailable. Our experiments with the NAS parallel benchmarks running on the Comet supercomputer with up to 1,024 cores show that ParLoT can collect whole-program function-call traces at an average tracing bandwidth of just 56 kB/s per core.
10:50 – 11:10 "Gotcha: An Function-Wrapping Interface for HPC Tools"
by David Poliakoff
This paper introduces Gotcha, a function wrapping interface and library for HPC tools. Many HPC tools, and performance analysis tools in particular, rely on function wrapping to integrate with applications. But existing mechanisms, such as LD_PRELOAD on Linux, have limitations that lead to tool instability and complexity. Gotcha addresses the limitations in existing mechanisms, provides a programmable interface for HPC tools to manage function wrapping, and supports function wrapping across multiple tools. In addition, this paper introduces the idea of interface-independent function wrapping, which makes it possible for tools to wrap arbitrary application functions.
11:10 – 11:55 Keynote
"HPC Software Infrastructures at German Aerospace Center"

by Achim Basermann
In the German Research Foundation (DFG) project ESSEX (Equipping Sparse Solvers for Exascale), we develop scalable sparse eigensolver libraries for large quantum physics problems. Partners in ESSEX are the Universities of Erlangen, Greifswald, Wuppertal, Tokyo and Tsukuba as well as DLR. The project pursues a coherent co-design of all software layers where a holistic performance engineering process guides code development across the classic boundaries of application, numerical method and basic kernel library. The ESSEX Sparse Solver Repository (ESSR) supports application-driven fault tolerance and is interoperable with many standard numerical solver libraries. ESSR includes the kernel library GHOST (General, Hybrid, and Optimized Sparse Toolkit) as well as the flexible software framework PHIST (Pipelined Hybrid Iterative Solver Toolkit) for implementing iterative methods on HPC systems. The parallel coupling framework Flow Simulator supports airplane engineering. I will describe the structure and features of this framework for massively parallel aeronautic simulation. Furthermore, I will briefly sketch the innovative helicopter simulation framework VAST (Versatile Aeromechanic Simulation Tool). HEAT (Helmholtz Analytics Toolkit) is a recent HPC library development in the Helmholtz project HAF (Helmholtz Analytics Framework). It integrates, in particular, parallel machine learning methods. Finally, I will give a brief survey of our software developments for quantum computers; this is a joint activity of DLR and NASA Ames.
11:55 – 12:00 Workshop closing remarks

Reproducibility at ESPT 2018

For ESPT 2018, we adopt the model of the SC18 technical paper program. Participation in the reproducibility initiative is optional, but highly encouraged. To participate, authors provide a completed Artifact Description Appendix (at most 2 pages) along with their submission. We will use the format of the SC18 appendix for ESPT submissions (see template).

Note: A paper cannot be disqualified based on information provided or not provided in this appendix, nor if the appendix is not available. The availability and quality of an appendix can be used in ranking a paper. In particular, if two papers are of similar quality, the existence and quality of the appendices can be part of the evaluation process.

For more information, please refer to the SC18 reproducibility page and the FAQs below.

FAQ for authors

Q. Is the Artifact Description appendix required in order to submit a paper to ESPT 2018?
A. No. These appendices are not required. If you do not submit any appendix, it will not disqualify your submission. At the same time, if two papers are otherwise comparable in quality, the existence and quality of appendices can be a factor in ranking one paper over another.

Q. Do I need to make my software open source in order to complete the Artifact Description appendix?
A. No. You are not required to make any changes to your computing environment in order to complete the appendix. The Artifact Description appendix is meant to provide information about the computing environment you used to produce your results, reducing barriers for future replication of your results. However, in order to be eligible for the ACM Artifacts Available badge, your software must be downloadable by anyone without restriction.

Q. Who will review my appendices?
A. The Artifact Description and Computational Results Analysis appendices will be submitted at the same time as your paper and will be reviewed as part of the standard review process by the same reviewers who handle the rest of your paper.

Q. Does the Artifact Description appendix really impact scientific reproducibility?
A. The Artifact Description appendix is simply a description of the computing environment used to produce the results in a paper. By itself, this appendix does not directly improve scientific reproducibility. However, if the appendix is done well, scientists (including the authors at a later date) can use it to more easily replicate and build upon the results in the paper. Therefore, the Artifact Description appendix can reduce the barriers and costs of replicating published results. It is an important first step toward full scientific reproducibility.

Organizing committee

Martin Schulz, Technical University Munich, Germany
David Böhme, Lawrence Livermore National Laboratory, USA
Marc-André Hermanns, Forschungszentrum Jülich, Germany
William Jalby, Université de Versailles St-Quentin-en-Yvelines, France
Felix Wolf, Technische Universität Darmstadt, Germany



Program committee

Dorian C. Arnold, Emory University, USA
Jean-Baptiste Besnard, ParaTools, France
David Böhme, Lawrence Livermore National Laboratory, USA
Karl Fürlinger, Ludwig Maximilian University of Munich, Germany
Michael Gerndt, Technical University Munich, Germany
Judit Gimenez, Barcelona Supercomputing Center, Spain
Marc-André Hermanns, Forschungszentrum Jülich, Germany
Kevin Huck, University of Oregon, USA
William Jalby, Université de Versailles St-Quentin-en-Yvelines, France
Andreas Knüpfer, Technical University Dresden, Germany
John Linford, ARM, USA
Allen D. Malony, University of Oregon, USA
John Mellor-Crummey, Rice University, USA
Bart Miller, University of Wisconsin Madison, USA
Heidi Poxon, Cray Inc., USA
Martin Schulz, Technical University Munich, Germany
Nathan Tallent, Pacific Northwest National Laboratory, USA
Christian Terboven, RWTH Aachen University, Germany
Josef Weidendorfer, Leibniz Supercomputing Centre, Germany
Gerhard Wellein, University of Erlangen-Nürnberg, Germany
Felix Wolf, Technische Universität Darmstadt, Germany
Brian J.N. Wylie, Forschungszentrum Jülich, Germany

Previous workshops