| Callgrind | ||
| Organization | Technische Universität München | |
| Description | Callgrind is an open-source profiler using execution-driven cache simulation via dynamic runtime instrumentation (provided by the open-source project Valgrind). This way, it can directly analyze cache behavior of program binaries for x86, x86-64, PPC32/64 and ARM. It comes with the visualization GUI KCachegrind, which provides various views of performance data such as annotated call graphs and tree maps for call nesting, as well as annotation of source and machine code. | |
| Programming models | Agnostic to programming models (works at the binary level). Mostly useful for single-node performance, but also works with PThreads/OpenMP and MPI. | |
| License | Open source: GPL | |
| URL | ||
| MAQAO | |
| Organization | LRC ITACA / Université de Versailles St-Quentin-en-Yvelines |
| Description | The MAQAO (Modular Assembly Code Quality Analyzer and Optimizer) framework is composed of four main parts. Firstly, the disassembler/patcher provides binary instrumentation for loops, memory access profiling and fine-grained analysis of memory accesses. Secondly, the static analysis engine reconstructs the call graph, control-flow graph, dominance tree and loop nest information. Thirdly, the static performance models of processor front-end and back-end predict L1 performance and compute the critical path. Fourthly, the L2/L3/RAM performance is predicted based on micro-benchmarking and pattern matching. |
| Programming models | Agnostic to programming models (works at the binary level). Mostly useful for single-node performance, but also works with PThreads/OpenMP and MPI. |
| License | Open source: GPLv3 (planned) |
| URL | |
| MUST | ||
| Organization | Technische Universität Dresden | |
| Description | MUST is a runtime error detection tool for MPI applications. It detects MPI usage errors at runtime and reports them to the user. It is easy to use and provides a wide range of correctness checks, including MPI deadlock detection, resource leak detection, datatype matching, and detection of communication buffer overlaps. MUST delivers its correctness reports as an HTML report that remains available even if the application crashes due to an MPI usage error. It is the successor of the Marmot (Universität Stuttgart and Technische Universität Dresden) and Umpire (Lawrence Livermore National Laboratory) tools and combines their features. | |
| Programming models | MPI | |
| License | Open source: BSD | |
| URL | ||
| Memchecker | ||
| Organization | High Performance Computing Center Stuttgart | |
| Description | Memchecker is part of the Open MPI implementation and is based on Valgrind. It helps find hard-to-catch memory errors in MPI applications, such as overwriting memory regions used in non-blocking or one-sided communication. It checks MPI structures passed into the MPI implementation as well as structures passed out of the MPI library. | |
| Programming models | Open MPI | |
| License | Open source: BSD | |
| URL | ||
| ompP | ||
| Organization | University of Tennessee | |
| Description | ompP is a profiling tool for OpenMP applications written in C/C++ or FORTRAN. ompP works with most UNIX-like operating systems and OpenMP compilers. The profiling report of ompP becomes available immediately after program termination in a human-readable ASCII text format. ompP supports the measurement of hardware performance counters using PAPI and contains several advanced productivity features such as overhead analysis and detection of common inefficiency situations (performance properties). | |
| Programming models | OpenMP | |
| License | GNU General Public License (GPL v.2) | |
| URL | ||
| OPARI2 | |
| Organization | SILC Partners |
| Description | OPARI2, the successor of Forschungszentrum Jülich's OPARI, is a source-to-source instrumentation tool for OpenMP and hybrid codes. It surrounds OpenMP directives and runtime library calls with calls to the POMP2 measurement interface. The POMP2 interface can be implemented by tool builders who want, for example, to monitor the performance of OpenMP applications. Like its predecessor, OPARI2 works with Fortran, C, and C++ programs. New features compared to OPARI are an initialization method that allows for multi-directory and parallel builds, and support for pre-instrumented libraries. Furthermore, an efficient way of tracking parent-child thread relationships was added. Additionally, OPARI2 was extended to support instrumentation of OpenMP 3.0 tied tasks.
OPARI is used by many performance analysis tools (e.g. TAU, ompP, KOJAK, Scalasca, VampirTrace) whereas OPARI2 is currently used by Score-P and TAU. |
| Programming models | OpenMP 3.0 |
| License | Open source: BSD |
| URL | |
| Open Trace Format 2 (OTF2) | |
| Organization | SILC Partners |
| Description | The Open Trace Format 2 (OTF2), successor of Technische Universität Dresden's OTF, is an open-source API and library for reading and writing event traces. It is designed to be highly scalable and memory efficient. For massively parallel environments, it supports in-memory analysis as well as writing and reading traces via the scalable I/O library SIONlib. OTF2 will become the new standard trace format for Scalasca, Vampir, TAU, and Score-P and is open for other tools. |
| Programming models | general/all |
| License | Open source: BSD |
| URL | |
| PAPI | ||
| Organization | University of Tennessee | |
| Description | PAPI is a cross-platform interface to the hardware performance counters available on most modern microprocessors. In addition to defining a standard set of routines for configuring and accessing the counters, PAPI defines a common set of performance events considered most useful for application performance tuning. These events include operation and cycle counts, cache and memory access events, and branch behavior events. Most recently, PAPI has been extended to PAPI-C (component PAPI), which provides simultaneous access to multiple counter domains, including the previous on-processor counters as well as off-processor counters and sensors such as network counters and temperature sensors. | |
| Programming models | Fortran and C calling interfaces | |
| License | Open source: New BSD | |
| URL | ||
| Periscope | ||
| Organization | Technische Universität München | |
| Description | Periscope is a distributed automatic on-line performance analysis system for large-scale parallel systems. It consists of a frontend and a hierarchy of communication and analysis agents. Each of the analysis agents searches autonomously for inefficiencies in a subset of the application processes. Using a convenient graphical user interface, users can start up the analysis process and inspect the resulting performance data. The GUI is developed as a plug-in for Eclipse so that the developer can also take advantage of other available programming tools within the IDE. | |
| Programming models | MPI and OpenMP | |
| License | Open source: BSD | |
| URL | ||
| Scalasca | ||
| Organization | Forschungszentrum Jülich and German Research School for Simulation Sciences | |
| Description | Scalasca is an open-source toolset that can be used to analyze the performance behavior of parallel applications and to identify opportunities for optimization. It has been specifically designed for use on large-scale systems including IBM Blue Gene and Cray XT, but is also well-suited for small- and medium-scale HPC platforms. Scalasca integrates runtime summaries with in-depth studies of concurrent behavior via event tracing. A distinctive feature is the ability to identify wait states that occur, for example, as a result of unevenly distributed workloads. | |
| Programming models | MPI and OpenMP | |
| License | Open source: New BSD | |
| URL | ||
| Score-P | |
| Organization | SILC Partners |
| Description | The Score-P measurement infrastructure is a highly scalable and easy-to-use tool suite for profiling, event trace recording, and online analysis of HPC applications. Score-P offers the user a maximum of convenience by supporting a number of analysis tools. Currently, it works with Periscope, Scalasca, Vampir, and TAU and is open for other tools. Score-P comes together with the new Open Trace Format Version 2, the CUBE4 profiling format and the OPARI2 instrumenter. |
| Programming models | Serial, OpenMP, MPI, and hybrid (MPI+OpenMP) |
| License | Open source: BSD |
| URL | |
| TAU | ||
| Organization | University of Oregon | |
| Description | TAU is an integrated parallel performance framework for the instrumentation, measurement, analysis, and visualization of large-scale parallel computer systems and applications. It provides a flexible, robust, and portable tools platform that supports profiling and tracing for parallel performance evaluation across all leading programming models and environments. | |
| Programming models | C, C++, Fortran, Java, Python, MPI, OpenMP | |
| License | Open source: New BSD | |
| URL | ||
| VAMPIR | ||
| Organization | Technische Universität Dresden | |
| Description | The VAMPIR software tool provides an easy-to-use framework that enables developers to quickly display and analyze arbitrary program behavior at any level of detail. The tool suite implements optimized event analysis algorithms and customizable displays that enable fast and interactive rendering of very complex performance monitoring data. | |
| Programming models | MPI and OpenMP | |
| License | Commercial | |
| URL | ||
| VampirTrace | |
| Organization | Technische Universität Dresden |
| Description | While an application runs, VampirTrace generates a trace file, which can be analyzed and visualized by the visualization tool Vampir. The VampirTrace library allows MPI communication events of a parallel program to be recorded as a trace file. Additionally, certain program-specific events can be included. VampirTrace was derived from the KOJAK trace library EPILOG. |
| Programming models | MPI and OpenMP |
| License | Open source: BSD |
| URL | |