Callgrind

Organization: Technische Universität München
Description: Callgrind is an open-source profiler that uses execution-driven cache simulation via dynamic runtime instrumentation (provided by the open-source project Valgrind). This lets it analyze the cache behavior of program binaries directly on x86, x86-64, PPC32/64, and ARM. It comes with the visualization GUI KCachegrind, which provides various views of performance data, such as annotated call graphs and tree maps for call nesting, as well as annotation of source and machine code.
Programming models: Agnostic to programming models (works at the binary level). Mostly useful for single-node performance, but also works with Pthreads/OpenMP and MPI.
License: Open source: GPL
URL: http://kcachegrind.sf.net
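"Execution-driven cache simulation" means that every memory access the instrumented program performs is fed to a software model of the cache, which counts hits and misses. The Python sketch below illustrates only that idea; it is not Callgrind's simulator. It models a single set-associative cache with LRU replacement, and the cache geometry (32 KiB, 64-byte lines, 8 ways) and the address stream are illustrative assumptions. Callgrind's own cache model is enabled with `valgrind --tool=callgrind --cache-sim=yes`.

```python
from collections import OrderedDict


class SetAssocCache:
    """Toy set-associative cache with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One LRU-ordered dict of tags per set; most recently used at the end.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lru = self.sets[index]
        if tag in lru:
            self.hits += 1
            lru.move_to_end(tag)          # mark as most recently used
        else:
            self.misses += 1
            if len(lru) >= self.ways:
                lru.popitem(last=False)   # evict the least recently used line
            lru[tag] = True


# Sweep twice over 1024 doubles (8 KiB), which fits in the modeled cache.
cache = SetAssocCache(size_bytes=32 * 1024, line_bytes=64, ways=8)
for _ in range(2):
    for i in range(1024):
        cache.access(0x10000 + 8 * i)
print(cache.hits, cache.misses)
```

Only the first touch of each 64-byte line misses (128 misses); the rest of the first pass and the entire second pass hit (1920 hits).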


MAQAO

Organization: LRC ITACA / Université de Versailles St-Quentin-en-Yvelines
Description: The MAQAO (Modular Assembly Code Quality Analyzer and Optimizer) framework consists of four main parts. First, the disassembler/patcher provides binary instrumentation for loops, memory access profiling, and fine-grained analysis of memory accesses. Second, the static analysis engine reconstructs the call graph, control-flow graph, dominance tree, and loop nest information. Third, static performance models of the processor front end and back end predict L1 performance and compute the critical path. Fourth, L2/L3/RAM performance is predicted based on micro-benchmarking and pattern matching.
Programming models: Agnostic to programming models (works at the binary level). Mostly useful for single-node performance, but also works with Pthreads/OpenMP and MPI.
License: Open source: GPLv3 (planned)
URL: http://maqao.org/
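In a static model, the critical path is the longest chain of dependent instructions weighted by latency; it gives a lower bound on the latency of one pass through a block, however many execution units are available. The Python sketch below shows that generic computation over a dependency graph. The instruction names and latencies are hypothetical, and this is not MAQAO's actual performance model.

```python
from graphlib import TopologicalSorter


def critical_path(latency, deps):
    """Longest latency-weighted chain in a dependency DAG.

    latency: {instruction: cycles}
    deps:    {instruction: set of instructions it depends on}
    Returns (total_cycles, list of instructions on the path).
    """
    finish = {}  # earliest completion time of each instruction
    prev = {}    # predecessor on the longest chain into each instruction
    for node in TopologicalSorter(deps).static_order():
        start = 0
        for d in deps.get(node, ()):
            if finish[d] > start:
                start = finish[d]
                prev[node] = d
        finish[node] = start + latency[node]
    end = max(finish, key=finish.get)
    path = [end]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return finish[end], path[::-1]


# Hypothetical loop body a[i] = b[i] * c + d[i] with made-up latencies.
latency = {"load_b": 4, "load_d": 4, "mul": 4, "add": 4, "store": 1}
deps = {"mul": {"load_b"}, "add": {"mul", "load_d"}, "store": {"add"}}
print(critical_path(latency, deps))
```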


MUST

Organization: Technische Universität Dresden
Description: MUST is a runtime error detection tool for MPI applications. It detects erroneous uses of MPI at runtime and reports them to the user. It is easy to use and provides a wide range of correctness checks, including MPI deadlock detection, resource leak detection, datatype matching, and detection of overlapping communication buffers. MUST presents its findings as an HTML report, which remains available even if the application crashes due to an MPI usage error. It is the successor of the Marmot (Universität Stuttgart and Technische Universität Dresden) and Umpire (Lawrence Livermore National Laboratory) tools and combines their features.
Programming models: MPI
License: Open source: BSD
URL: http://tu-dresden.de/zih/must (Release November 2011)
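A classic way to detect deadlock is a wait-for graph: each blocked process points at the process it is waiting on, and a cycle means none of them can ever proceed. The Python sketch below covers only the simplest case, where every blocked rank waits on exactly one peer (for example, a blocking receive from a fixed source). It illustrates the general technique, not MUST's implementation; real MPI checkers must also handle cases such as wildcard receives and collectives.

```python
def find_deadlock(waits_for):
    """waits_for: {rank: rank it is blocked on, or None if not blocked}.

    Returns the ranks forming a wait cycle, or None if there is no cycle.
    """
    for start in waits_for:
        seen = []
        node = start
        # Follow the single outgoing wait edge until we leave the graph
        # (an unblocked rank) or revisit a rank (a cycle).
        while node is not None and node not in seen:
            seen.append(node)
            node = waits_for.get(node)
        if node is not None:
            return seen[seen.index(node):]
    return None


# Ranks 0 -> 1 -> 2 -> 0 each block in a receive from the next; rank 3 runs.
print(find_deadlock({0: 1, 1: 2, 2: 0, 3: None}))
```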


Memchecker

Organization: High Performance Computing Center Stuttgart
Description: Memchecker is part of the Open MPI implementation and is based on Valgrind. It helps find hard-to-catch memory errors in MPI applications, such as overwriting memory regions used in non-blocking or one-sided communication. It also checks MPI structures passed into the MPI implementation, as well as structures passed out of the MPI library.
Programming models: Open MPI
License: Open source: BSD
URL: http://www.open-mpi.de
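The check for overwritten communication buffers can be pictured as shadow state attached to each buffer: while a non-blocking request owns the buffer, any write to it is flagged. Memchecker tracks this at the byte level through Valgrind; the Python toy below only models the idea, and `isend`/`wait` are hypothetical stand-ins for `MPI_Isend`/`MPI_Wait`, not an MPI binding.

```python
class ShadowBuffer:
    """Toy buffer with one bit of shadow state: 'owned by a pending request'."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.in_flight = False
        self.errors = []

    def write(self, index, value):
        # Flag the write if a non-blocking operation still owns the buffer.
        if self.in_flight:
            self.errors.append(
                f"byte {index} written while a non-blocking request is pending")
        self.data[index] = value


def isend(buf):
    """Hypothetical stand-in for MPI_Isend: the request now owns buf."""
    buf.in_flight = True
    return buf


def wait(request):
    """Hypothetical stand-in for MPI_Wait: ownership returns to the caller."""
    request.in_flight = False


buf = ShadowBuffer(4)
req = isend(buf)
buf.write(0, 7)   # error: the send may still be reading this memory
wait(req)
buf.write(1, 8)   # fine: the request has completed
print(buf.errors)
```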


ompP

Organization: University of Tennessee
Description: ompP is a profiling tool for OpenMP applications written in C, C++, or Fortran. ompP works with most UNIX-like operating systems and OpenMP compilers. Its profiling report is available immediately after program termination as human-readable ASCII text. ompP supports the measurement of hardware performance counters using PAPI and contains several advanced productivity features, such as overhead analysis and detection of common inefficiency situations (performance properties).
Programming models: OpenMP
License: Open source: GNU General Public License (GPL v2)
URL: http://www.ompp-tool.com
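One overhead such an analysis can expose is load imbalance: faster threads sit idle at the implicit barrier that closes a parallel region until the slowest thread arrives. The Python sketch below computes that idle time from per-thread execution times. It is a simplified model for illustration (all threads are assumed to enter together, and other overheads are ignored), not ompP's exact overhead classification, and the timings are made up.

```python
def barrier_wait(thread_times):
    """Idle time of each thread at the implicit barrier ending a parallel
    region, assuming all threads start together and the region ends when
    the slowest thread arrives."""
    slowest = max(thread_times)
    return [slowest - t for t in thread_times]


def imbalance_fraction(thread_times):
    """Share of the team's total time in the region that is spent idle."""
    waits = barrier_wait(thread_times)
    return sum(waits) / (max(thread_times) * len(thread_times))


times = [1.0, 1.5, 2.0, 3.5]   # made-up per-thread seconds in one region
print(barrier_wait(times), imbalance_fraction(times))
```

Here the team spends 6.0 of 14.0 thread-seconds idle, so roughly 43% of the region's time is imbalance overhead.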


OPARI2

Organization: SILC Partners
Description: OPARI2, the successor of Forschungszentrum Jülich's OPARI, is a source-to-source instrumentation tool for OpenMP and hybrid codes. It surrounds OpenMP directives and runtime library calls with calls to the POMP2 measurement interface. The POMP2 interface can be implemented by tool builders who want, for example, to monitor the performance of OpenMP applications. Like its predecessor, OPARI2 works with Fortran, C, and C++ programs. Compared to OPARI, it adds a new initialization method that supports multi-directory and parallel builds as well as the use of pre-instrumented libraries. It also adds an efficient way of tracking parent-child thread relationships. Additionally, OPARI2 was extended to support instrumentation of OpenMP 3.0 tied tasks.
OPARI is used by many performance analysis tools (e.g., TAU, ompP, KOJAK, Scalasca, VampirTrace), whereas OPARI2 is currently used by Score-P and TAU.
Programming models: OpenMP 3.0
License: Open source: BSD
URL: http://www.vi-hps.org/projects/score-p#opari2
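The rewriting idea can be shown with a toy transformer: measurement calls are inserted around an OpenMP parallel construct, with fork/join hooks run by the encountering thread outside the construct and begin/end hooks run by every thread inside it. The hook names (`pomp_parallel_fork` and so on) are hypothetical placeholders, not the exact POMP2 signatures. The sketch handles only a bare `#pragma omp parallel` line followed by a brace-delimited block whose opening brace sits on its own line; OPARI2 itself covers the full directive set and many more source layouts.

```python
def instrument(lines):
    """Wrap each bare '#pragma omp parallel' block with hypothetical hooks."""
    out, i = [], 0
    while i < len(lines):
        line = lines[i]
        if line.strip() != "#pragma omp parallel":
            out.append(line)
            i += 1
            continue
        out.append("pomp_parallel_fork();")          # before the team forks
        out.append(line)
        depth = 0
        i += 1
        while i < len(lines):
            body = lines[i]
            depth += body.count("{") - body.count("}")
            if depth == 0:                            # closing brace of the block
                out.append("  pomp_parallel_end();")  # each thread leaves
                out.append(body)
                i += 1
                break
            out.append(body)
            if body.strip() == "{" and depth == 1:
                out.append("  pomp_parallel_begin();")  # each thread enters
            i += 1
        out.append("pomp_parallel_join();")          # after the team joins
    return out


src = ["int main(void) {", "#pragma omp parallel", "{", "  work();", "}",
       "return 0;", "}"]
print("\n".join(instrument(src)))
```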


Open Trace Format 2 (OTF2)

Organization: SILC Partners
Description: The Open Trace Format 2 (OTF2), successor of Technische Universität Dresden's OTF, is an open-source API and library for reading and writing event traces. It is designed to be highly scalable and memory-efficient. For massively parallel environments, it supports in-memory analysis as well as writing and reading traces via the scalable I/O library SIONlib. OTF2 will become the new standard trace format for Scalasca, Vampir, TAU, and Score-P and is open to other tools.
Programming models: General/all
License: Open source: BSD
URL: http://www.vi-hps.org/projects/score-p#otf2
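An event trace is, at its core, a time-ordered stream of records per location (process or thread), such as entering and leaving code regions. The Python sketch below shows one typical analysis over such a stream: computing exclusive time per region, with nested regions subtracted from their parents. The tuple layout is an assumption for illustration; it is not OTF2's API or on-disk format.

```python
def exclusive_times(events):
    """events: time-ordered (timestamp, 'enter' | 'leave', region) tuples
    for one location. Returns exclusive time per region."""
    stack, excl = [], {}
    last = None
    for ts, kind, region in events:
        # Charge the elapsed interval to the innermost open region.
        if stack:
            excl[stack[-1]] = excl.get(stack[-1], 0) + ts - last
        if kind == "enter":
            stack.append(region)
        else:
            if not stack or stack[-1] != region:
                raise ValueError(f"unbalanced leave of {region!r} at {ts}")
            stack.pop()
        last = ts
    return excl


events = [(0, "enter", "main"), (10, "enter", "MPI_Send"),
          (25, "leave", "MPI_Send"), (40, "leave", "main")]
print(exclusive_times(events))
```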


PAPI

Organization: University of Tennessee
Description: PAPI is a cross-platform interface to the hardware performance counters available on most modern microprocessors. In addition to defining a standard set of routines for configuring and accessing the counters, PAPI defines a common set of performance events considered most useful for application performance tuning. These events include operation and cycle counts, cache and memory access events, and branch behavior events. Most recently, PAPI has been extended to PAPI-C (Component PAPI), which provides simultaneous access to multiple counter domains, including the traditional on-processor counters as well as off-processor counters and sensors, such as network counters and temperature sensors.
Programming models: Fortran and C calling interfaces
License: Open source: New BSD
URL: http://icl.cs.utk.edu/papi/
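Preset events are usually combined into derived metrics. In C, the raw counts would be read by calls such as `PAPI_start` and `PAPI_stop` on an event set; the Python sketch below only shows the arithmetic that follows. The counter values are made up, and whether presets such as `PAPI_L1_DCA` are available depends on the platform (the `papi_avail` utility lists them).

```python
def derived_metrics(counts):
    """counts: {PAPI preset event name: raw count} read after a measurement.
    Returns two common derived metrics."""
    return {
        # Instructions retired per cycle.
        "IPC": counts["PAPI_TOT_INS"] / counts["PAPI_TOT_CYC"],
        # Fraction of L1 data cache accesses that missed.
        "L1D miss rate": counts["PAPI_L1_DCM"] / counts["PAPI_L1_DCA"],
    }


# Made-up counter values for illustration.
counts = {"PAPI_TOT_INS": 2_000_000, "PAPI_TOT_CYC": 1_000_000,
          "PAPI_L1_DCM": 5_000, "PAPI_L1_DCA": 500_000}
print(derived_metrics(counts))
```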


Periscope

Organization: Technische Universität München
Description: Periscope is a distributed, automatic, online performance analysis system for large-scale parallel systems. It consists of a frontend and a hierarchy of communication and analysis agents. Each analysis agent autonomously searches for inefficiencies in a subset of the application processes. Through a graphical user interface, users can start the analysis and inspect the resulting performance data. The GUI is implemented as an Eclipse plug-in, so developers can also use the other programming tools available in the IDE.
Programming models: MPI and OpenMP
License: Open source: BSD
URL: http://www.lrr.in.tum.de/periscope/
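The division of labor between the frontend and the analysis agents can be sketched as: split the processes among agents, let each agent search only its own subset, and merge the findings. In the Python sketch below, the agents run sequentially rather than as distributed processes, and the searched property (a high fraction of time spent in MPI) and its threshold are hypothetical; this is not Periscope's actual search strategy.

```python
def partition(ranks, n_agents):
    """Assign application processes to analysis agents round-robin."""
    return [ranks[a::n_agents] for a in range(n_agents)]


def agent_search(mpi_fraction, subset, threshold):
    """Leaf agent: report processes in its own subset whose fraction of
    time spent in MPI exceeds the threshold (a hypothetical property)."""
    return [(rank, mpi_fraction[rank]) for rank in subset
            if mpi_fraction[rank] > threshold]


def frontend(mpi_fraction, n_agents, threshold):
    """Collect and merge the findings reported by all agents."""
    findings = []
    for subset in partition(sorted(mpi_fraction), n_agents):
        findings.extend(agent_search(mpi_fraction, subset, threshold))
    return sorted(findings)


# Made-up measurements for four processes, analyzed by two agents.
mpi_fraction = {0: 0.05, 1: 0.40, 2: 0.10, 3: 0.35}
print(frontend(mpi_fraction, n_agents=2, threshold=0.3))
```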


Scalasca

Organization: Forschungszentrum Jülich and German Research School for Simulation Sciences
Description: Scalasca is an open-source toolset that can be used to analyze the performance behavior of parallel applications and to identify opportunities for optimization. It has been specifically designed for use on large-scale systems, including IBM Blue Gene and Cray XT, but is also well suited for small- and medium-scale HPC platforms. Scalasca integrates runtime summaries with in-depth studies of concurrent behavior via event tracing. A distinctive feature is the ability to identify wait states that occur, for example, as a result of unevenly distributed workloads.
Programming models: MPI and OpenMP
License: Open source: New BSD
URL: http://www.scalasca.org
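A standard example of such a wait state is the "Late Sender" pattern: a blocking receive is posted before the matching send begins, so the receiver idles. The Python sketch below computes that waiting time for a hypothetical two-rank exchange with an uneven workload. It ignores message transfer time and is an illustration of the pattern, not Scalasca's trace analyzer.

```python
def late_sender_wait(recv_enter, send_enter):
    """Idle time of a blocking receive posted before the matching send starts."""
    return max(0.0, send_enter - recv_enter)


# Hypothetical uneven workload: rank 0 computes for 2.0 s, then sends to rank 1;
# rank 1 finishes its smaller share after 0.5 s and posts the receive.
compute_time = {0: 2.0, 1: 0.5}
wait = late_sender_wait(recv_enter=compute_time[1], send_enter=compute_time[0])
print(wait)
```

Rank 1 waits 1.5 s; if the send had started first, the pattern would not apply and the function returns 0.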


Score-P

Organization: SILC Partners
Description: The Score-P measurement infrastructure is a highly scalable and easy-to-use tool suite for profiling, event trace recording, and online analysis of HPC applications.
Score-P offers users maximum convenience by supporting a number of analysis tools. Currently, it works with Periscope, Scalasca, Vampir, and TAU and is open to other tools. Score-P comes with the new Open Trace Format Version 2 (OTF2), the CUBE4 profiling format, and the OPARI2 instrumenter.
Programming models: Serial, OpenMP, MPI, and hybrid (MPI+OpenMP)
License: Open source: BSD
URL: http://www.score-p.org


TAU

Organization: University of Oregon
Description: TAU is an integrated parallel performance framework for the instrumentation, measurement, analysis, and visualization of large-scale parallel computer systems and applications. It provides a flexible, robust, and portable tools platform that supports profiling and tracing for parallel performance evaluation across all leading programming models and environments.
Programming models: C, C++, Fortran, Java, Python, MPI, OpenMP
License: Open source: New BSD
URL: http://tau.uoregon.edu


VAMPIR

Organization: Technische Universität Dresden
Description: The VAMPIR software tool provides an easy-to-use framework that enables developers to quickly display and analyze arbitrary program behavior at any level of detail. The tool suite implements optimized event analysis algorithms and customizable displays that enable fast and interactive rendering of very complex performance monitoring data.
Programming models: MPI and OpenMP
License: Commercial
URL: http://www.vampir.eu


VampirTrace

Organization: Technische Universität Dresden
Description: While an application runs, VampirTrace generates a trace file that can be analyzed and visualized with the visualization tool Vampir. The VampirTrace library records the MPI communication events of a parallel program in a trace file. Additionally, certain program-specific events can be included. VampirTrace was derived from the KOJAK trace library EPILOG.
Programming models: MPI and OpenMP
License: Open source: BSD
URL: http://www.tu-dresden.de/zih/vampirtrace
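Tracing libraries of this kind intercept calls (for MPI, typically through the standard PMPI profiling interface) and log a timestamped record when each call is entered and left. The Python sketch below is only an analogue of that mechanism using a decorator; the in-memory `TRACE` list and the `send` function are hypothetical and unrelated to VampirTrace's API or file format.

```python
import functools
import time

TRACE = []  # (timestamp_ns, "enter" | "leave", function name)


def traced(fn):
    """Record enter/leave events around every call of fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        TRACE.append((time.perf_counter_ns(), "enter", fn.__name__))
        try:
            return fn(*args, **kwargs)
        finally:
            # Recorded even if the wrapped call raises.
            TRACE.append((time.perf_counter_ns(), "leave", fn.__name__))
    return wrapper


@traced
def send(buf, dest):
    """Hypothetical stand-in for a communication call."""
    return len(buf)


send(b"hello", dest=1)
for record in TRACE:
    print(record)
```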