Callgrind is an open-source profiler that performs execution-driven cache simulation using dynamic runtime instrumentation, which is provided by the open-source Valgrind project. This lets it analyze the cache behavior of program binaries directly on x86, x86-64, PPC32/64, and ARM. It comes with the visualization GUI KCachegrind, which offers several views of the performance data, including annotated call graphs, tree maps of call nesting, and annotated source and machine code.
Agnostic to the programming model, since it works at the binary level. Most useful for single-node performance analysis, but it also works with Pthreads/OpenMP and MPI programs.
Open source: GPL
Technische Universität München
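
Because Callgrind works on the binary itself, a profiling run needs no rebuild or source changes. As a minimal sketch of a typical invocation, the Python helper below assembles the Valgrind command line. The helper name callgrind_command, the program ./app, its argument input.dat, and the output file name are hypothetical placeholders chosen for illustration; the Valgrind options --tool=callgrind, --cache-sim=yes, and --callgrind-out-file are taken from the Callgrind manual.

```python
import shlex


def callgrind_command(program, args=(), cache_sim=True, out_file=None):
    """Build the argv list for running `program` under Callgrind.

    Hypothetical helper for illustration; it only assembles the command
    and does not require Valgrind to be installed.
    """
    cmd = ["valgrind", "--tool=callgrind"]
    if cache_sim:
        # Cache simulation is off by default in Callgrind; enable it explicitly.
        cmd.append("--cache-sim=yes")
    if out_file is not None:
        # Write the profile to a fixed name instead of callgrind.out.<pid>.
        cmd.append(f"--callgrind-out-file={out_file}")
    cmd.append(program)
    cmd.extend(args)
    return cmd


# "./app" and "input.dat" are placeholder names, not part of Callgrind.
cmd = callgrind_command("./app", ["input.dat"], out_file="callgrind.out.app")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where Valgrind is installed. The profile written by the run can then be opened in KCachegrind (kcachegrind callgrind.out.app) or summarized on the command line with callgrind_annotate.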