STAT - The Stack Trace Analysis Tool
STAT is a lightweight, highly scalable debugging tool for identifying errors in code running at full scale. It has been developed in close collaboration between LLNL, the University of Wisconsin, and the University of New Mexico, and works on the principle of detecting and grouping similar processes at suspicious points in a program's execution. This lets users reduce the problem they are trying to debug to a small, tractable number of processes by picking representatives from each group instead of debugging all processes at once. STAT also automatically identifies outliers: processes that cannot be grouped or that behave substantially differently from the rest. Such outliers often indicate an erroneous execution, and STAT can help identify these anomalies quickly. STAT achieves this grouping by examining the state of all processes in a parallel program dynamically at runtime and extracting their stack traces, that is, the sequences of function calls that led to the current point of execution. Within VI-HPS, LLNL is the main contact for STAT.
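The grouping idea can be illustrated with a small Python sketch. This is not STAT's implementation or API. It is a toy model that assumes each process's stack trace is already available as a list of function names. The helper names (group_by_stack_trace, pick_representatives, find_outliers), the sample traces, and the "group of size one" outlier rule are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_by_stack_trace(traces):
    """Map each distinct call path to the sorted list of ranks that share it.

    `traces` maps a process rank to its stack trace, given as a list of
    function names from outermost to innermost frame (an assumed format).
    """
    groups = defaultdict(list)
    for rank, frames in traces.items():
        groups[tuple(frames)].append(rank)
    return {path: sorted(ranks) for path, ranks in groups.items()}


def pick_representatives(groups):
    """Choose the lowest rank of each group as the process to debug."""
    return {path: ranks[0] for path, ranks in groups.items()}


def find_outliers(groups, max_size=1):
    """Treat ranks in very small groups as outliers (an illustrative rule)."""
    return sorted(
        rank
        for ranks in groups.values()
        if len(ranks) <= max_size
        for rank in ranks
    )


# Hypothetical traces: three ranks wait in a barrier, one is stuck elsewhere.
traces = {
    0: ["main", "solve", "MPI_Barrier"],
    1: ["main", "solve", "MPI_Barrier"],
    2: ["main", "solve", "MPI_Barrier"],
    3: ["main", "solve", "compute", "MPI_Recv"],
}

groups = group_by_stack_trace(traces)
representatives = pick_representatives(groups)
outliers = find_outliers(groups)
```

In this toy example the four processes collapse into two groups, so only two representatives need to be inspected, and the single process with a different call path is the one flagged for closer examination.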
STAT is independent of the programming model but, in its current setup, requires the MPIR process acquisition interface to be present on the target system.
Open source: BSD
LLNL, Universities of Wisconsin and New Mexico