AutomaDeD (Automata-based Debugging for Dissimilar Parallel Tasks) is a tool that detects errors and anomalies in MPI applications based on the (dis)similarity between tasks. The tool summarizes each task's runtime behavior using statistical models that capture the control paths and the time spent in each control block of an MPI application. AutomaDeD suggests possible root causes of detected errors by probabilistically identifying the anomalous MPI task(s) and the code region in which the anomaly occurred. A key mechanism is scalable outlier detection, which uses distributed sampling techniques to find the anomalous task(s) with low overhead. The tool can also build a progress-dependency graph that allows programmers to identify the least-progressed task, which is often associated with the origin of correctness problems such as hangs and deadlocks. AutomaDeD has been developed in close collaboration between LLNL and Purdue University; within VI-HPS, LLNL is the main contact for this tool.
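To illustrate the idea behind the least-progressed task, here is a minimal sketch; it is not AutomaDeD's implementation or API. It assumes a hypothetical edge encoding in which a pair `(a, b)` means task `a` cannot make progress until task `b` does, for example because `a` is blocked in a collective that `b` has not yet reached. Under that assumption, the candidates are tasks that wait on no one. The hypothetical function `least_progressed` ranks these candidates by how many tasks transitively wait on them.

```python
from collections import defaultdict


def least_progressed(edges, tasks):
    """Rank tasks that wait on no other task by how many tasks depend on them.

    Hypothetical encoding (not AutomaDeD's): an edge (a, b) means task a
    cannot make progress until task b does.
    """
    out_edges = defaultdict(set)  # task -> tasks it waits on
    in_edges = defaultdict(set)   # task -> tasks waiting on it
    for a, b in edges:
        out_edges[a].add(b)
        in_edges[b].add(a)

    # Candidates: tasks that are not blocked on any other task.
    sinks = [t for t in tasks if not out_edges[t]]

    def transitive_waiters(t):
        # Reverse reachability: count every task that depends on t,
        # directly or through a chain of dependencies.
        seen, stack = set(), [t]
        while stack:
            for w in in_edges[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        seen.discard(t)
        return len(seen)

    # Most-depended-on candidate first; ties broken by task id.
    return sorted(sinks, key=lambda t: (-transitive_waiters(t), t))


# Tasks 1-3 wait in a collective on task 0, which is stuck in a loop.
print(least_progressed([(1, 0), (2, 0), (3, 0)], range(4)))  # [0]
```

Ranking by transitive waiters is one simple heuristic. The assumption is that the task blocking the most other tasks is the most likely origin of a hang.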
Open source: released soon
LLNL and Purdue University