Copyright 2011-2013 German Research School for Simulation Sciences GmbH
Contact: Aamer Shah

Usage:
------
LWM2 (Light Weight Measurement Module) is a lightweight profiler designed to measure the performance of applications with minimal user effort. It can be used simply by preloading the LWM2 library when starting an application. To preload the library, set the LD_PRELOAD variable to the LWM2 library. For MPI applications, this can be done by setting the variable in the call to mpiexec/mpirun.

E.g.: mpiexec -E LD_PRELOAD= -np

Different MPI implementations pass environment variables in different ways, but the methods are quite similar to the example above.

LWM2 Output:
------------
LWM2 presents a summary of the application's performance on the standard output at the end of application execution. Along with the summary, LWM2 writes the performance data into a set of files. The base name of the files is lwm2.j.t. LWM2 writes three different files:

- Definition file: It is named .definition. It contains basic information about the job and the gathered job data. This file is in plain text.
- Digest file: It is named .digest. It contains the performance metrics for the whole execution of the application. The metrics are stored for each process separately.
- Slices file: It is named .slices. It contains the performance metrics that are collected per time slice. The metrics are stored per process and per time slice.

The files are written to the directory that is the working directory when the application is executed.

The job id is defined by an environment variable, which is named differently by different batch systems. The environment variable LWM2_JOBID_VAR should be set to the variable storing the job id.
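
The preloading and job id settings described above can be sketched for a single-process run as follows. This is only an illustration under stated assumptions: the library path /opt/lwm2/lib/liblwm2.so, the helper name run_profiled and the application ./my_app are hypothetical, SLURM_JOB_ID is the job id variable of the Slurm batch system, and it is assumed that LWM2_JOBID_VAR expects the name of that variable rather than its value.

```shell
# Hypothetical install location of the LWM2 shared library; adjust to your system.
LWM2_LIB="/opt/lwm2/lib/liblwm2.so"

# Name of the batch-system variable holding the job id.
# Assumption: LWM2_JOBID_VAR takes the variable's name (here Slurm's SLURM_JOB_ID).
export LWM2_JOBID_VAR="SLURM_JOB_ID"

# Hypothetical helper: run one command with LD_PRELOAD set for that command only,
# so the rest of the shell session is not profiled.
run_profiled() {
    LD_PRELOAD="$LWM2_LIB" "$@"
}

# Usage (./my_app is a placeholder for your own binary):
#   run_profiled ./my_app
```

Setting LD_PRELOAD for the single command, instead of exporting it, keeps other programs started from the same shell from being profiled.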

Runtime options:
----------------
Some LWM2 output is not produced by default and has to be enabled by setting an environment variable.

- The summary on the standard output is produced only if an environment variable named LWM2_CONSOLE_SUMMARY is defined.
- The slices file is produced only if the environment variable LWM2_WRITE_FILE is set to TIME_SLICES.
- LWM2 can be configured not to write any file by setting the environment variable LWM2_WRITE_FILE to NO.
- The default base name of the output file can be changed by setting the environment variable LWM2_OUTPUT_FILENAME to the new base file name.
- The default directory to which the output file is written can be changed by setting the environment variable LWM2_OUTPUT_DIR.

When an MPI job is started, the runtime utilities schedule and distribute the binary file onto the nodes. When profiling with LD_PRELOAD, these utilities are also profiled as separate applications. To avoid this, create a file listing the binaries that LWM2 should ignore, and set the environment variable LWM2_IGNORE_LIST to the complete path of that file.

Only a limited set of hardware counters can be read on a system at a time. LWM2 offers a runtime option to select the set of hardware counters to use for a run. This is done by setting the environment variable LWM2_HWC_CONFIG to one of the following values:

- LWM2_HWC_CPU_ONLY: Only profile hardware counters related to CPU
- LWM2_HWC_MEM_ONLY: Only profile hardware counters related to cache
- LWM2_HWC_ALL: Profile both CPU and cache related counters

On some machines, cache counters are not available. In that case, the ratio is calculated using the load/store counters.
The options in that case will be:

- LWM2_HWC_MEM_ALT: Profile using load/store hardware counters
- LWM2_HWC_ALL_ALT: Profile both CPU and memory, but use load/store hardware counters

Some machines don't provide counters for floating point instructions. In that case, the following alternate options can be used:

- LWM2_HWC_CPU_BASIC: No information about floating point instructions
- LWM2_HWC_ALL_BASIC: Profile both CPU and memory, without floating point instructions
- LWM2_HWC_ALL_ALT_BASIC: Profile both CPU and memory, but without floating point instructions and using load/store for memory

Individual metrics can be measured through the following options:

- LWM2_HWC_CPI
- LWM2_HWC_FLOPS
- LWM2_HWC_FLOINS
- LWM2_HWC_L1_RATIO
- LWM2_HWC_LOAD_STORE_RATIO

l2freader:
----------
The l2freader is a small utility for reading files generated by LWM2. Ideally, LWM2 should be set up on a system with a front-end, in which case l2freader is not required. Otherwise, the reader can be used to get the content of the files.

The utility lists the different metrics stored in the files. The metrics are stored for each process. When time-slice files are written, some metrics are also written per time-slice for each process.

The general usage is as follows:

./l2freader [--options]

Executing the reader without any filename or option prints a short help text listing all supported options.
The options are as follows:

--list-sensors                       :list metrics in the files
--file-def                           :print the metadata in the files
--summary                            :print a summary of the metrics
--proc-metric-tslice #proc #metric   :print the specified metric for the specified process, per time-slice
--proc-metric-summary #proc #metric  :print the specified metric for the specified process
--metric-tslice #metric              :print the specified metric for all processes, per time-slice
--metric-summary #metric             :print an averaged value for the specified metric
--metric-proc #metric                :print an aggregated value for the specified metric, per process
--proc-summary #proc                 :print a summary of all metrics for the specified process
--allproc-metric-tslice #metric      :print the specified metric for all processes, per time-slice
--time-slices                        :print all the metrics for all processes, for all time-slices
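
A few typical invocations can be sketched as a dry run that only prints the command lines instead of executing them. The file name example.digest and the helper l2f_cmd are hypothetical. It is also assumed that the options precede the file name, as the general usage line suggests, and that #proc and #metric are given as plain numbers; check the reader's help text for the exact form.

```shell
# Dry-run sketch: prints l2freader command lines instead of running them.
# L2F_FILE is a hypothetical file name; replace it with a file written by LWM2.
L2F_FILE="${L2F_FILE:-example.digest}"

# Hypothetical helper. Assumption: options come first, the file name last.
l2f_cmd() {
    printf './l2freader %s %s\n' "$*" "$L2F_FILE"
}

l2f_cmd --list-sensors              # which metrics are stored in the file
l2f_cmd --summary                   # summary of the metrics
l2f_cmd --proc-summary 0            # all metrics for one process (numeric form assumed)
l2f_cmd --proc-metric-summary 0 1   # one metric for one process (numeric form assumed)
```

Once the printed command lines match your installation, they can be run directly.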