I’m trying to measure the time taken by the UM model I’m running on ARCHER2. So far I have information from three sources: the file bo026.fort6.pe.stdout, the file log.status, and the output of sacct on ARCHER2.
The log.status file and the sacct output are consistent with each other and with my own impression of the run time: around two minutes.
File bo026.fort6.pe.stdout has the following information:
PE 0 Elapsed CPU Time: 94.004 seconds
PE 0 Elapsed Wallclock Time: 111.087 seconds
Total Elapsed CPU Time: 26666.898 seconds
Maximum Elapsed Wallclock Time: 111.087 seconds
The PE 0 Elapsed Wallclock Time seems to be roughly what I’m looking for. I’m not sure what the other values mean or whether they matter.
Finally, after this summary, bo026.fort6.pe.stdout lists what looks like the time spent in each routine. In my case there are 48 routines, but when I add them up the result is about three times the PE 0 Elapsed Wallclock Time, which I would expect to be the total time.
What am I missing, and what’s the best (closest to real-time) way to measure this?
The timings for each subroutine are inclusive timers: each one is the total time spent in the subroutine, including the subroutines called from it. For example, the time for U_MODEL_4A includes the time spent in Atm_Step, etc. This is why adding up all the routines gives more than the wallclock time: nested calls are counted more than once.
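To illustrate the double counting, here is a minimal Python sketch. U_MODEL_4A and Atm_Step are the routines mentioned above; the third routine, the call tree, and all the times are invented for illustration and are not taken from any real UM output.

```python
# Inclusive times in seconds. Values are invented for illustration only.
inclusive = {
    "U_MODEL_4A": 100.0,          # includes everything called beneath it
    "Atm_Step": 80.0,             # called from U_MODEL_4A
    "hypothetical_physics": 50.0, # hypothetical routine called from Atm_Step
}

# Assumed call tree: which routines are called directly from each routine.
children = {
    "U_MODEL_4A": ["Atm_Step"],
    "Atm_Step": ["hypothetical_physics"],
    "hypothetical_physics": [],
}


def exclusive_time(name):
    """Time spent in a routine itself, excluding its direct children."""
    return inclusive[name] - sum(inclusive[child] for child in children[name])


# Adding inclusive timers counts nested routines more than once.
naive_sum = sum(inclusive.values())

# Adding exclusive times recovers the time of the top-level routine.
exclusive_sum = sum(exclusive_time(name) for name in inclusive)

print(f"Sum of inclusive timers: {naive_sum:.1f} s")
print(f"Sum of exclusive times:  {exclusive_sum:.1f} s")
```

In this made-up case the inclusive timers add up to more than twice the top-level time, even though the run itself only took as long as the top-level routine.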
I’m not aware that the UM timers can do that. Usually people wish to know how much time a particular section takes; for example, how much of the total time is spent writing the dumps (DUMPCTL), or running the convection or doing the initialisation, etc.
These are three separate timers, so they will vary slightly. sacct is the scheduler timer, job.status is from cylc, and PE 0 Elapsed Wallclock Time is the UM internal timer. The scheduler and cylc timings will include a little startup time before the UM executable starts running and the UM timers initialise.
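If it helps to pull the UM internal timer out of the output file for comparison with the other two, something like the following Python sketch could be used. It assumes the summary lines look exactly like the ones quoted in the question; the real file may use different spacing, so the pattern may need adjusting. The function name parse_um_timers is made up for this example.

```python
import re

# Sample text copied from the summary lines quoted in the question.
SAMPLE = """PE 0 Elapsed CPU Time: 94.004 seconds
PE 0 Elapsed Wallclock Time: 111.087 seconds
Total Elapsed CPU Time: 26666.898 seconds
Maximum Elapsed Wallclock Time: 111.087 seconds
"""

# Assumed line format: "<label ending in Time>: <number> seconds".
PATTERN = re.compile(r"^\s*(.+? Time):\s*([0-9.]+)\s*seconds", re.MULTILINE)


def parse_um_timers(text):
    """Return a dict mapping each timer label to its value in seconds."""
    return {label.strip(): float(value) for label, value in PATTERN.findall(text)}


timers = parse_um_timers(SAMPLE)
for label, seconds in timers.items():
    print(f"{label}: {seconds} s")
```

To use it on a real run, read the contents of bo026.fort6.pe.stdout into a string and pass that to parse_um_timers instead of SAMPLE.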