Hi,
The last atmos_main job in my workflow failed because it ran out of time. But looking at the output, it had already completed and written its data out about 40 minutes before it hit the time limit. Output can be found in /home/n02/n02/tetts/cylc-run/opt_dfols46_try2/dz02f (which maps to nvme).
ls -ltr /home/n02/n02/tetts/cylc-run/opt_dfols46_try2/dz02f/share/data/History_Data/ shows the last dump as “Apr 29 06:51 dz02fa.da20120301_00”.
slurm errors are:
slurmstepd: error: *** STEP 13456128.0 ON nid001091 CANCELLED AT 2026-04-29T07:28:04 DUE TO TIME LIMIT ***
slurmstepd: error: *** JOB 13456128 ON nid001091 CANCELLED AT 2026-04-29T07:28:04 DUE TO TIME LIMIT ***
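For what it's worth, the gap between the last dump and the cancellation can be checked directly from the two timestamps above (a quick sketch; the year on the ls timestamp is assumed to be 2026, matching the slurm messages):

```python
from datetime import datetime

# Timestamps taken from the ls output and the slurm error above
# (ls omits the year, so 2026 is assumed here to match the slurm log)
last_dump = datetime(2026, 4, 29, 6, 51)                     # dz02fa.da20120301_00 written
cancelled = datetime.fromisoformat("2026-04-29T07:28:04")    # job killed by time limit

gap_minutes = (cancelled - last_dump).total_seconds() / 60
print(f"Job hung for ~{gap_minutes:.0f} minutes after the last dump")
```

So the job sat doing nothing for roughly 37 minutes after the final dump before slurm killed it.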
I manually triggered the postproc, which appears to have run OK. The transfer and housekeeping tasks then ran. Data sizes on jasmin look fine.
Simon
I’ve had another two runs fail due to running out of time:
case dz02i: /home/n02/n02/tetts/cylc-run/opt_dfols46_try2/dz02i/ failed on the 2nd cycle in atmos_main after 2 months and 24 days.
case dz02j: /home/n02/n02/tetts/cylc-run/opt_dfols46_try2/dz02j failed on the first cycle in atmos_main and did not appear to produce any output.
I’ve retriggered both cases and both appear to be running.
Is this a repeat of the problems from late 2025/early 2026, when some nodes were “slow”?
Simon
Hi Simon,
It’s worth reporting to the Archer2 helpdesk. Hopefully they can tell if it’s the old problem again.
I have also had a couple of jobs time out in the past couple of days, but I had been changing my setup a bit, so I wasn't sure if that was the cause. I have put my config back to normal, so if I get any further timeouts I will also report them.
Annette