Hi,
The last atmos_main job in my workflow failed because it ran out of time. But looking at the output, it had already completed and written its data out about 40 minutes before it hit the time limit. Output can be found in /home/n02/n02/tetts/cylc-run/opt_dfols46_try2/dz02f (which maps to nvme).
ls -ltr /home/n02/n02/tetts/cylc-run/opt_dfols46_try2/dz02f/share/data/History_Data/ shows the last dump as “Apr 29 06:51 dz02fa.da20120301_00”.
slurm errors are:
slurmstepd: error: *** STEP 13456128.0 ON nid001091 CANCELLED AT 2026-04-29T07:28:04 DUE TO TIME LIMIT ***
slurmstepd: error: *** JOB 13456128 ON nid001091 CANCELLED AT 2026-04-29T07:28:04 DUE TO TIME LIMIT ***
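For what it's worth, the gap between the last dump and the cancellation can be checked directly from the two timestamps above (a quick sketch; the year on the ls timestamp is assumed to be 2026, matching the slurm messages):

```python
from datetime import datetime

# Timestamps taken from the ls output and the slurm error above
# (ls omits the year, so 2026 is assumed here to match the slurm log)
last_dump = datetime(2026, 4, 29, 6, 51)                     # dz02fa.da20120301_00 written
cancelled = datetime.fromisoformat("2026-04-29T07:28:04")    # job killed by time limit

gap_minutes = (cancelled - last_dump).total_seconds() / 60
print(f"Job hung for ~{gap_minutes:.0f} minutes after the last dump")
```

So the job sat doing nothing for roughly 37 minutes after the final dump before slurm killed it.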
I manually triggered the postproc, which appears to have run OK. The transfer and housekeeping tasks then ran. Data sizes on jasmin look fine.
Simon
I’ve had another two runs fail due to running out of time:
case dz02i: /home/n02/n02/tetts/cylc-run/opt_dfols46_try2/dz02i/ failed on the 2nd cycle in atmos_main after 2 months and 24 days.
case dz02j: /home/n02/n02/tetts/cylc-run/opt_dfols46_try2/dz02j failed on the first cycle in atmos_main and did not appear to produce any output.
I’ve retriggered both cases and both appear to be running.
Is this a repeat of the problems from late 2025/early 2026, when some nodes were “slow”?
Simon
Hi Simon,
It’s worth reporting to the Archer2 helpdesk. Hopefully they can tell if it’s the old problem again.
I have also had a couple of jobs time out in the past couple of days, but I had been changing my setup a bit, so I wasn't sure if that was the cause. I have put my config back to normal, so if I get any further timeouts I will also report them.
Annette