Hi CMS team,
I’m having trouble extending suites u-dr800 and u-dr928 beyond their original run lengths; both runs completed successfully.
I’ve changed RUNLEN in rose-suite.conf and re-installed with cylc vr, but for u-dr928 that doesn’t start the new tasks. How can I make sure the new tasks get added in cylc?
In both runs, the next atmospheric restart file was missing, but I’ve found it in the cycle directories and copied it to /share/data/History_Data.
For u-dr800, the next coupled task fails with:
===>>> : E R R O R
===========
stpctl: the zonal velocity is larger than 20 m/s
kt=691204 max abs(U): 905.5 , i j k: 240 92 1
output of last fields in numwso
stp_ctl:tracer anomaly: ***** WARNING *****
stp_ctl:tracer anomaly: sea surface temperature < -10C
stp_ctl:tracer anomaly: kt=691204 min SST: -1523.5946551009, i j: 241 92
stp_ctl:tracer anomaly: ***** END OF WARNING *****
===>>> : E R R O R
===========
While I know how we normally deal with this, I’m puzzled because I’ve previously had this same error in a run that had reached the end of its original RUNLEN. I also found a ticket, “Suite fails on restart after extending length”, where the error occurs after extending the run, so I wonder if there is a pattern/issue with extending the run length.
Any advice would be much appreciated!
Thanks,
Andrea
Hi Andrea.
I’ll take a look at this and will get back to you. 
Jonny
Hi again.
I think the reason u-dr928 is shutting down is that the scheduler was waiting for input on what to do next and timed out after the prescribed 5 minutes. Here’s the end of the latest scheduler log (on PUMA2, as opposed to the job log on ARCHER2).
> tail -7 /home/n02/n02/adittus/cylc-run/u-dr928/runN/log/scheduler/14-restart-14.log
2026-03-27T11:24:49Z INFO - RESUMING the workflow now
2026-03-27T11:24:49Z INFO - Command "resume" actioned. ID=e2d6e363-05ec-4763-950e-96101626199c
2026-03-27T11:24:52Z INFO - Command "pause" received. ID=8ba5ba32-4868-4a4d-a826-a733af23390c
pause()
2026-03-27T11:24:52Z INFO - Pausing the workflow
2026-03-27T11:24:52Z INFO - Command "pause" actioned. ID=8ba5ba32-4868-4a4d-a826-a733af23390c
2026-03-27T11:29:31Z WARNING - restart timer timed out after PT5M
If you manage to restart u-dr928 and get the same NEMO error, then I’d say that points to something having gone wrong in the initial conditions.
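For reference, restarting the timed-out scheduler should just be a matter of playing the workflow again from PUMA2 — a sketch along these lines (workflow ID as in your runs):

```shell
# Restart the stopped scheduler for the latest run of u-dr928
cylc play u-dr928/runN

# Confirm the scheduler is back up, then watch its log
cylc scan
cylc cat-log u-dr928/runN
```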
Cheers
Jonny
… also note that workflows which have already completed (as opposed to a still-running workflow being extended) need to be restarted, not reloaded. This is essentially the difference between cylc vr and cylc play.
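To make that distinction concrete, this is roughly the sequence in each case (a sketch only; adjust the workflow IDs to your runs):

```shell
# Still-running workflow: after editing RUNLEN in rose-suite.conf,
# validate + reinstall + reload the live scheduler in one go
cylc vr u-dr800/runN

# Completed (stopped) workflow: reinstall the changed source, then
# restart the scheduler so it picks up the new final cycle point
cylc reinstall u-dr928/runN
cylc play u-dr928/runN
```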
Cheers
Jonny
Hi Jonny,
I think I’ve worked out why there is often a failure / instability after a restart.
In my case, for u-dr800, I think the restart file is the issue as you suspected.
I think this is only an issue for UKESM runs that have interactive ice sheets. It seems to happen when postproc_atmos and process_icecap_for_um get out of sync or run in the wrong order.
process_icecap_for_um has in its description: “Modify the UM dump with info from the icesheet run”.
postproc_atmos moves the dump between share/History_Data and the cycle directories.
In my case, I know process_icecap_for_um failed initially because the dump was missing, but I may not have been careful enough with the order, and something got mixed up when I triggered some tasks manually. I thought restoring the original dump and then running process_icecap_for_um would fix it, but I haven’t managed to reverse-engineer the correct restart file. Still, I think that is the most likely explanation.
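For what it’s worth, if the dump has to be restored by hand, this is the ordering I’d aim for next time (paths and cycle points are placeholders; a sketch rather than a tested recipe):

```shell
# 1. Copy the atmosphere dump back from the cycle directory into the
#    shared History_Data directory the tasks expect
#    (<cycle_dir> and the dump name are illustrative)
cp <cycle_dir>/<atmos_dump> /share/data/History_Data/

# 2. Re-run the ice sheet coupling step so the dump is modified with
#    the icesheet info *before* postproc moves it on
cylc trigger u-dr800//<cycle>/process_icecap_for_um

# 3. Only then let postproc_atmos move the (now modified) dump
cylc trigger u-dr800//<cycle>/postproc_atmos
```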
Hoping this helps shed some light on restart issues. It would be interesting to know whether the suite linked in the other ticket has interactive ice sheets switched on — I think the default setting for UKESM1.0-LL is no interactive ice sheets.
Cheers,
Andrea
Hi Andrea.
Yes, you’re right: u-dm204 from the ticket mentioned earlier did not have ice sheets turned on.
So, assuming your model failure is due to the ice sheet-related changes, it looks to me like the two instances of the zonal velocity error are unrelated, although I can’t guarantee that.
Assuming your simulations aren’t too far into their run time, you might be better off restarting them from scratch. Alternatively, you could rerun from the beginning of the most recently completed year.
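If you go the rerun route, the rough shape would be something like this (a sketch only; the cycle point is a placeholder, and the restart files for that cycle need to be in place first):

```shell
# Start over from scratch: clean out the old run directory, then
# validate + install + play a fresh run from the suite source
cylc clean u-dr800
cylc vip

# Or start a fresh run from the beginning of the last completed year
cylc play u-dr800 --start-cycle-point=<YYYY0101T0000Z>
```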
All the best.
Jonny