/work/n02/n02/earbsn/cylc-run/ka414_high_melt_grn/run3
Hi,
My suite failed 50 years in with a postproc_nemo_grid error. The simulation keeps running for several more years and then fails at the coupled step.
I have been trying to fix the postproc_nemo_grid error before letting the simulation run any further, but the error messages don’t tell me much.
[WARN] file:atmospp.nl: skip missing optional source: namelist:moose_arch
[WARN] file:nemocicepp.nl: skip missing optional source: namelist:moose_arch
[WARN] file:atmospp.nl: skip missing optional source: namelist:script_arch
[WARN] file:nemocicepp.nl: skip missing optional source: namelist:script_arch
slurmstepd: error: *** JOB 13299737 ON dvn01 CANCELLED AT 2026-04-16T10:41:22 DUE TO TIME LIMIT ***
2026-04-16T09:41:23Z CRITICAL - failed/TERM
I normally just get a time limit error, as pasted above from job.err. I have tried various combinations of longer wall clock time and different ocean timesteps, but nothing has changed. The task seems to run almost to completion before failing.
I have tried copying the suite and running it from the same timestep that it’s currently failing on, and it still fails in the same way.
Do you have any ideas of what I can try or where I can look for more information? I feel quite stuck.
Thank you so much,
Brooke
Hi Brooke,
All the attempts I can see have only had 2 hours wallclock…
I’m looking under /home/n02/n02/earbsn/cylc-run/ka414_high_melt_grn/run3/log/job/21560101T0000Z/postproc_nemo_grid
The problem is that when the script is terminated the output buffer isn’t flushed, so it’s difficult to see how far it got.
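To illustrate the buffering point: a minimal Python sketch of a progress logger that flushes after every message, so that whatever was printed before a kill still reaches the log file. This is a generic illustration, not the postproc app's actual code; `log_step` and the message text are hypothetical names.

```python
import sys


def log_step(message, stream=sys.stdout):
    """Print one progress line and flush it straight away.

    Without the flush, output redirected to a file sits in a buffer
    and can be lost if the job is killed at the time limit.
    """
    print(message, file=stream, flush=True)


if __name__ == "__main__":
    # Hypothetical progress messages, for illustration only.
    for step in range(1, 4):
        log_step(f"step {step} of 3 done")
```

For a Python script that cannot be edited, running it with `python -u` or setting `PYTHONUNBUFFERED=1` in the environment has a similar effect.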
Can you point me to a run that was given more than 2 hours?
Cheers,
Ros
Hi Ros,
Thanks for getting back to me.
Hm… sorry for my ignorance, but do the post-processing steps have a separate wall clock from the one in the suite conf?
I’m pretty sure I had increased the wallclock up to 4 hours, but maybe I’m changing the wrong thing?
I had also tried this in /home/n02/n02/earbsn/cylc-run/ka414_high_melt_grn2/run1
Thanks,
Brooke
Hi Brooke,
Yes, CLOCK=4,0,0 is usually just the walltime for the model. Each task can be given a different time limit.
You’re looking for a line like:
execution time limit = PT2H
in the flow.cylc or site/archer2.cylc file, likely in one of the families inherited by the postproc_nemo_grid task. If you point me to the source rose suite for this run I’ll take a look.
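A rough Python sketch of that search, assuming the setting is written as a plain ISO 8601 duration such as PT2H. The helper names `duration_to_seconds` and `find_time_limits` are hypothetical, and values set through Jinja2 templating are reported with `None` rather than resolved.

```python
import re

# Matches simple ISO 8601 durations such as PT2H, PT90M or P1DT6H.
_DURATION = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)
_SETTING = re.compile(
    r"^\s*execution time limit\s*=\s*(\S+)", re.MULTILINE
)


def duration_to_seconds(text):
    """Convert a simple ISO 8601 duration to seconds."""
    match = _DURATION.match(text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_time_limits(config_text):
    """Return (raw value, seconds or None) for each setting found."""
    results = []
    for raw in _SETTING.findall(config_text):
        try:
            results.append((raw, duration_to_seconds(raw)))
        except ValueError:
            # e.g. a templated value that this sketch cannot resolve
            results.append((raw, None))
    return results
```

Reading the text of flow.cylc or site/archer2.cylc and passing it to `find_time_limits` lists every limit defined in the file; which one applies to postproc_nemo_grid still depends on the families the task inherits from.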
You can see what time limit has been set for the task by looking in the task job script, e.g. /home/n02/n02/earbsn/cylc-run/ka414_high_melt_grn/run3/log/job/21560101T0000Z/postproc_nemo_grid/07/job
#SBATCH --time=120:00
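Slurm reads a two-field value like 120:00 as minutes:seconds, so that line means 120 minutes, i.e. the 2 hours seen above. As a minimal sketch, the check can be scripted in Python; `slurm_time_to_seconds` and `job_time_limit` are hypothetical helper names, and only the time formats documented for sbatch are handled.

```python
import re


def slurm_time_to_seconds(value):
    """Convert an sbatch --time value to seconds.

    Accepted forms: minutes, minutes:seconds, hours:minutes:seconds,
    days-hours, days-hours:minutes, days-hours:minutes:seconds.
    """
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in value.split(":")]
        # After the day count, fields are hours[:minutes[:seconds]].
        parts += [0] * (3 - len(parts))
        hours, minutes, seconds = parts
    else:
        parts = [int(p) for p in value.split(":")]
        if len(parts) == 1:
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:
            hours, minutes, seconds = 0, parts[0], parts[1]
        else:
            hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def job_time_limit(job_script_text):
    """Return the --time limit of a job script in seconds, or None."""
    match = re.search(
        r"^#SBATCH\s+--time=(\S+)", job_script_text, re.MULTILINE
    )
    return slurm_time_to_seconds(match.group(1)) if match else None
```

Passing the contents of the job file to `job_time_limit` and dividing by 3600 gives the limit in hours, which makes it easy to confirm whether a change to the suite has actually reached the submitted job.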
Cheers,
Ros
Thanks Ros.
I’ve changed the execution time limit and the task completed successfully!