Postproc_archive failing

Hello! My postproc_archive step has started failing for some suites (in this case u-dt825 and u-dw229) after running successfully for months. Sometimes retriggering the job works, other times it doesn’t. Are there any solutions to this? Currently, to get a run to complete I need to check on it every few hours to make sure none of the postproc_archive steps have failed.

Thanks!

Hi Isabelle,

Have you looked in the error log file to see what the error message is? It’s impossible to advise without further information. Please post the error message here or point us to a cycle where the postproc task has failed so we can take a look.

Cheers,
Ros

Hi Rosalyn,

It appears to be a walltime error:

=>> PBS: job killed: walltime 3608 exceeded limit 3600
2026-06-08T20:29:43Z CRITICAL - failed/TERM

I know this has been a recurring issue. Are there settings I could change to stop this from happening as often?

Thanks!

Hi Isabelle,

You can increase the walltime to more than 1 hour.

Do you know where to change this? At the moment I can’t get onto Monsoon3 for some reason, but looking in the rosie repository at u-dw229, it looks to be set in site/meto-ex.rc in the [[POSTPROC_RESOURCE]] section. It’s actually defined twice: first set to 3 hours and then, a few lines further down, overridden to 1 hour. So I’d remove the execution time limit = PT1H line.
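As a rough illustration, the relevant part of site/meto-ex.rc probably looks something like the sketch below. This is not a copy of the file: the omitted settings and the exact nesting are assumptions (in Cylc 7 style syntax the setting sits inside a [[[job]]] sub-section), so check it against what is actually in your suite. PT1H is an ISO 8601 duration of one hour, i.e. 3600 seconds, which matches the limit reported in your PBS error.

```
[[POSTPROC_RESOURCE]]
    # ... other settings omitted (layout here is illustrative only) ...
    execution time limit = PT3H
    # ... a few lines further down ...
    execution time limit = PT1H   # later duplicate wins; remove this line
```

With the PT1H line removed, the remaining PT3H setting (10800 seconds) should apply.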

Cheers,
Ros

Thank you! Can I do this while the run is going (when it next times out of the postproc_archive step)? Or do I need to restart the job?

You can change it and then (assuming Cylc 8) run cylc vr <workflowid> to reload the changes into the running workflow.

Cheers,
Ros