Suite submission fails at every iteration

Hi. I’ve been running UM vn13.7 global suites on Monsoon2 and they were running fine until recently, but some of my latest suites stop at the end of every cycle for some reason.

u-dq018 and u-dq112 are currently running fine while u-dq115 and u-dq162 are the ones with this problem. These suites are very similar to each other and have only small scientific differences, I believe.

cylc says ‘submission failed’ for postproc of that cycle and atmos_main of the next cycle. Other than that I don’t see any error message. Neither job.err nor job.out seems to have been created.

If I manually trigger these failed jobs, they run fine for a cycle and then stop for the same reason. That means they require my constant attention and sit idle for hours overnight.

Please could you help me?

Thanks,
Masaru

Hi Masaru,

This is an intermittent issue caused by the way the suite is set up: cylc tries to ssh from xcs to itself, which isn’t needed and causes problems. Someone else had this issue very recently too.

In site/monsoon.rc, in the [[HPC]] section, replace the line:
host = $(rose host-select xcs-c)
with
host = localhost
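
For orientation, the edited section might end up looking roughly like the sketch below. This assumes the host setting sits in a [[[remote]]] subsection under [[HPC]], as is usual in Cylc 7 suite definitions. The exact nesting in your site/monsoon.rc may differ, and any other settings in the section should be left as they are.

```
    [[HPC]]
        [[[remote]]]
            # was: host = $(rose host-select xcs-c)
            host = localhost
```

With host = localhost, cylc submits the job from the host it is already running on instead of opening an ssh connection back to xcs.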

Regards,
Ros.

Hi Ros.
Oh, that’s great. It seems to be working.
Thanks a lot for your help!
Masaru
