Coupled runs on Monsoon: postproc submission failing at each resubmission

Hi,

I have some coupled runs (e.g. u-dh43) where the coupled task runs fine, but the postproc tasks then hit submit-retrying at each stage, with error messages like:

    ERROR: file not found: /home/d04/jamwe/cylc-run/u-dh430/log/job/20660101T0000Z/postproc_atmos/05/job.err
    ERROR: command terminated by signal 1: ssh -oBatchMode=yes -oConnectTimeout=8 -oStrictHostKeyChecking=no -n xcs-c env CYLC_VERSION=7.8.14 bash --login -c ''"'"'exec "$0" "$@"'"'"'' cylc cat-log '--remote-arg='"'"'$HOME/cylc-run/u-dh430/log/job/20660101T0000Z/postproc_atmos/05/job.err'"'"'' --remote-arg=tail '--remote-arg='"'"'tail -n +1 -F %(filename)s'"'"'' u-dh430

Once I retrigger the tasks they run fine, but the next coupled task won't start until the tasks from the previous submission have finished.

This seems similar to an old but persistent issue I had with an AMIP run (http://cms.ncas.ac.uk/ticket/3505#comment:4; not sure if the link still works), which was solved by changing host = $(rose host-select xcs-c) to host = localhost in the HPC section of monsoon.rc. However, I can't see an equivalent change for the coupled runs.
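For reference, the AMIP fix amounts to the edit below. This is a sketch in cylc 7 suite.rc syntax (the log shows CYLC_VERSION=7.8.14); it assumes the host setting sits in the [[[remote]]] subsection of the HPC family, which is where cylc 7 expects a task host to be set, and the surrounding lines of the real monsoon.rc may differ.

```
[runtime]
    [[HPC]]
        [[[remote]]]
            # before: host picked by rose host-select each time a job is submitted
            # host = $(rose host-select xcs-c)
            # after: jobs in this family run on the suite host
            host = localhost
```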

Have you seen this before?

Thanks for your help,

James

James

In /home/d04/jamwe/cylc-run/u-dh430/site/MONSooN.rc, try making the change here:

        [[[remote]]]
            host = xcs-c
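For context, a minimal sketch of where this setting sits in cylc 7 suite.rc nesting. The family name HPC is hypothetical here (use whichever family in site/MONSooN.rc holds the [[[remote]]] section), and the commented-out "before" line assumes the coupled suite was using rose host-select in the same way as the AMIP suite.

```
[runtime]
    # hypothetical family name; substitute the one used in site/MONSooN.rc
    [[HPC]]
        [[[remote]]]
            # before (assumed): host = $(rose host-select xcs-c)
            # after: a fixed host name instead of a per-submission selection
            host = xcs-c
```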

Grenville

Thanks, Grenville, that has done it.

James