Submit-retrying

Hi Luciana,

We’ve also seen the rsync error on occasion. It is intermittent, so trying again usually works, but we don’t know the cause.

u-bo026-n96-ens3 is still submit-retrying. When this happens, look for the error message in the log/job/<cycle>/<task>/NN/job-activity.log file for the task that is failing to submit. In this case, look in home/luciana/cylc-run/u-bo026-n96-ens3/log/job/19880901T0000Z/atmos_main/08/job-activity.log and you’ll see the error message:

(login.archer2.ac.uk) 2021-12-01T08:13:13Z [STDERR] sbatch: error: QOSMaxNodePerUserLimit
(login.archer2.ac.uk) 2021-12-01T08:13:13Z [STDERR] sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
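
If you'd rather not open the file by hand each time, here is a minimal Python sketch of the same check. It assumes the submit attempts sit in numbered directories (01, 02, ...) under log/job/<cycle>/<task>/ and that submission errors are tagged [STDERR], as in the lines above. The function names are made up for illustration and are not part of cylc.

```python
from pathlib import Path


def stderr_lines(log_text):
    """Return the lines of a job-activity.log that are tagged [STDERR]."""
    return [line.strip() for line in log_text.splitlines() if "[STDERR]" in line]


def latest_activity_log(task_dir):
    """Return the job-activity.log of the highest-numbered submit directory.

    task_dir is the log/job/<cycle>/<task> directory. Returns None if no
    numbered submit directory contains a job-activity.log.
    """
    task_dir = Path(task_dir)
    if not task_dir.is_dir():
        return None
    # Only consider purely numeric directory names such as 01, 02, ... 08.
    attempts = [p for p in task_dir.iterdir() if p.is_dir() and p.name.isdigit()]
    for attempt in sorted(attempts, key=lambda p: int(p.name), reverse=True):
        candidate = attempt / "job-activity.log"
        if candidate.is_file():
            return candidate
    return None
```

To use it, pass the task directory (for example the atmos_main directory for the 19880901T0000Z cycle) to latest_activity_log, read the file it returns, and print whatever stderr_lines finds in the text.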

This indicates that you are requesting more nodes than the queue (the short queue in this instance) allows per user. See the ARCHER2 queue documentation I linked above in reply #2 of this thread.

Your cray-netcdf error on login occurs because your ~/.bash_profile loads modules that don’t exist on the ARCHER2 23-cabinet system. Remove the module load line.
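
To find the offending line quickly, something like the following Python sketch would list every uncommented module load line in the file along with its line number. The function name is hypothetical, and it only looks at the text you give it, so it won't follow any other files that your ~/.bash_profile sources.

```python
def module_load_lines(profile_text):
    """Return (line_number, text) for each uncommented 'module load' line."""
    hits = []
    for number, line in enumerate(profile_text.splitlines(), start=1):
        words = line.split()
        # Commented-out lines start with '#', so their first word never
        # equals 'module' and they are skipped.
        if words[:2] == ["module", "load"]:
            hits.append((number, line.strip()))
    return hits
```

Read your ~/.bash_profile into a string, pass it to module_load_lines, and delete or comment out whichever reported line loads the missing module.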

Regards,
Ros.