Submit-retrying

RosalynHatcher · 1 December 2021 11:26

Hi Luciana,

We’ve seen the rsync error occasionally too. It is intermittent, so trying again usually works, but we don’t know the cause.

u-bo026-n96-ens3 is still submit-retrying. When this happens, look for the error message in the suite’s log/job/<cycle>/<task>/NN/job-activity.log file. In this case, look in home/luciana/cylc-run/u-bo026-n96-ens3/log/job/19880901T0000Z/atmos_main/08/job-activity.log and you’ll see the error message:

(login.archer2.ac.uk) 2021-12-01T08:13:13Z [STDERR] sbatch: error: QOSMaxNodePerUserLimit
(login.archer2.ac.uk) 2021-12-01T08:13:13Z [STDERR] sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

This indicates that you are trying to run on more nodes than the queue (the short queue in this instance) allows. See the ARCHER2 queue documentation I linked above in reply #2 of this topic.
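If you want to pull these messages out without reading the whole file, here is a rough Python sketch. It assumes every error line has the same "(host) timestamp [STDERR] message" layout as the two lines quoted above. The function name stderr_messages and the SAMPLE list are just mine for illustration, not part of Cylc or Rose.

```python
import re

# Layout assumed from the two log lines quoted in this post:
# "(host) timestamp [STDERR] message". Other line formats are ignored.
STDERR_LINE = re.compile(
    r"^\((?P<host>[^)]+)\)\s+(?P<time>\S+)\s+\[STDERR\]\s+(?P<msg>.*)$"
)

# The two lines from the job-activity.log quoted above.
SAMPLE = [
    "(login.archer2.ac.uk) 2021-12-01T08:13:13Z [STDERR] sbatch: error: QOSMaxNodePerUserLimit",
    "(login.archer2.ac.uk) 2021-12-01T08:13:13Z [STDERR] sbatch: error: Batch job submission failed: "
    "Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)",
]


def stderr_messages(lines):
    """Return a (host, time, message) tuple for every [STDERR] line."""
    found = []
    for line in lines:
        match = STDERR_LINE.match(line.strip())
        if match:
            found.append((match["host"], match["time"], match["msg"]))
    return found


if __name__ == "__main__":
    # To scan a real file, pass open(path).read().splitlines() instead of SAMPLE.
    for host, time, msg in stderr_messages(SAMPLE):
        print(time, msg)
```

Lines that don’t match the assumed layout are skipped, so if nothing is printed it is still worth opening the file and reading it directly.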

The cray-netcdf error you get on login is because your ~/.bash_profile loads modules that don’t exist on the 23-cabinet system. Remove the module load line.
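To find the offending line, a small sketch like the following lists every uncommented "module load" line with its line number. The helper name module_load_lines is hypothetical, and it only reports the lines. You still need to decide which one to remove.

```python
from pathlib import Path


def module_load_lines(profile_text):
    """Return (line_number, line) for each uncommented 'module load' line."""
    hits = []
    for number, line in enumerate(profile_text.splitlines(), start=1):
        words = line.split()
        # Commented-out lines start with '#', so they never match here.
        if words[:2] == ["module", "load"]:
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    profile = Path.home() / ".bash_profile"
    if profile.exists():
        for number, line in module_load_lines(profile.read_text()):
            print(f"{profile}:{number}: {line}")
```

This only catches lines that begin with "module load". A load hidden inside an if block on the same line as other commands, or in a file sourced from ~/.bash_profile, would need checking by hand.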

Regards,
Ros.
