Dear NCAS Helpdesk,
I’m trying to run u-cl809 on ARCHER2, and it keeps failing after about 20 minutes in UMBUILD → fcm_make2_um, with errors like:
slurmstepd: error: *** JOB 1097664 ON dvn01 CANCELLED AT 2022-02-11T15:54:54 DUE TO TIME LIMIT
Is there a way to change the time limit?
The suite is currently running in the standard queue with Run initialisation → wallclock time set to 2h, so I’m not sure why it fails after 20 minutes. I have also run very similar suites before, so I don’t know why this suite fails at this step.
Thanks.
Best wishes,
Rachel
Rachel,
In ~/roses/u-cl809/site/archer2.rc, increase the execution time limit
(the build inherits HPC_SERIAL):
    [[HPC_SERIAL]]
        inherit = None, HPC
        [[[directives]]]
            --ntasks=6
            --partition=serial
            --qos=serial
        [[[environment]]]
            ROSE_TASK_N_JOBS = 32
        [[[job]]]
            execution time limit = PT20M
Grenville
Hi Grenville,
Thanks so much. Just to let you know I also had to change
    [[UMBUILD_RESOURCE]]
        inherit = None, HPC_SERIAL
        [[[job]]]
            execution time limit = PT59M  # originally PT20M
as it seems to be set there as well.
Best wishes,
Rachel
Ah yes - I did wonder afterwards whether my answer was quite right. It wasn't, but you have it correct now. The execution time limit set in [[UMBUILD_RESOURCE]] overrides whatever is inherited from [[HPC_SERIAL]], so the change in HPC_SERIAL can be reverted.
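As a rough sketch, the relevant parts of site/archer2.rc would end up looking something like this. It assumes that fcm_make2_um picks up its limit through UMBUILD_RESOURCE and that nothing else in the suite sets it for the build tasks; the other settings in each section are left out, and PT59M is simply the value you chose:

```
    [[HPC_SERIAL]]
        inherit = None, HPC
        [[[job]]]
            # Default for serial tasks; can stay at its original value.
            execution time limit = PT20M

    [[UMBUILD_RESOURCE]]
        inherit = None, HPC_SERIAL
        [[[job]]]
            # Set in the child family, so it takes precedence over
            # the PT20M inherited from HPC_SERIAL.
            execution time limit = PT59M
```

The 2h wallclock time under Run initialisation would not help here, since the build job was being cancelled at the 20-minute limit above.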
Grenville