ARCHER2 job cancelled due to time limit?

Dear NCAS Helpdesk,

I’m trying to run u-cl809 on ARCHER2, and it keeps failing after about 20 minutes in UMBUILD → fcm_make2_um, with errors like

slurmstepd: error: *** JOB 1097664 ON dvn01 CANCELLED AT 2022-02-11T15:54:54 DUE TO TIME LIMIT

Is there a way to change the time limit?

The suite is currently running in the standard queue with Run initialisation → wallclock time set to 2h, so I’m not sure why it fails after 20 minutes. I have also run very similar suites before without hitting this, so I don’t know why this suite fails at this step.

Thanks.

Best wishes,

Rachel


In ~/roses/u-cl809/site/archer2.rc, increase the execution time limit (the build inherits from HPC_SERIAL):

    [[HPC_SERIAL]]
        inherit = None, HPC
        [[[directives]]]
            --ntasks=6
            --partition=serial
            --qos=serial
        [[[environment]]]
            ROSE_TASK_N_JOBS = 32
        [[[job]]]
            execution time limit = PT20M
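
The PT20M value above is an ISO 8601 duration, and it lines up with the job being cancelled after about 20 minutes. As a rough illustration, the Python sketch below converts such values to minutes. The helper name `duration_minutes` is hypothetical and the pattern only covers the simple PTnHnMnS forms, not the full ISO 8601 grammar.

```python
import re

# Simplified pattern (an assumption for illustration): handles only the
# PTnHnMnS forms, with no days, weeks, or fractional values.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_minutes(text: str) -> float:
    """Convert a simple ISO 8601 duration such as 'PT20M' to minutes."""
    match = _DURATION.match(text)
    # Reject anything the pattern does not cover, including a bare "PT".
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes + seconds / 60


print(duration_minutes("PT20M"))  # 20.0
print(duration_minutes("PT2H"))   # 120.0
```

So a limit of PT20M gives the build 20 minutes, regardless of the 2h wallclock time set for the run itself.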

Grenville

Hi Grenville,

Thanks so much. Just to let you know I also had to change

    [[UMBUILD_RESOURCE]]
        inherit = None, HPC_SERIAL
        [[[job]]]
            execution time limit = PT59M  # originally PT20M

as it seems to be set there as well.

Best wishes,

Rachel

Ah yes, I did wonder afterwards whether my answer was quite right. It wasn't quite, but you have it right now. The execution time limit set in [[UMBUILD_RESOURCE]] overrides whatever is set in [[HPC_SERIAL]], so the change in HPC_SERIAL can be reverted.
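
To illustrate why the value in [[UMBUILD_RESOURCE]] wins, here is a minimal Python sketch of the override rule. It is a simplified model, not Cylc's actual implementation: the `resolve` function and the dictionary layout are hypothetical, the HPC parent is left out, and real suites with several parents are resolved more carefully than this simple chain.

```python
def resolve(namespaces, name):
    """Return the effective settings for `name`: parents first, child last."""
    node = namespaces[name]
    settings = {}
    # Apply inherited settings first...
    for parent in node.get("inherit", []):
        settings.update(resolve(namespaces, parent))
    # ...then the namespace's own settings, which overwrite inherited ones.
    settings.update(node.get("settings", {}))
    return settings


# Values taken from the snippets in this thread.
namespaces = {
    "HPC_SERIAL": {
        "settings": {"execution time limit": "PT20M", "--partition": "serial"},
    },
    "UMBUILD_RESOURCE": {
        "inherit": ["HPC_SERIAL"],
        "settings": {"execution time limit": "PT59M"},
    },
}

effective = resolve(namespaces, "UMBUILD_RESOURCE")
print(effective["execution time limit"])  # PT59M
print(effective["--partition"])           # serial
```

In this model, changing the limit in HPC_SERIAL has no effect on the build as long as UMBUILD_RESOURCE sets its own value, while settings it does not redefine (such as the partition) are still inherited.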

Grenville