Compilation Time-Limit exceedance on ARCHER2 re: v8.4 GA4 UM-UKCA

Dear NCAS-CMS helpdesk team,

I’m running some repeat simulations of the Agung GA4 UM-UKCA simulations from Dhomse et al. (2020) with a modified injection height, re: my former-MRes student’s paper comparing to the 1963/64 dust-sonde, searchlight and lidar measurements of the Agung enhancement in the Northern hemisphere.

And I’m starting a work-flow to recover the ARCHER Agung v8.4 UM-UKCA job.

Those jobs are configured very similarly to the Pinatubo GA4 UM-UKCA v8.4 simulations we re-ran on ARCHER2 in autumn 2022.

And I’m then re-submitting xplt-o – one of the UM-UKCA runs I re-ran about 12 months ago on ARCHER2.

The job is submitting OK, but it is getting a time-out error at the 1-hour time-limit for the compilation of the UM executable.

I’ve jotted down full info below, and I can see the SUBMIT file sets the 1:00:00 compilation time-limit, but not sure the best way to change this.

Please can you advise on the best way of getting around this TIME LIMIT exceedance problem?

Is it simply a case of increasing the time-limit from 1:00:00 to 1:30:00, say?
(If so please can you advise the best way to do that – just SUBMIT that needs to be edited?)

Or is it better to run the UM compilation on more than 1 processor?
(does that need a different slurm hand-edit, or will it go to the standard queue rather than the serial queue automatically if that option is selected?)

Thanks for your help with this,

Best regards,

Cheers
Graham

Background info:

This is the first time I’ve submitted a v8.4 UM-UKCA job on ARCHER2 since the
major software-upgrade outage in May/June 2023.

In April 2023 these jobs were compiling fine etc., but the serial compilation seems to be taking longer than previously after the software upgrade, and I’m finding the compilation of the model executable no longer completes within the 1-hour time-limit.

My recollection is that previously (with the old batch scheduler on ARCHER 1) it’s been possible to add in a hand-edit to the UMUI job to change the time-limit for the compilation to be greater than 1 hour.

The syntax will be different I guess with the slurm batch-scheduler on ARCHER2.

Please can you advise: is it a straightforward edit to one of the control files on PUMA, which I can then code up as a hand-edit so that it is applied automatically when the job is submitted from the UMUI?

In the SUBMIT file, I can see the lines below:

############################################

Comp, NRUN and CRUN Time Limits QSUB

############################################
COMP_TIME_LIMIT=01:00:00
NRUN_TIME_LIMIT=48:00:00
CRUN_TIME_LIMIT=48:00:00

Is it just to edit that COMP_TIME_LIMIT in the SUBMIT file, or does the UMSUBMIT file also need to be edited as well?

If it is just the SUBMIT file, I guess it’s just a case of increasing that COMP_TIME_LIMIT from 01:00:00 to 02:00:00?
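For illustration only, here is a minimal Python sketch of what such an edit would amount to if it were applied to the text of the SUBMIT file. The function name `set_comp_time_limit` is hypothetical, and the sketch assumes the file contains exactly one `COMP_TIME_LIMIT=HH:MM:SS` line as in the excerpt above. Whether a direct edit to SUBMIT survives re-processing the job in the UMUI is a separate question that this sketch does not answer.

```python
import re


def set_comp_time_limit(submit_text: str, new_limit: str) -> str:
    """Return submit_text with the COMP_TIME_LIMIT value replaced by new_limit."""
    # Only accept the HH:MM:SS form seen in the SUBMIT excerpt.
    if not re.fullmatch(r"\d{2,}:[0-5]\d:[0-5]\d", new_limit):
        raise ValueError(f"expected HH:MM:SS, got {new_limit!r}")
    pattern = re.compile(r"^(COMP_TIME_LIMIT=).*$", flags=re.MULTILINE)
    updated, count = pattern.subn(lambda m: m.group(1) + new_limit, submit_text)
    # Refuse to guess if the line is missing or duplicated.
    if count != 1:
        raise ValueError(f"expected exactly one COMP_TIME_LIMIT line, found {count}")
    return updated


# Sample text copied from the SUBMIT excerpt in this post.
sample = (
    "COMP_TIME_LIMIT=01:00:00\n"
    "NRUN_TIME_LIMIT=48:00:00\n"
    "CRUN_TIME_LIMIT=48:00:00\n"
)
print(set_comp_time_limit(sample, "02:00:00"))
```

The NRUN and CRUN limits are left untouched, since only the compilation step is hitting its limit.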

Unless the queue policy on ARCHER2 has a hard-limit of 1:00:00 for the serial queues?

In the UMUI, the “Compile and run options for Atmosphere and Reconfiguration”, there
is also the panel at the top “Time limit for compilation” which is currently set to “-1”.

I’m not sure if that can be specified as “2:00:00” there, or if it needs to be in units of seconds (as for the Wall-clock settings in other panels).

There’s also the option there to specify the no. of compilation processes, which could be done I guess.

But I’m not sure it then would automatically be submitted to the standard queue, or if there would need to be another hand-edit to set that to be submitted differently.

This is the error message from submitting the UMUI job xplto from PUMA to ARCHER2:

slurmstepd: error: *** JOB 4403388 ON dvn01 CANCELLED AT 2023-09-08T08:47:43 DUE TO TIME LIMIT ***

-rw-r--r-- 1 gmann n02 9769449 Sep 8 00:16 xplto000.xplto.d23250.t231328.comp.leave
-rw-r--r-- 1 gmann n02 26536 Sep 8 07:38 xplto000.xplto.d23251.t073229.comp.leave
-rw-r--r-- 1 gmann n02 10357933 Sep 8 08:48 xplto000.xplto.d23251.t074445.comp.leave
gmann@ln03:~/output> grep 'LIMIT' xplto000.xplto.d23251.t074445.comp.leave
UM__utility__qxreconf__rcf_adjust_pstar_mod.F90: 151 line(sslurmstepd: error: *** JOB 4403388 ON dvn01 CANCELLED AT 2023-09-08T08:47:43 DUE TO TIME LIMIT ***
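In case it is useful for checking several .leave files at once, here is a rough Python sketch that picks the Slurm time-limit cancellation out of the compile output. The regular expression is based only on the one message shown above, so treat the exact message format as an assumption. The names `CANCEL_RE` and `find_time_limit_cancel` are just illustrative. Because the Slurm message can land in the middle of a compiler output line (as in the grep result above), the sketch searches within lines rather than matching from the line start.

```python
import re
from typing import Optional

# Pattern modelled on the single cancellation message quoted in this post.
CANCEL_RE = re.compile(
    r"slurmstepd: error: \*\*\* JOB (?P<job_id>\d+) ON (?P<node>\S+) "
    r"CANCELLED AT (?P<when>\S+) DUE TO TIME LIMIT \*\*\*"
)


def find_time_limit_cancel(log_text: str) -> Optional[dict]:
    """Return job id, node and timestamp of the first time-limit cancel, or None."""
    match = CANCEL_RE.search(log_text)
    return match.groupdict() if match else None


# The grep hit from the .leave file, with the Slurm message fused onto compiler output.
line = (
    "UM__utility__qxreconf__rcf_adjust_pstar_mod.F90: 151 line(sslurmstepd: error: "
    "*** JOB 4403388 ON dvn01 CANCELLED AT 2023-09-08T08:47:43 DUE TO TIME LIMIT ***"
)
print(find_time_limit_cancel(line))
```

To use it on a real file, read the .leave file contents into a string and pass that to `find_time_limit_cancel`.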

Well I just tried editing the “-1” to 01:30:00 in the UMUI “Compile and run options for Atmos & Recon” panel, and it said it needed to be in seconds.

So I changed that to 900s.

And then when I clicked process, I can see that has updated the COMP_TIME_LIMIT to be specified in that format, 01:30:00, so that’s that part of the query answered (see below).

I’ve just clicked to submit the file with 01:30:00 compile time-limit and hopefully the queue policy for the serial-queue allows jobs to be longer-duration than 1 hour.

############################################

Comp, NRUN and CRUN Time Limits QSUB

############################################
COMP_TIME_LIMIT=01:30:00
NRUN_TIME_LIMIT=48:00:00
CRUN_TIME_LIMIT=48:00:00

Sorry changed that to 5400s I meant.

Just to update on this thread, that worked fine to increase the time-limit to 5400s,
although it still needed a bit more time to complete the compilation of the UM executable.

When I further increased the compile-time to 10800s, the compile of the model executable then completed successfully.
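For anyone else setting this panel, the seconds values map onto the HH:MM:SS form in SUBMIT by plain arithmetic. A small Python sketch of the conversion is below. The function names are my own, and it assumes the UMUI does nothing more than this conversion, which is consistent with 5400s appearing as 01:30:00 above but is not something I have checked in the UMUI code.

```python
def seconds_to_hms(seconds: int) -> str:
    """Format a non-negative number of seconds as HH:MM:SS."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, secs = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + secs


# Values mentioned in this thread.
for value in (900, 5400, 10800):
    print(value, "->", seconds_to_hms(value))
```

So 900s would only have been 15 minutes, 5400s is 1.5 hours, and the 10800s that finally worked is 3 hours.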
