Time limit error occurring

Hello,

I have a job, u-dk172, whose job.err reads:
slurmstepd: error: *** STEP 8432916.0+0 ON nid001263 CANCELLED AT 2025-01-02T21:40:58 DUE TO TIME LIMIT ***

I tried changing

execution time limit = PT59M

(originally PT30M) in

/home/n02/n02/jgrist02/roses/u-dk172/site/archer2.rc

and reducing the run length to 3 months, but I am still getting this error…

(I got a similar error with an attempt at a longer run in
/work/n02/n02/jgrist02/cylc-run/u-da510/log/job/19500701T0000Z/coupled/01/job.err)

with many thanks

Jeremy

Jeremy

The setting execution time limit = PT59M applies to the HPC_SERIAL family; the coupled task does not use that family. The coupled job is configured to run for 10 hours (see, for example, /home/n02/n02/jgrist02/cylc-run/u-dk172/log/job/19500101T0000Z/coupled/01/job), where it says:

#SBATCH --time=600:00
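In Slurm, a two-field --time value is read as minutes:seconds rather than hours:minutes, which is why 600:00 means 10 hours. A minimal Python sketch of that parsing rule, following the formats listed in the sbatch documentation; the function name is hypothetical and not part of Slurm or Cylc:

```python
def slurm_time_to_minutes(spec: str) -> float:
    """Convert a Slurm --time value to minutes (hypothetical helper).

    Accepted forms (per the sbatch documentation): "minutes",
    "minutes:seconds", "hours:minutes:seconds", "days-hours",
    "days-hours:minutes" and "days-hours:minutes:seconds".
    """
    days = 0
    if "-" in spec:
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unrecognised Slurm time: {spec!r}")
        # After a day prefix the fields are hours[:minutes[:seconds]].
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(f) for f in spec.split(":")]
        if len(fields) == 1:        # "minutes"
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:      # "minutes:seconds", NOT hours:minutes
            hours = 0
            minutes, seconds = fields
        elif len(fields) == 3:      # "hours:minutes:seconds"
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unrecognised Slurm time: {spec!r}")
    return days * 24 * 60 + hours * 60 + minutes + seconds / 60


# The coupled job's "#SBATCH --time=600:00" is 600 minutes, i.e. 10 hours.
print(slurm_time_to_minutes("600:00") / 60)  # 10.0
```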

The machine appears to be running slowly (builds are slow).

I note that your suite still writes XIOS log files; doing so has a large impact on performance.

We have found that coupled tasks show much less performance jitter and run consistently faster when XIOS log files are suppressed. For a suite that is already running, the change has to be made to a file on ARCHER2. Wait until no coupled task is running, then in ~/cylc-run//share/data/xml/iodef.xml modify the two lines below. Change

<variable id="info_level" type="int">100</variable>
to
<variable id="info_level" type="int">0</variable>
change
<variable id="print_file" type="bool">true</variable>
to
<variable id="print_file" type="bool">false</variable>

as shown here:

<context id="xios">
 <variable_definition>
  <variable id="using_server" type="bool">true</variable>
  <variable id="using_oasis" type="bool">true</variable>
  <variable id="oasis_codes_id" type="string">toyatm,toyoce</variable>
  <variable id="using_server2" type="bool">true</variable>
  <variable id="ratio_server2" type="int">25</variable>
  <variable id="server2_dist_file_memory" type="bool">true</variable>
  <variable id="server2_dist_file_memory_ratio" type="double">0.5</variable>
  <variable id="info_level" type="int">0</variable>
  <variable id="print_file" type="bool">false</variable>
 </variable_definition>
</context>

Grenville

Hi,
Thanks for the instructions. I’m finding that, at least during the periods I am able to monitor it, the job terminates before there is ever a point with no coupled task running, so I can’t apply the change.

Is there a way to suppress the XIOS log files before starting the job?

Many thanks,
Jeremy