The setting execution time limit = PT59M is associated with HPC_SERIAL; the coupled task does not use that family. The coupled job is configured to run for 10 hrs (see /home/n02/n02/jgrist02/cylc-run/u-dk172/log/job/19500101T0000Z/coupled/01/job for example), where it says:
#SBATCH --time=600:00
The machine appears to be running slowly (builds are slow).
I note that your suite still writes XIOS log files – doing so has a large impact on performance.
We have found that coupled tasks display much reduced performance jitter and run consistently faster if XIOS log files are suppressed. For a currently running suite, the change has to be made to a file on ARCHER2. Wait until there is no running coupled task, then in ~/cylc-run//share/data/xml/iodef.xml modify two lines as follows. Change
<variable id="info_level" type="int">100</variable>
to
<variable id="info_level" type="int">0</variable>
and change
<variable id="print_file" type="bool">true</variable>
to
<variable id="print_file" type="bool">false</variable>
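If you would rather script the edit than make it by hand, the following is a minimal Python sketch of the same two changes. It is only an illustration: the function names are made up here, it assumes each <variable> element sits on one line with id as its first attribute (as in the lines quoted above), and it should be run only when no coupled task is running.

```python
import re
from pathlib import Path

# Values taken from the advice above: turn off XIOS logging.
NEW_VALUES = {"info_level": "0", "print_file": "false"}


def suppress_xios_logs(xml_text: str) -> str:
    """Return xml_text with info_level and print_file set to the quiet values.

    Assumes each element looks like
    <variable id="NAME" ...>VALUE</variable> with id as the first attribute.
    """
    for var_id, value in NEW_VALUES.items():
        pattern = re.compile(
            r'(<variable\s+id="' + re.escape(var_id) + r'"[^>]*>)[^<]*(</variable>)'
        )
        # Keep the opening and closing tags, replace only the text between them.
        xml_text = pattern.sub(
            lambda m, v=value: m.group(1) + v + m.group(2), xml_text
        )
    return xml_text


def patch_iodef(path: Path) -> None:
    """Save a copy of the file as <name>.orig, then rewrite it in place."""
    original = path.read_text()
    path.with_name(path.name + ".orig").write_text(original)
    path.write_text(suppress_xios_logs(original))
```

To use it, call patch_iodef with the path to the iodef.xml file given above. A plain text substitution is used rather than an XML parser so that comments and layout elsewhere in the file are left untouched.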
Hi,
Thanks for the instructions. I’m finding that the job terminates before it reaches a point where there is no running coupled task, or at least it does so during the period I am able to monitor, so I can’t implement the change.
Is there a way of doing it (suppressing XIOS files) before starting the job?