Postproc_nemo failure out of memory

sebsteinig · 6 October 2023 13:13

UPDATE: problem solved!

The ARCHER2 Helpdesk got back to me and said they applied a patch to the Lustre filesystem over the past few days.

I tested this with several suites, and postproc now seems to be running fine again on the serial nodes! One additional thing I had to do was request a bit more memory for the postproc jobs (the default is just below 2 GB). 10 GB works fine for me, but I did not test other values.

If anybody runs into the same OOM error, this is the memory request I added to the postproc resources in the archer2.rc configuration:

    [[POSTPROC_RESOURCE]]
        inherit = HPC_SERIAL
        pre-script = """
                     module load postproc
                     module list 2>&1
                     ulimit -s unlimited
                     """
        [[[directives]]]
            --mem=10G

Best wishes,
Seb
