Monsoon to Archer2 Suite

Hi,

I’m trying to convert a copy of a Monsoon suite to run on Archer2. The suite in question is u-cl073 (a copy of u-ck696). I followed the guidance on setting up an Archer2 suite. The suite starts and passes all fcm tasks except fcm_make2_um, which fails with the following message:

The following have been reloaded with a version change:

  1. cce/11.0.4 => cce/12.0.3

[FAIL] ftn -oo/veg3_red_dynamic_mod.o -c -I./include -s default64 -e m -J ./include -I/work/y07/shared/umshared/gcom/cce10.0.4/gcom7.5/archer2_ex_cce_mpp/build/include -O2 -Ovector1 -hfp0 -hflex_mp=strict -h omp /work/n02/n02/jweber/cylc-run/u-cl073/share/fcm_make_um/preprocess-atmos/src/jules/src/science/vegetation/veg3_red_dynamic_mod.F90 # rc=1
[FAIL] ftn-7991 crayftn: INTERNAL VEG3_RED_DYNAMIC, File = …/…/…/mnt/lustre/a2fs-work2/work/n02/n02/jweber/cylc-run/u-cl073/share/fcm_make_um/preprocess-atmos/src/jules/src/science/vegetation/veg3_red_dynamic_mod.F90, Line = 164
[FAIL] INTERNAL COMPILER ERROR: “local_ud_remap: required match not found” (pdgcs/v_cycles.c, line 2785, version 32d4751edd230fbbdc823f8b431bd5155e145fb9)
[FAIL] compile 10.4 ! veg3_red_dynamic_mod.o ← jules/src/science/vegetation/veg3_red_dynamic_mod.F90
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=992778.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

I haven’t seen anything like this before. Is this stemming from a difference in the compiler between Monsoon and Archer2?

Thanks for your help,

James

Hi James,

Two things:

  1. We hadn’t set up the extra compile options needed for some of the files (e.g. veg3_red_dynamic_mod.F90) on the full system. In panel “fcm_make_um → Configuration file”:

    • Set config_root_path to be:
      fcm:um.xm-br/dev/simonwilson/vn12.0_archer2_compile

    • Remove the config_revision so that it is blank.

  2. You’re also getting an OOM, which can be fixed by increasing the requested memory. In site/archer2.rc, in the [[UMBUILD_RESOURCE]] section, add the following:

    [[[directives]]]
        --mem=25Gb
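
Since the failing task hit two separate problems (an internal compiler error and an out-of-memory kill), it can help to confirm which ones are still present after each change. The following is a minimal sketch, not an ARCHER2 or Cylc tool: `triage_build_log` is a hypothetical helper name, and the search strings are simply copied from the error output quoted earlier in this thread.

```shell
#!/usr/bin/env bash
# Report which of the two failure signatures appear in a job log.
# triage_build_log is a hypothetical helper; the patterns are taken
# from the messages quoted in this thread.
triage_build_log() {
    local log="$1"
    # Cray compiler crash, addressed by the extra compile options
    if grep -q 'INTERNAL COMPILER ERROR' "$log"; then
        echo "compiler-ice"
    fi
    # Slurm cgroup out-of-memory kill, addressed by the --mem directive
    if grep -q 'oom-kill event' "$log"; then
        echo "out-of-memory"
    fi
}
```

Calling `triage_build_log <path-to>/job.err` on the fcm_make2_um error log should print one line per problem found, and nothing once both fixes have taken effect.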

Cheers,
Ros.

Thanks, Ros, that has worked. The model now progresses past all the fcm tasks but fails on recon. The error is one I have seen many times before and I think occurs when the recon can’t find a prognostic in the input dump with which to initialise the run.

? Error from routine: RCF_RESET_DATA_SOURCE
? Error message: Section 34 Item 1 : Required field is not in input dump!

However, the stash in question (34001) is ozone and I can see it is in the input dump I have selected. Do any changes need to be made between Monsoon and Archer2 for recon?

Thanks,

James

Hi, would you be able to advise on this recon issue? There may be something quite simple that I have missed in the transition from Monsoon to Archer2, but I can’t find anything obvious.

Many thanks,

James

James

Sorry for the delay:

grenvill@ln03:~> cd /work/n02/n02/jweber/cylc-run/u-cl073
-bash: cd: /work/n02/n02/jweber/cylc-run/u-cl073: Permission denied

Please

chmod -R g+rX /home/n02/n02/<your-username>
chmod -R g+rX /work/n02/n02/<your-username>
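
To check that the chmod commands above had the intended effect, something like the sketch below lists anything the group still cannot read or enter. `check_group_access` is a hypothetical helper name, not an existing ARCHER2 utility; it only inspects permission bits and changes nothing.

```shell
#!/usr/bin/env bash
# List entries under a directory that lack group read permission, and
# directories that lack group execute (search) permission.
# check_group_access is a hypothetical helper name.
check_group_access() {
    local top="$1"
    # Symlinks are skipped because their own mode bits are not meaningful.
    find "$top" ! -type l \
        \( ! -perm -g+r -o -type d ! -perm -g+x \) -print
}
```

No output from `check_group_access /work/n02/n02/<your-username>` suggests the tree is browsable by other members of the group. The parent directories above it also need group execute permission for `cd` to succeed.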

Grenville

Thanks, Grenville, I’ve run those commands.

Best,

James

James

On Monsoon:
/home/d04/jamwe/cylc-run/u-ck696/share/data/ck696a.ainitial -> /projects/ukesm/jamwe/AINITIAL/cc298a.da20090101_00

but on ARCHER
/work/n02/n02/jweber/cylc-run/u-cl073/share/data/cl073a.ainitial -> /work/y07/shared/umshared/hadgem3/initial/atmos/N96L85/ab642a.da19880901_00

I suggest you copy the Monsoon start file to ARCHER.
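
The comparison above relies on seeing where the suite's `.ainitial` link actually points. A minimal sketch of that check is given here, assuming the start dump is a symbolic link under share/data as in the paths quoted; `start_dump_target` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the target of a suite's start-dump link and flag a dangling link.
# start_dump_target is a hypothetical helper name.
start_dump_target() {
    local link="$1"
    if [ ! -L "$link" ]; then
        echo "not a symlink: $link" >&2
        return 1
    fi
    # Show the path stored in the link itself
    readlink "$link"
    # -e follows the link, so this fails when the target file is missing
    if [ ! -e "$link" ]; then
        echo "target does not exist: $link" >&2
        return 2
    fi
}
```

Running `start_dump_target /work/n02/n02/jweber/cylc-run/u-cl073/share/data/cl073a.ainitial` before recon would show whether the suite is picking up the intended dump or a site default.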

Grenville

Hi Grenville,

Thank you. I had a copy of my desired initialisation file on Archer2 and thought I was directing u-cl073 to use it. However, I realise from the text you copied over from my cylc-run that my suite was defaulting to ab642a.da19880901_00 instead. I think this is coming from code in site/archer2.rc under the heading {# Set up start dumps #}. I have edited this code to point to my desired initialisation file (from u-cc298) and the model now passes recon.

Best,

James