Hi,
I’m trying to convert a copy of a Monsoon suite to run on Archer2. The suite in question is u-cl073 (a copy of u-ck696). I followed the guidance on setting up an Archer2 suite, and the suite starts and passes all fcm tasks except fcm_make2_um, which fails with the following message.
The following have been reloaded with a version change:
- cce/11.0.4 => cce/12.0.3
[FAIL] ftn -oo/veg3_red_dynamic_mod.o -c -I./include -s default64 -e m -J ./include -I/work/y07/shared/umshared/gcom/cce10.0.4/gcom7.5/archer2_ex_cce_mpp/build/include -O2 -Ovector1 -hfp0 -hflex_mp=strict -h omp /work/n02/n02/jweber/cylc-run/u-cl073/share/fcm_make_um/preprocess-atmos/src/jules/src/science/vegetation/veg3_red_dynamic_mod.F90 # rc=1
[FAIL] ftn-7991 crayftn: INTERNAL VEG3_RED_DYNAMIC, File = …/…/…/mnt/lustre/a2fs-work2/work/n02/n02/jweber/cylc-run/u-cl073/share/fcm_make_um/preprocess-atmos/src/jules/src/science/vegetation/veg3_red_dynamic_mod.F90, Line = 164
[FAIL] INTERNAL COMPILER ERROR: “local_ud_remap: required match not found” (pdgcs/v_cycles.c, line 2785, version 32d4751edd230fbbdc823f8b431bd5155e145fb9)
[FAIL] compile 10.4 ! veg3_red_dynamic_mod.o ← jules/src/science/vegetation/veg3_red_dynamic_mod.F90
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=992778.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
I haven’t seen anything like this before. Is this stemming from a difference in the compiler between Monsoon and Archer2?
Thanks for your help,
James
Hi James,
Two things:
- We hadn’t set up the extra compile options needed for some of the files (e.g. veg3_red_dynamic_mod.F90) on the full system. In panel “fcm_make_um → Configuration file”:
- You’re also getting an OOM, which can be fixed by increasing the requested memory. In site/archer2.rc, in the [[UMBUILD_RESOURCE]] section, add the following:
[[[directives]]]
--mem=25Gb
Cheers,
Ros.
Thanks, Ros, that has worked. The model now progresses past all the fcm tasks but fails on recon. The error is one I have seen many times before, and I think it occurs when the recon can’t find a prognostic in the input dump with which to initialise the run.
? Error from routine: RCF_RESET_DATA_SOURCE
? Error message: Section 34 Item 1 : Required field is not in input dump!
However, the stash in question (34001) is ozone and I can see it is in the input dump I have selected. Do any changes need to be made between Monsoon and Archer2 for recon?
Thanks,
James
Hi, would you be able to advise on this recon issue? There may be something quite simple that I have missed in the transition from Monsoon to Archer2, but I can’t find anything obvious.
Many thanks,
James
James
Sorry for the delay:
grenvill@ln03:~> cd /work/n02/n02/jweber/cylc-run/u-cl073
-bash: cd: /work/n02/n02/jweber/cylc-run/u-cl073: Permission denied
Please
chmod -R g+rX /home/n02/n02/<your-username>
chmod -R g+rX /work/n02/n02/<your-username>
Grenville
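For anyone following along, a minimal sketch of how to confirm that the chmod commands above took effect. The function name list_not_group_readable is hypothetical (it is not part of any ARCHER2 tooling), and it only looks at the group read and search bits, not at ACLs or at the permissions of parent directories.

```shell
#!/bin/sh
# list_not_group_readable DIR  (hypothetical helper, for illustration only)
# Print every entry under DIR that lacks group read permission, and every
# directory that lacks group search (execute) permission.
# An empty result suggests `chmod -R g+rX DIR` has been applied throughout.
list_not_group_readable() {
    find "$1" \( ! -perm -040 -o \( -type d ! -perm -010 \) \) -print
}
```

Called as list_not_group_readable /work/n02/n02/<your-username>, it should print nothing once the whole tree is group-readable.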
Thanks, Grenville, I’ve run those commands.
Best,
James
James
On Monsoon:
/home/d04/jamwe/cylc-run/u-ck696/share/data/ck696a.ainitial -> /projects/ukesm/jamwe/AINITIAL/cc298a.da20090101_00
but on ARCHER:
/work/n02/n02/jweber/cylc-run/u-cl073/share/data/cl073a.ainitial -> /work/y07/shared/umshared/hadgem3/initial/atmos/N96L85/ab642a.da19880901_00
I suggest you copy the Monsoon start file to ARCHER.
Grenville
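The comparison above relies on seeing where the .ainitial link in share/data actually points. A minimal sketch of that check, assuming GNU readlink is available; the function name show_start_dump is hypothetical and only illustrates the idea.

```shell
#!/bin/sh
# show_start_dump LINK  (hypothetical helper, for illustration only)
# Print the file that the symbolic link LINK finally resolves to.
# Returns non-zero if LINK is not a symbolic link or its target is missing.
show_start_dump() {
    if [ ! -L "$1" ]; then
        echo "not a symbolic link: $1" >&2
        return 1
    fi
    if [ ! -e "$1" ]; then
        echo "dangling link: $1 -> $(readlink "$1")" >&2
        return 1
    fi
    readlink -f "$1"
}
```

Running show_start_dump on the suite's share/data/cl073a.ainitial before the recon task starts would show which start dump the suite will really read, whatever the suite settings were intended to select.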
Hi Grenville,
Thank you. I had a copy of my desired initialisation file on Archer2 and thought I was directing u-cl073 to use it. However, I realise from the text you copied over from my cylc-run that my suite was defaulting to ab642a.da19880901_00 instead. I think this is coming from code in Archer2.rc under the heading {# Set up start dumps #}. I have edited this code to point to my desired initialisation file (from u-cc298) and the model now passes recon.
Best,
James