I am using ARCHER2 to continue a run (u-bk453, previously run on ARCHER successfully for 250 years). My suite is u-ck857. I am running HadGEM3-GC3.1-LL, with a modified calving file: /work/n02/n02/radiam24/lig127k_H11/GC3.1_eORCA1v2.2x_nemo_ancils_CMIP6_H11 points to the calving file /work/n02/n02/radiam24/lig127k_H11/ecalving_v2.2x_H11.nc, which is identical to the one previously used for u-bk453. I also used this calving file to run u-ck855 for 10 years, which worked fine.
HadGEM3 seems to run fine from the restart date 2100-01-01, but then crashes around 2102-11-21.
I diffed the u-bk453 and u-ck857 suites; apart from the STASH requests and ldflags_overrides_suffix=-lstdc++ they look the same.
I think the error is from NEMO; is there a way to fix this?
The error from /work/n02/n02/radiam24/cylc-run/u-ck857/work/21021101T0000Z/coupled/ocean.output is:
Greenland iceberg calving climatology (kg/s) : 5130975919.5296955
Greenland iceberg calving adjusted value (kg/s) : 213652999.9999997
Antarctica iceberg calving climatology (kg/s) : 36646412.219468936
Antarctica iceberg calving adjusted value (kg/s) : 25618500.
Greenland iceshelf melting climatology (kg/s) : 0.
Greenland iceshelf melting adjusted value (kg/s) : 0.
Antarctica iceshelf melting climatology (kg/s) : -39542110.62454956
Antarctica iceshelf melting adjusted value (kg/s) : -31311500.000000004
stpctl: the elliptic solver DO not converge or explode
it: 33330 iter:2000 r: NaN b: NaN
stpctl: output of last fields
E R R O R
step: indic < 0
dia_wri_state : single instantaneous ocean state
and forcing fields file created
and named : output.abort.nc
E R R O R
MPPSTOP
NEMO abort from dia_wri_state
E R R O R: Calling mppstop
The problem is with the solver as far as I can see. It might be worth setting ln_ctl=.true. in app/nemo_cice/rose-app.conf and resubmitting - that should hopefully at least give a more explicit error message.
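For reference, the switch lives in NEMO's run-control namelist, so in app/nemo_cice/rose-app.conf the change would look something like the following (the namctl section name is the standard NEMO namelist; check your suite's existing sections rather than adding a new one blindly):

```ini
# app/nemo_cice/rose-app.conf -- turn on extra NEMO run-time control output
[namelist:namctl]
ln_ctl=.true.
```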
Thanks, I tried that. The output from the timestep that failed looks similar to the output from previous timesteps to me.
There is ocean output in files like /work/n02/n02/radiam24/cylc-run/u-ck857/work/21021101T0000Z/coupled/output*_0004.nc; only a few of these (e.g. output*_0004.nc and output*_0005.nc) seem to contain non-zero values.
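As a quick way to tell those per-rank abort files apart, the check is just whether a field is all zeros (rank holds no wet points), contains NaNs (solver blow-up reached that rank), or holds real data. A minimal sketch of that classification, in pure Python; the function name is illustrative, and reading the actual arrays from the output*_NNNN.nc files (e.g. via netCDF4) is left out:

```python
import math

def classify_field(values):
    """Classify a flat sequence of values from one per-rank output file.

    Returns "nan" if any value is NaN (the blow-up reached this rank),
    "all-zero" if every value is exactly zero (e.g. a land-only rank),
    or "data" if finite non-zero values are present.
    """
    has_nonzero = False
    for v in values:
        if math.isnan(v):
            return "nan"
        if v != 0.0:
            has_nonzero = True
    return "data" if has_nonzero else "all-zero"

# Illustrative use; in practice the values would come from the
# variables in output.abort_0004.nc etc.
print(classify_field([0.0, 0.0]))           # all-zero
print(classify_field([0.0, 3.2, -1.0]))     # data
print(classify_field([1.0, float("nan")]))  # nan
```

Ranks reporting "nan" point at the subdomain where the instability first appears, which can help localise the problem geographically.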
Do you have any other ideas about where to look/what to try?
This will create a perturbed start file (there will be validation errors from mule, which can be ignored; I have checked that theta is perturbed). Then run the cycle again.
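For context, the perturbation itself is just a tiny random increment added to each theta value, enough to nudge the model off the trajectory that hits the solver failure while staying physically negligible. A sketch of that core step (the amplitude and seed are illustrative, and the actual reading/writing of the UM dump with mule is omitted):

```python
import random

def perturb(values, amplitude=1e-4, seed=1):
    """Return values with a small uniform random perturbation added.

    amplitude is in kelvin for a theta field: large enough to change
    the trajectory, small enough to be physically negligible.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-amplitude, amplitude) for v in values]

theta = [273.15, 280.0, 285.5]
perturbed = perturb(theta)
# Every point moves, but by no more than the amplitude.
assert all(abs(p - t) <= 1e-4 for p, t in zip(perturbed, theta))
```

In the real workflow this array would be extracted from and written back to the start dump with mule, which is where the ignorable validation errors mentioned above come from.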
(I’m not sure you need 10-day dumping - with monthly cycling, monthly dumping should be OK)
If this doesn’t work, it might be worth reconfiguring the start file – failing that, you may need to seek more expert NEMO help.
Thanks so much, that worked and the coupled step ran!
The errors from mule mean that postproc_atmos fails and the atmosphere output isn’t transferred from cylc-run/u-ck857/share/data/History_Data/ to the archive, but I can transfer these files manually.
The postproc_atmos error is:
File "/work/y07/shared/umshared/lib/python3.8/mule/__init__.py", line 527, in __init__
    raise ValueError(_msg)
ValueError: Incorrect size for fixed length header; given 0 words but should be 256.
[FAIL] main_pp.py atmos <<'STDIN'
[FAIL]
[FAIL] 'STDIN' # return-code=1
2022-03-31T12:29:12Z CRITICAL - failed/EXIT
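The "given 0 words but should be 256" message means mule read an empty (or truncated) fieldsfile: the UM fixed-length header is 256 words, and this file supplied none. A quick scan for zero-length files in History_Data identifies the culprits before rerunning postproc_atmos; a minimal sketch (the directory argument shown in the comment is this suite's path, but the helper itself is generic):

```python
import os

def find_empty_files(directory):
    """Return names of zero-length files in directory; mule cannot
    open these, so they are the likely cause of the header error."""
    empty = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(name)
    return empty

# e.g. find_empty_files(
#     "/work/n02/n02/radiam24/cylc-run/u-ck857/share/data/History_Data")
```

Moving or deleting the empty files (they hold no data anyway) should let postproc_atmos archive the rest without manual transfers.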