I am using ARCHER2 to continue a run (u-bk453, previously run on ARCHER successfully for 250 years). My suite is u-ck857. I am running HadGEM3-GC3.1-LL, with a modified calving file: /work/n02/n02/radiam24/lig127k_H11/GC3.1_eORCA1v2.2x_nemo_ancils_CMIP6_H11 points to the calving file /work/n02/n02/radiam24/lig127k_H11/ecalving_v2.2x_H11.nc, which is identical to the one previously used for u-bk453. I also used this calving file to run u-ck855 for 10 years, which worked fine.
HadGEM3 seems to run fine from the restart date 2100-01-01, but then crashes around 2102-11-21.
I differenced u-bk453 and u-ck857, and they look similar (other than STASH requests and ldflags_overrides_suffix=-lstdc++).
I think the error is from NEMO; is there a way to fix this?
The error from /work/n02/n02/radiam24/cylc-run/u-ck857/work/21021101T0000Z/coupled/ocean.output is:
Greenland iceberg calving climatology (kg/s) : 5130975919.5296955
Greenland iceberg calving adjusted value (kg/s) : 213652999.9999997
Antarctica iceberg calving climatology (kg/s) : 36646412.219468936
Antarctica iceberg calving adjusted value (kg/s) : 25618500.
Greenland iceshelf melting climatology (kg/s) : 0.
Greenland iceshelf melting adjusted value (kg/s) : 0.
Antarctica iceshelf melting climatology (kg/s) : -39542110.62454956
Antarctica iceshelf melting adjusted value (kg/s) : -31311500.000000004
stpctl: the elliptic solver DO not converge or explode
it: 33330 iter:2000 r: NaN b: NaN
stpctl: output of last fields
E R R O R
step: indic < 0
dia_wri_state : single instantaneous ocean state
and forcing fields file created
and named :output.abort .nc
E R R O R
NEMO abort from dia_wri_state
E R R O R: Calling mppstop
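(As an aside, the "adjusted value" lines above are consistent with NEMO rescaling the climatological calving flux to a prescribed target; the implied scale factor can be sanity-checked with a quick calculation — a sketch using the numbers printed above, not NEMO's actual code:

```python
# Quick check (not NEMO code): the "adjusted value" is consistent with
# rescaling the climatological calving flux by a constant factor.
clim = 5130975919.5296955     # kg/s, Greenland calving climatology (from ocean.output)
adjusted = 213652999.9999997  # kg/s, Greenland calving adjusted value

factor = adjusted / clim
print(f"implied scale factor: {factor:.5f}")
```

These lines are informational, printed at startup; they are not themselves the error.)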
Thanks so much.
The problem is with the solver as far as I can see. It might be worth changing app/nemo_cice/rose-app.conf and resubmitting - that should hopefully at least give a more explicit error message.
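(The specific setting appears to have been dropped from this post. Purely as a hypothetical example of the kind of change meant here, one way to make NEMO more verbose is enabling per-timestep control prints via ln_ctl in the namctl namelist, which in rose-app.conf terms would look like the fragment below — check your suite's namelists before relying on this:

```
[namelist:namctl]
ln_ctl=.true.
```

)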
Thanks, I tried that. The output from the timestep that failed looks similar to the output from previous timesteps to me.
There is ocean output in files like /work/n02/n02/radiam24/cylc-run/u-ck857/work/21021101T0000Z/coupled/output*_0004.nc; only a few of these (e.g. output*_0004.nc and output*_0005.nc) seem to contain non-zero values.
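One way to narrow down where the NaNs first appear is to scan the fields in those files for NaN values; a minimal sketch of the index lookup using numpy (reading the fields out of the netCDF files, e.g. with netCDF4, is assumed and not shown):

```python
import numpy as np

def first_nan_index(field):
    """Return the index of the first NaN in a field, or None if it is clean."""
    bad = np.argwhere(np.isnan(field))
    return tuple(int(x) for x in bad[0]) if bad.size else None

# Synthetic stand-in for a velocity field read from output.abort.nc
u = np.zeros((5, 4, 3))
u[2, 1, 0] = np.nan
print(first_nan_index(u))  # (2, 1, 0)
```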
Do you have any other ideas about where to look/what to try?
I’d try perturbing the atmosphere start file - that might change the conditions enough to prevent the NaNs being generated by the ocean.
then create a Slurm batch script with the content below (I'd create & submit the file in:

#!/bin/bash
#SBATCH --account=<your account>
module load epcc-job-env
module load cray-python
# Set DUMP to the path of the atmosphere start dump first
# (assuming the original dump has been renamed to $DUMP.orig)
$UMDIR/scripts/perturb_theta.py $DUMP.orig --output $DUMP

then sbatch this file.
This will create a perturbed start file (there will be validation errors from mule that can be ignored; I have checked that theta is perturbed); then run the cycle again.
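For context, the idea behind perturb_theta.py is simply to add a tiny random increment to the potential temperature field so the model's trajectory diverges from the one that produced the NaNs. A rough sketch of the operation (the real script works on UM dumps via mule; the amplitude and field shape here are hypothetical placeholders):

```python
import numpy as np

def perturb(theta, amplitude=0.01, seed=0):
    """Add small uniform random noise (in K) to a temperature field."""
    rng = np.random.default_rng(seed)
    return theta + rng.uniform(-amplitude, amplitude, theta.shape)

theta = np.full((3, 4), 280.0)  # stand-in for one model level of theta
perturbed = perturb(theta)
print(float(np.max(np.abs(perturbed - theta))))  # at most 0.01
```

The perturbation is small enough not to change the climate, but large enough to nudge the coupled state off the trajectory that blew up.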
(I’m not sure you need 10-day dumping - with monthly cycling, monthly dumping should be OK)
If this doesn’t work, it might be worth reconfiguring the start file – failing that, you may need to seek more expert NEMO help.
Thanks so much, that worked and the coupled step ran!
The errors from mule mean that postproc_atmos fails and the atmosphere output isn’t transferred from cylc-run/u-ck857/share/data/History_Data/ to the archive, but I can transfer these files manually.
The postproc_atmos error is:
File "/work/y07/shared/umshared/lib/python3.8/mule/__init__.py", line 527, in __init__
ValueError: Incorrect size for fixed length header; given 0 words but should be 256.
[FAIL] main_pp.py atmos <<'__STDIN__'
[FAIL] '__STDIN__' # return-code=1
2022-03-31T12:29:12Z CRITICAL - failed/EXIT
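The mule error is a length check on the UM fixed-length header, which should be 256 64-bit words (2048 bytes); an empty file yields 0 words, hence the message. A hedged sketch of an equivalent pre-check you could run on a file before postproc touches it (not mule's actual code):

```python
import os
import tempfile

FLH_WORDS = 256            # UM fixed-length header length in 64-bit words
FLH_BYTES = FLH_WORDS * 8  # = 2048 bytes

def looks_like_um_file(path):
    """Reject files too small to hold even the fixed-length header (e.g. empty ones)."""
    return os.path.getsize(path) >= FLH_BYTES

# Demo: a freshly created empty file fails the check
with tempfile.NamedTemporaryFile(delete=False) as f:
    empty = f.name
print(looks_like_um_file(empty))  # False
```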
You have a lot of empty files in the History_Data directory which are likely the culprits. I suggest removing these and trying again.
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.pv2102oct
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.pu2102oct
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.pt21021021
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.pn21021021
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.pl2102oct
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.pk2102oct
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.ph2102oct
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.pe2102oct
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.pd2102oct
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.pa2102oct
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.p921021021
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.p821021021
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.p721021021
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.p621021021
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.p52102oct
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.p42102oct
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.p32102oct
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.p22102oct
-rw-r--r-- 1 radiam24 n02 0 Mar 29 13:41 ck857a.p12102oct
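Finding the empty files can be scripted rather than eyeballed; a minimal sketch (on ARCHER2 you would point this at cylc-run/u-ck857/share/data/History_Data/ — and do check the list before deleting anything):

```python
from pathlib import Path
import tempfile

def empty_files(directory):
    """Return zero-length regular files in a directory (non-recursive)."""
    return [p for p in Path(directory).iterdir()
            if p.is_file() and p.stat().st_size == 0]

# Demo with a throwaway directory; the second filename is a made-up stand-in.
d = tempfile.mkdtemp()
Path(d, "ck857a.pa2102oct").touch()           # empty, like the listing above
Path(d, "ck857a.da21021101").write_text("x")  # non-empty stand-in
print([p.name for p in empty_files(d)])  # ['ck857a.pa2102oct']
```

Once you are happy with the list, `p.unlink()` on each entry removes them.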
Thanks, that worked.
However the model failed again at 2104-11-01; should I try perturbing the atmosphere start file for this month as well? The error from ocean.output:
===>>> : E R R O R
stpctl: the zonal velocity is larger than 20 m/s
kt= 56565 max abs(U): 30.89 , i j k: 312 72 41
output of last fields in numwso
===>>> : E R R O R
step: indic < 0
dia_wri_state : single instantaneous ocean state
and forcing fields file created
and named :output.abort .nc
===>>> : E R R O R
NEMO abort from dia_wri_state
E R R O R: Calling mppstop
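(For reference, the check that fired here is NEMO's stpctl sanity test on the zonal velocity: abort if |U| exceeds 20 m/s, reporting the maximum and its i, j, k location. A rough numpy equivalent — indices here are 0-based, unlike NEMO's 1-based Fortran output, and the array shape is just sized after the indices in the log:

```python
import numpy as np

U_MAX = 20.0  # m/s, the stpctl threshold quoted in the error

def check_zonal_velocity(u):
    """Return (max |u|, 0-based (k, j, i) index of the max)."""
    a = np.abs(u)
    idx = np.unravel_index(np.argmax(a), a.shape)
    return float(a[idx]), tuple(int(i) for i in idx)

u = np.zeros((41, 72, 312))  # (k, j, i), sized after the eORCA1 indices in the log
u[40, 71, 311] = -30.89      # one runaway grid point
m, idx = check_zonal_velocity(u)
print(m > U_MAX, m, idx)  # True 30.89 (40, 71, 311)
```

Locating i=312, j=72, k=41 on the grid may tell you whether the instability is in the same region as before.)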
It might be worth a try. I don’t have any better ideas.
This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.