BiCGstab error at the beginning of simulation

Monsoon3

My UM vn13.9 UKESM1.1 global AMIP suite, u-dv077, finally completed a one-year simulation without a problem. I then copied it to create u-dv258, removed the aviation emissions that I had added to ukca_em_file in u-dv077, and tried to run the new suite. It failed with this error:

NaNs in error term in BiCGstab after 1 iterations

I’m aware that this problem is usually treated by perturbing the dump file created in the last cycle. However, in this case this occurs at the beginning of the run, so there is no such dump file except the initial dump (cx946a.da20150101_00 provided by Luke Abraham), which I used many times without a problem.

So I added back all the aviation emissions, so that the suite is virtually the same as u-dv077. When I tried to run it, I got the same error. I then tried running u-dv077 itself again. The result is the same!

I cannot rule out the possibility that I have made a small change to u-dv077 or the branch used there after it was run.

So I wonder if I can get any advice. What could be causing the problem? What could I check? What could I try? Or is this a temporary problem, and should I just wait?

Thanks,

Masaru

I forgot to mention that these suites use 1 UM branch (vn13.9_Contrails_Chen) and 2 UKCA branches (um13.9_primSU_Aitken and um13.9_Aviation_3D_emiss). All of these are currently set to the local working copies but they should be identical to the latest revisions committed.

I tried changing the initial dump to dv077a.da20160101_00, which was created in the previous run of u-dv077. The result didn’t change (crashed with the BiCGstab error).

Masaru

I said that u-dv077, which previously ran for 1 year, doesn’t run now. More precisely, the run that completed successfully was run3, and the rerun that failed was run4. I have now tried extending run3, and it has been running for a while.

There is no difference between run3/rose-suite.conf and run4/rose-suite.conf, or between run3/app/um/rose-app.conf and run4/app/um/rose-app.conf, except for the run length. There is no difference in the code either, as the following command returns nothing at all:

diff -r run3/share/fcm_make_um/extract run4/share/fcm_make_um/extract

The only difference is that run3 starts from 20160201 with the dump the suite produced whereas run4 is starting from the beginning (20150101). I just checked the initial dump cx946a.da20150101_00 but it doesn’t seem to have changed.

-r--r--r--. 1 luke.abraham.mon ukca-cam 8572932096 Jun 25 2023 /projects/ukca-cam/luke.abraham.mon/restarts/AMIP/cx946a.da20150101_00
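
The listing above shows only the size and timestamp of the dump. A checksum would be a stronger test that the file content is unchanged. Here is a minimal sketch, assuming a SHA-256 digest was recorded at a time when the dump was known to work; the helper names sha256_of and dump_unchanged are hypothetical, and the demo uses a small temporary file rather than the real dump.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    a multi-gigabyte dump does not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dump_unchanged(path, expected_hex):
    """True if the file's current digest matches a previously recorded one."""
    return sha256_of(path) == expected_hex


# Demo on a throwaway file (a stand-in for the real dump, which is an assumption).
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.dump")
    with open(demo, "wb") as handle:
        handle.write(b"abc")
    recorded = sha256_of(demo)                   # digest noted while the file was good
    still_same = dump_unchanged(demo, recorded)  # untouched file
    with open(demo, "ab") as handle:
        handle.write(b"x")                       # simulate a silent modification
    after_edit = dump_unchanged(demo, recorded)

print(still_same, after_edit)
```

For the real file, the same call would be sha256_of on the full path of the dump, compared against whatever digest had been noted earlier. Without such a record, the check can only compare two copies of the file with each other.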

This is very strange. Please help.

To summarise:

  • u-dv077/run3 once ran. Extension run still runs fine.
  • u-dv077/run4 doesn’t run. BiCGstab error occurs as soon as the run starts.
  • u-dv258 doesn’t run. BiCGstab error occurs as soon as the run starts.
  • u-dv258 fails using dv077a.da20160101_00 as ainitial. BiCGstab error occurs.
  • There is virtually no difference between these runs.

I know, this doesn’t make any sense at all, but it is what is happening to me.

Please somebody help me. Super-duper pretty please :blush: :folded_hands:

Masaru

I went one suite back (u-du915) and started from there, repeating what I had done for dv077. In that process I noticed that I had changed a namelist setting because run3 of dv077 had failed after running for 6 months.

[namelist:items(085d9ed4)]
ancilfilename='$ROSE_DATA/etc/ancil/qrparm.veg.frac'
domain=1
!!interval=5
l_ignore_ancil_grid_check=.false.
!!netcdf_varname='unset','unset'
!!period=3
source=4
stash_req=216
update_anc=.false.
!!user_prog_ancil_stash_req=
!!user_prog_rconst=0.0

I had changed update_anc from true to false (and rose edit automatically put ‘!!’ in front of interval=5 and period=3). With this change the suite continued to run after the 6 months, so I thought this was the right change. I committed it, but copies of this suite were failing. That was what was happening.
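
A change like this, together with the ‘!!’ markers that rose edit adds, can be surfaced by comparing two copies of rose-app.conf setting by setting. Here is a minimal sketch, assuming the files follow the plain [section] / key=value layout shown above, with a leading ‘!’ or ‘!!’ marking a user-ignored or trigger-ignored setting. The functions parse_rose_conf and diff_conf are hypothetical helpers, not part of Rose, and the "before" text is reconstructed from the description above rather than copied from the suite.

```python
def parse_rose_conf(text):
    """Parse Rose-style config text into {section: {key: (state, value)}}."""
    settings = {}
    section = ""
    last_key = None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.rstrip().endswith("]"):
            section = line.strip()[1:-1]
            settings.setdefault(section, {})
            last_key = None
            continue
        if line[0].isspace() and last_key is not None:
            # Indented line: treat it as a continuation of the previous value.
            state, value = settings[section][last_key]
            settings[section][last_key] = (state, value + "\n" + line.strip())
            continue
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        state = "active"
        if key.startswith("!!"):
            state, key = "trigger-ignored", key[2:]
        elif key.startswith("!"):
            state, key = "user-ignored", key[1:]
        settings.setdefault(section, {})[key] = (state, value.strip())
        last_key = key
    return settings


def diff_conf(old, new):
    """List (section, key, old_setting, new_setting) for every difference."""
    changes = []
    for section in sorted(set(old) | set(new)):
        a, b = old.get(section, {}), new.get(section, {})
        for key in sorted(set(a) | set(b)):
            if a.get(key) != b.get(key):
                changes.append((section, key, a.get(key), b.get(key)))
    return changes


# "Before" is an assumption reconstructed from the post; "after" matches the listing.
BEFORE = """[namelist:items(085d9ed4)]
interval=5
period=3
update_anc=.true.
"""
AFTER = """[namelist:items(085d9ed4)]
!!interval=5
!!period=3
update_anc=.false.
"""

changes = diff_conf(parse_rose_conf(BEFORE), parse_rose_conf(AFTER))
for section, key, was, now in changes:
    print(f"[{section}] {key}: {was} -> {now}")
```

Unlike a plain text diff, this reports a setting that only changed its ignored state (interval and period here) as a change to the same key, which makes the side effects of toggling update_anc easier to spot.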

So now I know this namelist is causing the error but still cannot fix the problem. In u-dv725 (committed), I changed the veg frac data from
u-by791_m01s00i216_1979-2014_annual_timeseries_land_cover_frac.anc
to
/common/share/monsoon_ancils/atmos/n96e/orca1/vegetation/fractions_igbp/v4/qrparm.veg.frac
and I thought this would fix the problem, but it actually doesn’t. dv725/run2 fails just as dv077/run3 did the first time: atmos_main fails after 6 months with this error:

? Error code: 500
? Error from routine: UP_ANCIL
? Error message: REPLANCA: error finding next lookup entry for field: 4 stashcode 216
? Error from processor: 0
? Error number: 70

I thought this qrparm.veg.frac might actually be a time series, but it is not. Its time dimension is 1 (nt=1), so it does look like climatological data.

I wondered whether I should set a different value for source. I tested source=2, but this time recon fails with the error “Land surface configuration does not match ancillary”.

Could I have some advice, please?
Masaru

Hi Masaru,

I am not sure I can follow this entirely, but in the case of the veg frac (or any other surface) ancillary, you can only use the ‘CMIP6’ ancillaries with a UKESM configuration, as it has a different land-surface setup from other science configurations. In this instance, for Present Day type runs, you can try the ancils in /common/share/monsoon_ancils_cmip6/model_derived/ukesm1.0_historical_r2i1p1f2_u-bc292/n96e/clim_2005-2014/

Hi Mohit,

Oh, that’s great! I replaced all data in
app/install_ancil/opt/rose-app-monsoon3.conf
and
app/um/opt/rose-app-monsoon.conf
with those in the directory you gave here. I set source=2.
Now my suite runs without any problem.

This has annoyed me for many weeks but the problem was surprisingly simple!!!

Now I just wonder why this kind of information has to be so hard to find.

Anyway, thanks for your help.
Masaru
