BiCGstab error at the beginning of simulation

Monsoon3

My UM vn13.9 UKESM1.1 global AMIP suite, u-dv077, completed a one-year simulation without a problem (finally). I then copied it to create u-dv258, removed the aviation emissions that I had added to ukca_em_file in u-dv077, and tried to run the new suite. It failed with this error:

NaNs in error term in BiCGstab after 1 iterations

I’m aware that this problem is usually treated by perturbing the dump file created in the last cycle. However, in this case the error occurs at the beginning of the run, so there is no such dump file apart from the initial dump (cx946a.da20150101_00, provided by Luke Abraham), which I have used many times without a problem.

So I added back all the aviation emissions, which makes the suite virtually the same as u-dv077. When I tried to run it, I got the same error. I then tried to run u-dv077 itself again. The result is the same!

I cannot rule out the possibility that I made a small change to u-dv077, or to the branches it uses, after it was run.

So I wonder if I can get any advice. What could be causing the problem? What could I check? What could I try? Or is this a temporary problem that I should just wait out?

Thanks,

Masaru

I forgot to mention that these suites use one UM branch (vn13.9_Contrails_Chen) and two UKCA branches (um13.9_primSU_Aitken and um13.9_Aviation_3D_emiss). All of these are currently set to the local working copies, but they should be identical to the latest committed revisions.

I tried changing the initial dump to dv077a.da20160101_00, which was created in the previous run of u-dv077. The result didn’t change (crashed with the BiCGstab error).

Masaru

I said that u-dv077, which previously ran for one year, doesn’t run now. More precisely, the run that completed successfully was run3, and the attempt to run the suite again that failed was run4. I have now tried extending run3, and it has been running for a while.

There is no difference between run3/rose-suite.conf and run4/rose-suite.conf or between run3/app/um/rose-app.conf and run4/app/um/rose-app.conf except for the run length. There is no difference in the code either, as the following doesn’t return anything at all.

diff -r run3/share/fcm_make_um/extract run4/share/fcm_make_um/extract
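The command above compares the extracted source only. A similar check on the configuration files can be scripted so that the expected difference (the run length) is filtered out before diffing. This is a minimal sketch, not a Rose or Cylc tool: the function name `diff_ignoring` is made up for illustration, and the pattern passed to it is an assumption that has to match however the run length is actually set in the suite.

```shell
# Compare two config files while ignoring lines that match a pattern.
# Usage: diff_ignoring PATTERN FILE_A FILE_B
# Exit status: 0 = no other differences, 1 = differences found, 2 = error.
# (diff_ignoring is a hypothetical helper, not part of Rose or Cylc.)
diff_ignoring() {
    pattern=$1
    # Refuse to run if either file is missing or unreadable.
    [ -r "$2" ] && [ -r "$3" ] || return 2

    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || return 2

    # Drop the lines that are expected to differ (e.g. the run length).
    # grep exits 1 when it prints nothing, which is not an error here.
    grep -v -E -- "$pattern" "$2" > "$tmp_a" || :
    grep -v -E -- "$pattern" "$3" > "$tmp_b" || :

    # Show whatever differences remain and keep diff's exit status.
    diff "$tmp_a" "$tmp_b" && status=0 || status=$?

    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

For example, `diff_ignoring '^RUNLEN=' run3/rose-suite.conf run4/rose-suite.conf` would print only the lines that differ apart from a variable called `RUNLEN`; that variable name is an assumption and should be replaced with whichever setting holds the run length in this suite.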

The only difference is that run3 starts from 20160201 with the dump the suite produced whereas run4 is starting from the beginning (20150101). I just checked the initial dump cx946a.da20150101_00 but it doesn’t seem to have changed.

-r--r--r--. 1 luke.abraham.mon ukca-cam 8572932096 Jun 25 2023 /projects/ukca-cam/luke.abraham.mon/restarts/AMIP/cx946a.da20150101_00
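The listing above shows only the size and modification time, which cannot prove that the contents are unchanged. A checksum is a stronger test, although it only helps when there is a reference to compare against, such as a checksum recorded earlier or one taken from another copy of the same dump. The sketch below assumes GNU coreutils `sha256sum` is available; the helper names `record_checksum` and `verify_checksum` are made up for illustration.

```shell
# Record the SHA-256 checksum of a file for later comparison.
# Usage: record_checksum FILE CHECKSUM_FILE
# (record_checksum and verify_checksum are hypothetical helper names.)
record_checksum() {
    sha256sum -- "$1" > "$2"
}

# Check that the file named in CHECKSUM_FILE still has the same contents.
# Usage: verify_checksum CHECKSUM_FILE
# Exit status: 0 if the checksum matches, non-zero otherwise.
verify_checksum() {
    sha256sum --check --status -- "$1"
}
```

The checksum file stores the path exactly as it was given to `record_checksum`, so an absolute path is the safest choice. For a dump of about 8.5 GB the calculation will take a little while.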

This is very strange. Please help.

To summarise:

  • u-dv077/run3 once ran. Extension run still runs fine.
  • u-dv077/run4 doesn’t run. BiCGstab error occurs as soon as the run starts.
  • u-dv258 doesn’t run. BiCGstab error occurs as soon as the run starts.
  • u-dv258 fails using dv077a.da20160101_00 as ainitial. BiCGstab error occurs.
  • There is virtually no difference between these runs.

I know this doesn’t make any sense at all, but it is what is happening to me.

Please somebody help me. Super-duper pretty please :blush: :folded_hands:

Masaru