Hello team
I have recently been the recipient of the dreaded “NaNs in error term in BiCGstab” message while trying to run a nesting suite using GAL9.
Bit of background: what I’ve been working on is a weird sort of ensemble, using ERA5 data to initialise, but then swapping out the initial data with the data from a recon of a different timestep (e.g. putting 18Z data into 12Z). I do this by adding an extra step to the suite, which I call the “transplant” step. It uses mule and is based on some code from colleagues at the Met Office. The suite I’m currently working on is u-dm226.
This works when I do it with a RAL3.1 suite (see, for example, u-di808), but when I try it with a GAL9 suite, the BiCGstab error appears on the first timestep of the first forecast step after transplanting.
My first thought was that my ancils were causing problems, because that’s usually what causes this error, so I regenerated all my ancils, checked that they looked okay, and tried again, but the error persisted. However, if I skip the transplant step, which swaps out data in share/[folders]/ics/GAL9_astart, the forecast step runs without any errors.
This made me think it was just an issue with the mismatch between the boundary conditions, which are from the original ERA5 timestep (e.g. 12Z), and the internal initial conditions, which are taken from another timestep (e.g. 18Z). But (a) this doesn’t happen with the RAL3.1 suites where I’ve done this, so why is it happening in the GAL9 suites? And (b) the error still occurs when I “re-transplant” the original data back into the start dump. I do this by running the transplant step twice, once with the python script app/transplant/bin/transplant_no_lakes.py and again with the script app/transplant/bin/transplant-reverse.py, which moves the data between files but fundamentally shouldn’t change the contents. When I examine the result with xconv, there don’t seem to be any differences in the fields between the files. However, as I mentioned before, when I use the original start dump generated by the recon, it works perfectly fine.
So I think something I’m doing in the transplant step leads to the NaNs in BiCGstab, perhaps by corrupting the start dump, but I’m not sure which part of that step causes the problem. If anyone has expertise in mule, could they have a look at my code and see whether I’m doing anything that would obviously cause such problems? Or let me know if there are any mule best practices for messing with start dumps.
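A field-by-field check along the following lines could complement the xconv comparison of the original and re-transplanted start dumps. This is only a minimal sketch, not the transplant code itself: `compare_fields` and `load_dump_fields` are hypothetical helper names, and it assumes the dumps can be read with `mule.DumpFile.from_file`, that each field's data is available through `get_data()`, and that the fields are in the same order in both files.

```python
import numpy as np


def compare_fields(fields_a, fields_b):
    """Compare two lists of (label, 2-D array) pairs, in file order.

    Returns a list of (label, message) tuples; an empty list means no
    non-finite values and no differing points were found.
    """
    report = []
    if len(fields_a) != len(fields_b):
        report.append(("<file>", f"field count {len(fields_a)} vs {len(fields_b)}"))
    for (label_a, data_a), (label_b, data_b) in zip(fields_a, fields_b):
        if label_a != label_b:
            report.append((label_a, f"label mismatch, second file has {label_b}"))
            continue
        a = np.asarray(data_a, dtype=float)
        b = np.asarray(data_b, dtype=float)
        if a.shape != b.shape:
            report.append((label_a, f"shape {a.shape} vs {b.shape}"))
            continue
        # NaNs or infinities in the second (e.g. transplanted) file.
        n_bad = int((~np.isfinite(b)).sum())
        if n_bad:
            report.append((label_a, f"{n_bad} non-finite value(s) in second file"))
        # Points that differ, treating NaN == NaN as equal.
        differs = ~((a == b) | (np.isnan(a) & np.isnan(b)))
        n_diff = int(differs.sum())
        if n_diff:
            report.append((label_a, f"{n_diff} point(s) differ"))
    return report


def load_dump_fields(path):
    """Read a start dump with mule (assumed installed) into (label, data) pairs."""
    import mule  # imported lazily so compare_fields works without mule

    dump = mule.DumpFile.from_file(path)
    # Label each field by (STASH code, level) taken from its lookup header.
    return [((f.lbuser4, f.lblev), f.get_data()) for f in dump.fields]
```

Calling `compare_fields(load_dump_fields(original), load_dump_fields(transplanted))`, with the two paths filled in, would list any fields that contain NaNs or differ between the files. It does not check lookup or fixed-length header values, so those would need comparing separately.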
Thanks very much!
Best,
Fran