Suggestions for debugging BiCGstab "NaNs in error term"

Hi,

I’m working on it now. The UM is a complicated system (and the nesting suite especially so), and I’m afraid this could take some time to fix. My first few ideas haven’t worked, so I need to do some more coding. I will contact you with updates.

Simon.

Thank you Simon, I appreciate you taking the time to investigate this.
Best wishes
Ella

Hi Ella,

I’ve made some progress. It appears that the recon has a bug which means regridding from the ERA5 grid to a rotated-pole grid is broken. I’ve done some tests and I think it will work as a two-stage process: ERA global grid → Endgame global grid, then Endgame global grid → Endgame regional grid. The trick is working out how best to do this.
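For context on what the second stage has to get right: a rotated-pole grid is related to a regular latitude/longitude grid by a rotation of the sphere. The sketch below shows that coordinate transform only. It is not the recon's implementation, and it assumes the CF convention in which the geographic North Pole sits on rotated longitude 0 (other tools may offset longitudes by 180 degrees). The function name and the example pole position are illustrative, not taken from this suite.

```python
import numpy as np


def geo_to_rotated(lat, lon, pole_lat, pole_lon):
    """Convert geographic (lat, lon) in degrees to rotated-pole (rlat, rlon).

    pole_lat, pole_lon: geographic position of the rotated North Pole.
    Assumes the CF convention where the geographic North Pole lies on
    rotated longitude 0; other tools may differ by 180 degrees.
    """
    lat, lon = np.radians(lat), np.radians(lon)
    plat, plon = np.radians(pole_lat), np.radians(pole_lon)

    # Cartesian unit vector, already rotated about the z axis so that the
    # rotated pole lies on longitude 0.
    x = np.cos(lat) * np.cos(lon - plon)
    y = np.cos(lat) * np.sin(lon - plon)
    z = np.sin(lat)

    # Rotate about the y axis by (90 - pole_lat) so the rotated pole
    # moves onto the +z axis.
    theta = np.pi / 2 - plat
    xr = x * np.cos(theta) - z * np.sin(theta)
    zr = x * np.sin(theta) + z * np.cos(theta)

    rlat = np.degrees(np.arcsin(np.clip(zr, -1.0, 1.0)))
    # Negating x and y shifts longitude by 180 degrees so that the
    # geographic North Pole falls on rotated longitude 0.
    rlon = np.degrees(np.arctan2(-y, -xr))
    return rlat, rlon
```

For example, with a rotated pole at (37.5, 177.5) the geographic point (52.5, -2.5) lands at rotated (0, 0), and the geographic North Pole lands at rotated latitude 37.5.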

I’ve also coded up a way to “hide” the anomaly by copying data from adjacent points. The field now looks much smoother. Unfortunately the model still fails with NaNs in the first timestep, so I’m a bit stuck. See /home/n02/n02/simon/cylc-run/u-cq635/share/cycle/20211201T0000Z/Arctic_RACMO/12km/ERA5_24cyc_72fcst/ics/ERA5_24cyc_72fcst_astart
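The actual change lives in Simon's branch and isn't shown in the thread. As an illustration of the idea only, here is a minimal sketch that overwrites the polar rows with the nearest interior row, assuming the field has already been loaded as a 2-D (latitude, longitude) NumPy array whose first and last rows are the polar rows. The function name is hypothetical.

```python
import numpy as np


def fill_polar_rows(field, n_rows=1):
    """Return a copy of `field` with the outermost latitude rows replaced.

    Assumes `field` is 2-D (latitude, longitude) with the polar rows first
    and last. The n_rows rows at each end are overwritten with the nearest
    row that is not itself being overwritten.
    """
    out = np.array(field, dtype=float, copy=True)
    out[:n_rows, :] = out[n_rows, :]         # southern end
    out[-n_rows:, :] = out[-n_rows - 1, :]   # northern end
    return out
```

Working on a copy keeps the original field available for comparison, which is useful when checking whether the anomaly is really confined to the polar rows.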

I’ve been looking at the start dumps, this one and yours at:
/home1/home/n02/n02/shakka/cylc-run/u-cq635/share/cycle/20211201T0000Z/Arctic_RACMO/12km/ERA5_24cyc_72fcst/ics/ERA5_24cyc_72fcst_astart, and they look a little odd to me: the winds appear to be identical as you go up the model until a height of 10327. Is this to be expected?
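One way to confirm the identical-winds observation without eyeballing each level is to compare every model level with the one below it. This sketch assumes a wind component has already been read into a NumPy array with the vertical axis first (reading the dump itself is not shown), and the function name is hypothetical.

```python
import numpy as np


def identical_level_pairs(field, atol=0.0):
    """Return indices k where level k matches level k-1 within atol.

    Assumes axis 0 of `field` is the vertical axis. With the default
    atol=0.0 only exactly repeated levels are reported. NaNs never
    compare equal, so levels containing NaNs are not reported.
    """
    return [k for k in range(1, field.shape[0])
            if np.allclose(field[k], field[k - 1], rtol=0.0, atol=atol)]
```

A long run of consecutive indices starting at 1 would match the pattern described above, where levels repeat up to a certain height.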

Simon.

Hi Simon,

Thanks so much for working on this, that’s brilliant news. I’ve also been testing downscaling from ERA5 > glm > regional but it would be good to know whether it’s more efficient to run it your way or not, given how much time and computation the glm takes.

I’ve had a look at the ERA5 data, and it appears that I only have the first 70 levels - which start from 1 hPa and go down, meaning that I’m missing the most crucial near-surface information! This could definitely explain why the recon is struggling and the first time step fails.
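To check which part of the atmosphere a model-level download actually covers, the level pressures can be computed from the hybrid coefficients: ECMWF model levels use half-level pressures p = a + b·ps, with each full-level pressure taken as the mean of the two half levels around it. A minimal sketch, assuming you have the a and b half-level coefficients ordered top to bottom and a surface pressure in Pa; the coefficients in the usage check are toy values, not ERA5's, and the 50 hPa tolerance is an arbitrary choice.

```python
import numpy as np


def full_level_pressure(a_half, b_half, ps):
    """Full-level pressures (Pa) from hybrid half-level coefficients.

    a_half (Pa) and b_half (dimensionless) hold the n+1 half-level
    coefficients ordered top to bottom; ps is surface pressure in Pa.
    Returns n full-level pressures, top to bottom.
    """
    p_half = np.asarray(a_half, dtype=float) + np.asarray(b_half, dtype=float) * ps
    return 0.5 * (p_half[:-1] + p_half[1:])


def covers_near_surface(p_full, ps, max_gap_pa=5000.0):
    """True if the lowest (highest-pressure) level is within max_gap_pa of ps."""
    return bool(ps - np.max(p_full) <= max_gap_pa)
```

If only the upper levels were downloaded, the highest full-level pressure will sit far below the surface pressure and the check returns False.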

I’m not sure how that happened but I’ll download some more data now…!

Very best,
Ella

GRIB files order the vertical coordinate backwards compared with the UM (if I recall correctly).
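The recon is reported below to handle this ordering itself, but when inspecting GRIB profiles outside the recon it helps to put them in the UM's surface-first order. A minimal sketch, assuming axis 0 of the field is vertical and one representative pressure is available per level; the function name is hypothetical.

```python
import numpy as np


def to_surface_first(field, pressure):
    """Reorder levels so index 0 is the level nearest the surface.

    Assumes axis 0 of `field` is the vertical axis and `pressure` holds
    one pressure per level. Top-down data has pressure increasing with
    index, so both arrays are reversed; surface-first data is returned
    unchanged.
    """
    pressure = np.asarray(pressure)
    if pressure[0] < pressure[-1]:
        return field[::-1], pressure[::-1]
    return field, pressure
```

Deciding the order from the pressures rather than from the file type means the same helper works whichever way a given file happens to be stored.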

Ella,

I assume you’re using the method described here for the ERA data: https://code.metoffice.gov.uk/trac/rmed/wiki/suites/nesting/ECdriver ? Have you tried using the global region as an intermediate region rather than a slightly bigger LAM, or is that what you’re already testing? That might fix the horizontal regridding issue. Hopefully the new input data will fix the vertical one.

Grenville, yep the input data is backwards wrt the UM, but eyeballing the data it appears the recon accounts for this.

Simon.

Hi Simon,

I am using that method - didn’t realise that using the glm as an intermediate region would be different to the standard glm > um method… will give it a go!

Thanks
Ella

Hi Simon/Grenville,

Do you happen to know where I can find a directory on archer with all the necessary ancil files (grid.nl, ancil_versions, etc.) for the various glm resolutions?

I’ve been looking under /work/y07/shared/umshared/ancil/ but it seems that not all the files the nesting suite expects are there.

Cheers
Ella

Hello Simon/Grenville,
I just wanted to bump this up as I’m unable to run with an intermediate glm nest without the associated ancil files to give under ‘dm_ec_lam_ancil_dir’. I have found the ancil_versions files under /work/y07/shared/umshared/ but I can’t seem to find the grid.nl, grid_eg.nl or vertlevs.nl files anywhere (I can copy the vertlevs/L70_40 and ancil_filenames files from the nested region, but I need the correct grid description for n512).
Could you point me towards them please? Or is there a way I can use the glm as an intermediate nest without these files?
Cheers
Ella

Hi Ella,

You may be able to create the ancils yourself by using the system described on the Met Office wiki, setting the domain to the global domain with a non-rotated pole.

Have you tried running with the original slightly larger LAM, using the ERA data that includes the near-surface levels?

Simon.

Hi Simon

Sorry for the delay in replying. I’ve been trying to get your suggestions running. I have tried the latter (EC domain with full ERA5 profile) - it still fails on the first time step unfortunately and the wind issue is still present.

Am working on the EC global domain…

Ella

Hi Simon,
I still can’t get the UM to produce a global EC domain, but I have tried running the global model forced with ERA5 data, as per:

It’s a method I’ve used previously with ERA-Interim + vn 11.1 and it worked okay.

However, I’m getting “NaNs in error term in BiCGstab after 1 iterations” on the 4th incrtime (1800s) in the glm forecast. See for example /work/n02/n02/shakka/cylc-run/u-cr523/work/20211201T0000Z/glm_um_fcst_000/pe_output/umgla.fort6.pe0885
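With hundreds of per-PE output files, it can help to list which ones contain the error before reading any of them. A minimal sketch, assuming the files follow the `*.fort6.pe*` naming seen in the path above and that the message text matches the one quoted in this thread; the function name is hypothetical.

```python
from pathlib import Path


def find_message(pe_output_dir, message="NaNs in error term"):
    """Return (file name, line number, line) for every line containing message.

    Scans files matching *.fort6.pe* in pe_output_dir, in sorted order.
    Undecodable bytes are replaced rather than raising an error.
    """
    hits = []
    for path in sorted(Path(pe_output_dir).glob("*.fort6.pe*")):
        with open(path, errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if message in line:
                    hits.append((path.name, lineno, line.rstrip()))
    return hits
```

Seeing whether the message appears on a single PE or on many can hint at whether the problem is local (for example one bad ancillary point) or domain-wide.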

I’m not really sure why that might be, as I’ve mostly just seen this type of error in the um forecast before, sometimes associated with incompatible ancil files.

The suite I’m testing this config in is u-cr523.

Hi Ella,

Bad ancils could well be the issue. I’ve re-run your old config with the regional intermediate file for the ERA data on a slightly bigger domain, using the data with extra levels which you obtained. I also added a branch /home/simon/branches/vn12.0_gribfix which hacks the polar values to be the same as their nearest neighbour. Unfortunately this still fails with NaNs. The job is in /home/n02/n02/simon/cylc-run/u-cq635 if you want to have a look; the anomaly at the pole appears to have been removed.

To be honest I’m a bit stuck as to what could be causing the problem. The most likely candidate is bad ancils, but I’m guessing. I’ve had a look at the dump /home/simon/n02/n02/cylc-run/u-cq635/share/cycle/20211201T0000Z/Arctic_RACMO/12km/ERA5_24cyc_72fcst/ics/ERA5_24cyc_72fcst_astart and certain fields, such as the soil moisture, still look odd.
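When a field such as soil moisture "looks odd", a quick numeric summary can show whether the problem is NaNs, infinities or implausible values. A minimal sketch, assuming the field has been loaded into a NumPy array; the plausibility bounds are supplied by the user and are not UM-defined limits, and the function name is hypothetical.

```python
import numpy as np


def field_summary(field, valid_min=None, valid_max=None):
    """Count NaNs/Infs and report the finite range of a field.

    valid_min/valid_max are user-chosen plausibility bounds; any finite
    value outside them is counted (this includes fill values used for
    missing data, if present).
    """
    f = np.asarray(field, dtype=float)
    finite = f[np.isfinite(f)]
    summary = {
        "n_nan": int(np.isnan(f).sum()),
        "n_inf": int(np.isinf(f).sum()),
        "min": float(finite.min()) if finite.size else float("nan"),
        "max": float(finite.max()) if finite.size else float("nan"),
    }
    if valid_min is not None:
        summary["n_below_min"] = int((finite < valid_min).sum())
    if valid_max is not None:
        summary["n_above_max"] = int((finite > valid_max).sum())
    return summary
```

Running it on the same field from a dump that works and one that fails makes differences in range or missing-data counts easy to spot.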

Simon.

Hi Simon,

Thanks so much. I’ll check again with the soil moisture ancils to see if I can see anything that might be causing it to fall over.

The fact that it happens in the global forecast too is quite strange though, and suggests that there might be something awry in the config.

I’m away until the 14th now, but if I can’t see anything immediately obvious in the files, I’ll come back to it when I return.

Best wishes
Ella

Hi Ella

I’ve been trying to fix this problem myself for a 200m resolution nest (ERA5 RA3 nest), so was having a nosey at this thread.

Just thought I’d share the fix that worked for me in case it works for you too. I couldn’t get my ancils to generate for my 200m domain using the nesting suite ancillary generation, so I set it to reconfigure the missing ancils from the 500m nest (surely should work, right?!!). For some reason, using the recon instead of ancil files was causing issues, and my 200m nest came back with this super unhelpful BiCGstab error.

I moved to using Doug Lowe’s RAS suite (u-cq149) to create all my ancillary files using ANTS, taking extra care that the veg frac ancils were generated by ANTS (with some settings, CAP and ANTS can both end up creating the same files).

It looks to me like some sort of interpolation issue in recon that was causing the error for me and I wondered if it might be something similar for you?

Apologies if this is no help!

Thanks Helen, I’ve been looking into this. Have managed to make some ancils with the RAS but now struggling to get past the glm as it keeps complaining about the topographic index ancils - have you seen that? No worries if not, I’ll keep ploughing through with this method. Thanks again for the suggestion! E

Any help?

Thanks Grenville, it could be, but I’m getting an HDF error when I try to read the file with either xancil or xconv.

I also would’ve thought there’s a topographic index ancil for the glm somewhere, but I can’t find one in the umshared/ancil directories - do you know if one exists?

Cheers
Ella

Ah, that must have been corrupted during transfer. Doug Lowe has put a working one in /work/n02/n02/shared/RAS_Archer2/input_data


Thanks, that one works! I’ve tried selecting ‘initialise from external file’ but I imagine it might not work as it’s not on the n512 grid.

I’m actually still getting ‘required field is not in input dump’, seemingly regardless of which option I select to reconfigure the topographic index, which is strange. Do you have any ideas about how to get around this? I’m forcing the glm with ERA5 instead of a glm start dump, which has always worked for me in the past as long as I’ve reconfigured the other required fields.