Unusual errors when performing EC config

jtalib · 29 February 2024 18:36

Hi NCAS CMS,

I’m trying to perform a 10-day nested suite starting on 20190304. I have downloaded all the data for hourly ERA5 boundary conditions (/work/n02/n02/jostal/ERA5_analysis).

When performing the reconfiguration for EC, it fails at two hourly timesteps. I have no idea why it fails at these two timesteps in particular.

I have tried re-running the same reconfig multiple times. Rebuilt the model and started from scratch. I’ve tried re-downloading the ERA5 data again. Even tried changing the files to grib1 format using cdo, but then it couldn’t recognise multiple levels?!

The model run is u-de027. The following error appears,

???
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 41
? Error from routine: RCF_GRIB_READ_DATA
? Error message: Unknown GRIB version whilst extracting length of message.
? Error from processor: 10
? Error number: 210
???

for instance in /work/n02/n02/jostal/cylc-run/u-de027/log/job/20190304T0000Z/ec_um_recon_181/02/job.err .

Any idea how I can change the grib version? Just strange how it’s not common across all hourly timesteps.

Kind regards,
Josh

jtalib · 4 March 2024 16:25

Still no luck after re-building and re-running the model today. Crashes at ec_um_recon_095 and ec_um_recon_181.

Josh

jtalib · 18 March 2024 08:40

Built a new UM job (u-de532) on Friday (15th Mar '24). The same um-recon issue occurs for hourly timesteps 95 and 181. I'm re-downloading the ERA5 data for these timestamps. Still none the wiser.

RosalynHatcher · 20 March 2024 10:59

Hi Josh,

Just to say we’re not ignoring your query. At the moment we’re not sure what to advise.

Regards,
Ros.

jtalib · 20 March 2024 11:20

Thanks. Yeah I’m a tad confused by it all.

I did think about trying a completely different 10-day period (maybe something up with extracted ERA5 data), but in the future I’m planning to perform nested simulations for various start dates. So if this error was to appear again, we’d know what to do.

Is there anyone I can contact at the Met Office? Stuart?

Kind regards,
Josh

grenville · 26 March 2024 09:13

Hi Josh

The reconfig thinks that the grib version for ec_grib_201903111200.t+000 (for ec_um_recon_180) is 2, but for ec_grib_201903111300.t+000 (for ec_um_recon_181) it finds 6.

This seems odd since ec_grib_201903111300.t+000 can be read OK with grib_api utilities and appears to be a perfectly good grib file. More investigation needed.
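
As a rough aid for tracing a failing task back to its input file, the two pairs quoted above are consistent with the task index being the number of hours after the suite start (20190304T0000Z). The sketch below assumes that mapping holds for every task; the function name `ec_grib_name` is hypothetical and not part of the suite.

```python
from datetime import datetime, timedelta


def ec_grib_name(cycle_start: str, task_index: int) -> str:
    """Guess the ERA5 file read by ec_um_recon_<task_index>.

    Assumption (inferred from the thread, not from the suite code):
    task N reads the file valid N hours after the cycle start.
    """
    start = datetime.strptime(cycle_start, "%Y%m%dT%H%MZ")
    valid = start + timedelta(hours=task_index)
    return f"ec_grib_{valid:%Y%m%d%H%M}.t+000"


if __name__ == "__main__":
    for index in (95, 180, 181):
        print(index, ec_grib_name("20190304T0000Z", index))
```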

Grenville

jtalib · 26 March 2024 09:26

Hi Grenville,

Yep, it’s bizarre. And ec_grib_201903072300.t+000 (for ec_um_recon_095) comes out as version 93?

Let me know if I can do anything? Clueless on what to do next.

Kind regards,
Josh

grenville · 26 March 2024 09:50

Maybe just sidestep the check in rcf_grib_read_data_mod.F90?
grib_dump says:

grenvill@ln02:/work/y07/shared/umshared/lib/cce-15.0.0/eccodes/2.24.1/bin> ./grib_dump -OtaH /work/n02/n02/jostal/ERA5_analysis//ec_grib_201903111300.t+000 | more
***** FILE: /work/n02/n02/jostal/ERA5_analysis//ec_grib_201903111300.t+000 
#==============   MESSAGE 1 ( length=81302 )               ==============
1-4       ascii (str) identifier = GRIB ( 0x47 0x52 0x49 0x42 )
5-6       unsigned (int) reserved = MISSING ( 0xFF 0xFF )
7         codetable (int) discipline = 0 ( 0x00 ) [Meteorological products (grib2/tables/5/0.0.table) ]
8         unsigned (int) editionNumber = 2 ( 0x02 ) [ls.edition]
9-16      section_length (int) totalLength = 81302 ( 0x00 0x00 0x00 0x00 0x00 0x01 0x3D 0x96 )

so there appears to be something dodgy with the reconfig code.
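
The same check can be done without ecCodes by decoding the first 16 bytes of the file directly. This is a minimal sketch based on the octet layout shown in the dump above (octets 1-4 `GRIB`, octet 8 edition, octets 9-16 total length for edition 2); the edition 1 branch assumes the standard 3-byte length in octets 5-7. The function name `parse_indicator` is hypothetical.

```python
def parse_indicator(header: bytes) -> dict:
    """Decode the indicator section at the start of a GRIB message."""
    if len(header) < 16 or header[:4] != b"GRIB":
        raise ValueError("not the start of a GRIB message")
    edition = header[7]  # octet 8 (1-based) holds the edition number
    if edition == 2:
        # Edition 2: octets 9-16 are the total message length (big-endian)
        total_length = int.from_bytes(header[8:16], "big")
    elif edition == 1:
        # Edition 1: octets 5-7 are the total message length (big-endian)
        total_length = int.from_bytes(header[4:7], "big")
    else:
        raise ValueError(f"unknown GRIB edition {edition}")
    return {"edition": edition, "total_length": total_length}


if __name__ == "__main__":
    # Bytes copied from the grib_dump output for message 1
    header = bytes.fromhex("47524942" "ffff" "00" "02" "0000000000013d96")
    print(parse_indicator(header))
```

To try it on a real file, read the first 16 bytes with `open(path, "rb").read(16)` and pass them in.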

Grenville

jtalib · 26 March 2024 13:01

Hi Grenville,

I’ll change the code and rebuild etc this afternoon. Could I ask for an extra 400 CU on account n02-NEX006247? I’ve been trying to use it all up before the end of March, but would like to spend Tues-Thurs trying to fix this issue.

I’ll have plenty of CU time in the new “HPC” year.

Kind regards,
Josh

grenville · 26 March 2024 13:14

Josh

Hang on with the grib problem - my half-baked suggestion didn’t work.

I added CUs.

Grenville

jtalib · 26 March 2024 14:11

So did you try a new branch where you skip rcf_grib_read_data_mod.F90?

Josh

jtalib · 26 March 2024 14:44

Oh. I realise you can’t just state the GRIB version, as it can’t be assumed that it is either 1 or 2.

grenville · 26 March 2024 15:00

Josh - please tell me another reconfig that failed, i.e. ec_um_recon_???

Grenville

jtalib · 26 March 2024 15:11

Only two fail. ec_um_recon_095 and ec_um_recon_181.

grenville · 27 March 2024 09:16

Josh

We know why the reconfig is failing: it is a problem with the reconfig code and not with the grib files. How to fix it is still unclear, but we are working on that.

Grenville

grenville · 28 March 2024 08:44

Josh

I’ve added a fix for this - please include fcm:um.xm/branches/dev/grenvillelister/vn12.0_grib-fix in the UM build and rebuild.
Please carefully check the results.

Grenville

jtalib · 28 March 2024 09:01

Hi Grenville,

Thanks for creating the new branch.

Sorry to be a pain, but could you possibly explain what the new branch does differently? I can see that if skip == 1, then pos_in_file = 6. Will this work given that GRIB versions 6 and 95 are being detected for ec_um_recon_095 and ec_um_recon_181?

Currently updating job (and will double-check EC config output).

Kind regards,
Josh.

grenville · 28 March 2024 09:25

Josh
The reconfig searches for the bit pattern that spells GRIB in ASCII, since that signifies the beginning of a field. But that bit pattern can also appear elsewhere in the data (with quite low probability), and you found 2 cases. We know those occurrences don’t signify the start of a field because the next bits of data read are not sensible: the grib version and/or data size are wrong for a field. The reconfig as written doesn’t account for that. My hack says: if you find a dodgy grib version, assume you’ve not found the beginning of a field, skip it, and carry on searching.

It’s not foolproof - if there is a truly corrupt field it might struggle.
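
The idea can be sketched in Python as below. This is an illustration only, not the code in the branch: the function name `find_messages` is hypothetical, and the check for the trailing `7777` marker is an extra sanity test added here that the description above does not mention.

```python
def find_messages(buf: bytes):
    """Yield (offset, edition, length) for each plausible GRIB message."""
    pos = 0
    while True:
        start = buf.find(b"GRIB", pos)
        if start == -1 or start + 16 > len(buf):
            return
        edition = buf[start + 7]  # octet 8 of a real indicator section
        if edition == 2:
            length = int.from_bytes(buf[start + 8:start + 16], "big")
        elif edition == 1:
            length = int.from_bytes(buf[start + 4:start + 7], "big")
        else:
            # Dodgy edition: assume this "GRIB" is just data, keep searching
            pos = start + 1
            continue
        end = start + length
        # Extra check (an assumption of this sketch): a real message fits
        # inside the buffer and ends with the ASCII marker "7777"
        if length < 12 or end > len(buf) or buf[end - 4:end] != b"7777":
            pos = start + 1
            continue
        yield start, edition, length
        pos = end  # resume the search after this message
```

As with the hack itself, a genuinely corrupt message would simply be skipped rather than reported, so the results still need checking.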

Hope that helps.

Grenville

jtalib · 28 March 2024 09:52

Hi Grenville,

Thanks. Multiple ec_um_recon are currently running. Fingers crossed for 095 and 181.

Kind regards,
Josh

jtalib · 28 March 2024 12:40

Hi all,

Model is successfully running. Those two ec reconfigs worked and it looks like sensible output.

Thanks once again.

Josh
