I’m trying to run a 10-day nested suite starting on 20190304. I have downloaded all the data for hourly ERA5 boundary conditions (/work/n02/n02/jostal/ERA5_analysis).
When running the reconfiguration for EC, it fails at two hourly timesteps. I have no idea why it fails at these two timesteps in particular.
I have tried re-running the same reconfig multiple times, and I have rebuilt the model and started from scratch. I’ve tried re-downloading the ERA5 data. I even tried converting the files to grib1 format using cdo, but then it couldn’t recognise multiple levels?!
The model run is u-de027. The following error appears:
???
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 41
? Error from routine: RCF_GRIB_READ_DATA
? Error message: Unknown GRIB version whilst extracting length of message.
? Error from processor: 10
? Error number: 210
???
for instance in /work/n02/n02/jostal/cylc-run/u-de027/log/job/20190304T0000Z/ec_um_recon_181/02/job.err
Any idea how I can change the grib version? It’s strange that the error doesn’t occur for all hourly timesteps.
Built a new UM job (u-de532) on Friday (18th Mar '24). Same um-recon issue occurs for hourly timesteps 95 and 181. Re-downloading ERA5 data for these timestamps. Still none the wiser?
I did think about trying a completely different 10-day period (in case something is up with the extracted ERA5 data), but in the future I’m planning to perform nested simulations for various start dates. I’d rather get to the bottom of it now, so that if this error appears again, we’ll know what to do.
Is there anyone I can contact at the Met Office? Stuart?
The reconfig thinks that the grib version for ec_grib_201903111200.t+000 (for ec_um_recon_180) is 2, but for ec_grib_201903111300.t+000 (for ec_um_recon_181) it finds 6.
This seems odd since ec_grib_201903111300.t+000 can be read OK with grib_api utilities and appears to be a perfectly good grib file. More investigation needed.
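For anyone wanting to look at this outside the reconfig, here is a minimal Python sketch (not taken from the UM code) that lists every place the four bytes GRIB occur in a file, together with the byte that would be read as the edition number. It relies on the fact that in both GRIB1 and GRIB2 the edition number is octet 8 of the message, i.e. 7 bytes after the start of the GRIB marker. The function name and the command-line usage are my own choices for illustration.

```python
import sys


def grib_marker_editions(data: bytes):
    """Return (offset, edition_byte) for every occurrence of b'GRIB' in data.

    No validation is done here: an offset is reported even if the marker
    is just a chance byte pattern inside packed data.
    """
    hits = []
    pos = data.find(b"GRIB")
    while pos != -1:
        # Octet 8 of a GRIB message (zero-based offset 7) is the edition.
        if pos + 8 <= len(data):
            hits.append((pos, data[pos + 7]))
        pos = data.find(b"GRIB", pos + 1)
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py <path to a grib file>
    with open(sys.argv[1], "rb") as f:
        for offset, edition in grib_marker_editions(f.read()):
            print(f"offset {offset}: edition byte = {edition}")
```

Any line reporting an edition byte other than 1 or 2 is a candidate for a marker that is not the start of a real message.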
I’ll change the code and rebuild etc. this afternoon. Could I ask for an extra 400 CU on account n02-NEX006247? I’ve been trying to use it all up before the end of March but would like to spend Tues-Thurs trying to fix this issue.
I’ll have plenty of CU time in the new “HPC” year.
We know why the reconfig is failing (it is a problem with the reconfig code and not with the grib files) - how to fix it is still unclear - we are working on that.
I’ve added a fix for this - please include fcm:um.xm/branches/dev/grenvillelister/vn12.0_grib-fix in the UM build and rebuild.
Please carefully check the results.
Sorry to be a pain, but could you possibly explain what the new branch does differently? I can see that if skip == 1, then pos_in_file = 6. Will this work given that GRIB versions 6 and 95 are being found for UM_config 095 and 181?
Currently updating job (and will double-check EC config output).
Josh
The reconfig searches for the bit pattern that spells GRIB in ASCII, since that signifies the beginning of a field. But that bit pattern can also appear elsewhere in the data, and it does (with quite low probability); you found 2 cases. We know those occurrences don’t signify the start of a field because the next bits of data read are not sensible: the GRIB version and/or data size are wrong for a field. The reconfig as written doesn’t account for that. My hack says: if you find a dodgy GRIB version, assume you’ve not found the beginning of a field, skip it, and carry on searching.
It’s not foolproof - if there is a truly corrupt field it might struggle.
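The skip-and-carry-on idea can be sketched in Python as below. This is an illustration only, not the Fortran in the branch: the function names are hypothetical, and it goes a little further than the hack described above by also checking the message length and the closing 7777 marker. It assumes the standard header layout (GRIB1: 3-byte total length in octets 5-7; GRIB2: 8-byte total length in octets 9-16; edition in octet 8 for both).

```python
def _message_length(data: bytes, start: int):
    """Return the total message length claimed at `start`, or None if implausible."""
    if start + 8 > len(data):
        return None
    edition = data[start + 7]
    if edition == 1:
        length = int.from_bytes(data[start + 4:start + 7], "big")
        min_length = 12  # 8-byte indicator section + 4-byte end marker
    elif edition == 2:
        if start + 16 > len(data):
            return None
        length = int.from_bytes(data[start + 8:start + 16], "big")
        min_length = 20  # 16-byte indicator section + 4-byte end marker
    else:
        return None  # "dodgy" edition: not the start of a field
    if length < min_length or start + length > len(data):
        return None
    return length


def find_grib_messages(data: bytes):
    """Return (offset, edition, length) for each plausible GRIB message."""
    messages = []
    pos = 0
    while True:
        start = data.find(b"GRIB", pos)
        if start == -1:
            break
        length = _message_length(data, start)
        if length is not None and data[start + length - 4:start + length] == b"7777":
            messages.append((start, data[start + 7], length))
            pos = start + length  # jump past the whole message
        else:
            pos = start + 1  # false marker: skip it and keep searching
    return messages
```

Jumping past each validated message means a stray GRIB pattern inside packed data is never examined at all, while the fallback of advancing by one byte handles a false marker wherever the search does land on one. As with the hack, a genuinely corrupt field could still confuse it.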