Unable to WGDOS pack to this accuracy (2)

MYoshioka · 15 October 2021 14:37

I have a ticket with the same title and that one was resolved, but this one doesn’t seem to be solved. Here is the error message I’m getting from glm_um_fcst_000 in u-ci559.

?  Error from routine: WGDOS Packing (f_shum_wgdos_pack)
?  Error message: Problem packing field...
?        STASH:   431
?        Accuracy: -28
?        Minimum:   -0.1157114003E+39
?        Maximum:    0.6946223300E+38
?        Message:  Unable to WGDOS pack to this accuracy

I don’t think I’m requesting STASH 431.

Looking at the pe output, I thought this error might be related to unit 15 = pp5. So I increased the reserved headers and set the packing option to unpacked, but the task fails with the same error. Maybe the problem is caused somewhere else.

The minimum and maximum values above appear to be wrong, but I’m not sure what values these are meant to be. (Maybe they are in /home/d03/myosh/cylc-run/u-ci559/share/cycle/20190711T0000Z/glm/um/umglaa_cb000 ? but then not sure why that data is wrong or corrupted…)

This suite finished running the first 24 hours (20190710) and this is the run for the second day. I’m using the dump file created by the first day’s run as the UKCA initial file and the operational forecast dump (20190711T0000Z_glm_t+0) for dm_ic_file. Maybe I’m missing something else. Or is it necessary to use the same dump for both?

Please could anyone help? Thanks.
Masaru

grenville · 15 October 2021 16:30

Hi Masaru

STASH 431 is probably the section 0 LBC item (look in /home/d03/myosh/roses/u-ci559/app/glm_um/opt/rose-app-lbc.conf), which has operational packing.
The error usually indicates that the model has produced a number that the packing cannot handle, which in turn usually points to an error in the model run. You could try not packing the LBCs to see what value the model has created. Try setting packing=0 in /home/d03/myosh/roses/u-ci559/app/glm_um/opt/rose-app-lbc.conf
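
As a rough illustration of why these values cannot be packed, here is a minimal Python sketch. It assumes that WGDOS packing stores each value as an integer number of steps of size 2**accuracy above a base value, and that the scaled range must fit in a signed 32-bit integer. The exact limit used by f_shum_wgdos_pack is an assumption here and is not confirmed in this thread. The names scaled_range and can_pack are made up for the example.

```python
# Assumed limit on the scaled range (signed 32-bit integer); not confirmed by the thread.
INT32_MAX = 2**31 - 1


def scaled_range(minimum, maximum, accuracy):
    """Return the field range expressed in steps of size 2**accuracy."""
    return (maximum - minimum) / 2.0 ** accuracy


def can_pack(minimum, maximum, accuracy, limit=INT32_MAX):
    """Return True if the scaled range fits under the assumed integer limit."""
    return scaled_range(minimum, maximum, accuracy) <= limit


if __name__ == "__main__":
    # Values copied from the error message for STASH 431.
    print(scaled_range(-0.1157114003e39, 0.6946223300e38, -28))
    print(can_pack(-0.1157114003e39, 0.6946223300e38, -28))
```

With the minimum and maximum from the error message the scaled range is about 5e46, far beyond any integer limit. Values of order 1e38 therefore look more like corrupted or uninitialised data than like a packing accuracy that is slightly too fine.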

Grenville

MYoshioka · 18 October 2021 09:27

Thank you, Grenville. I did that and got this:

?  Error from routine: EG_BICGSTAB
?  Error message: Convergence failure in BiCGstab after      1 iterations: omg is too small

This is a problem I have encountered before and asked about once (http://cms.ncas.ac.uk/ticket/3561), but it was never really resolved. I just found a grid setting that did not cause the problem. And now it has come back.

I got this just before the error message in /home/d03/myosh/cylc-run/u-ci559/log/job/20190711T0000Z/glm_um_fcst_000/NN/job.out:

Grid orientation in degrees -  -0.1074E+10
Year        Day       Hour      Minute     Second
Atmosphere time =  -0.1074E+10 -0.1074E+10 -0.1074E+10 -0.1074E+10 -0.1074E+10
Mass, energy, energy drift =  -0.1074E+10 -0.1074E+10 -0.1074E+10
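
Every diagnostic above prints the same value, which is unlikely to be physical. As a side note, -0.1074E+10 is what -2**30 = -1073741824 looks like when printed to four significant figures; whether that is the value actually stored is a guess. A small Python sketch for spotting such repeated values in a job.out excerpt follows. The regular expression and the threshold of three occurrences are arbitrary choices for illustration, and repeated_values is a made-up name.

```python
import re
from collections import Counter

# Matches Fortran-style reals such as -0.1074E+10 or 0.5540517937E-20.
FORTRAN_REAL = re.compile(r"[-+]?\d*\.\d+E[-+]\d+")


def repeated_values(lines, min_count=3):
    """Return {token: count} for numeric tokens that occur at least min_count times."""
    counts = Counter(
        token for line in lines for token in FORTRAN_REAL.findall(line)
    )
    return {token: n for token, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Lines copied from the job.out excerpt quoted in this post.
    excerpt = [
        "Grid orientation in degrees -  -0.1074E+10",
        "Atmosphere time =  -0.1074E+10 -0.1074E+10 -0.1074E+10 -0.1074E+10 -0.1074E+10",
        "Mass, energy, energy drift =  -0.1074E+10 -0.1074E+10 -0.1074E+10",
    ]
    print(repeated_values(excerpt))
```

A single value turning up in unrelated diagnostics (grid orientation, time, mass and energy) suggests the fields were never set properly, for example because a header or dump was read incorrectly, rather than that the model computed them.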

Is this the problem? Any idea how to fix it?
Masaru

grenville · 18 October 2021 09:47

Masaru

What are the values for the field 431?

Grenville

MYoshioka · 18 October 2021 10:09

How do I check?
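
One possible way to look at the values, sketched in Python. It assumes the Met Office mule library is available on the system and that the field of interest has been written to a UM file (for example with packing switched off). The file path is left as a parameter because the thread does not say which file holds STASH 431. The mule calls used (load_umfile, fields, lbrel, lbuser4, get_data) are written from memory of the mule documentation and should be checked against the installed version. The helper names and the 1.0e30 threshold are invented for the example.

```python
import math


def summarise(values, huge=1.0e30):
    """Summarise an iterable of numbers: range, non-finite count, suspiciously large count."""
    count = non_finite = n_huge = 0
    lo, hi = math.inf, -math.inf
    for value in values:
        value = float(value)
        count += 1
        if not math.isfinite(value):
            non_finite += 1
            continue
        lo = min(lo, value)
        hi = max(hi, value)
        if abs(value) > huge:  # hypothetical threshold for "clearly unphysical"
            n_huge += 1
    return {"count": count, "non_finite": non_finite, "huge": n_huge,
            "minimum": lo, "maximum": hi}


def report_stash(path, stash_code):
    """Print a summary for every field in a UM file with the given STASH code."""
    import mule  # assumed to be installed; imported here so summarise() works without it

    umfile = mule.load_umfile(path)
    for index, field in enumerate(umfile.fields):
        if field.lbrel not in (2, 3):  # skip entries without a valid field header
            continue
        if field.lbuser4 == stash_code:  # lbuser4 holds the STASH code
            print(index, summarise(field.get_data().flat))
```

Calling report_stash with the path of the output file and 431 would then print one summary line per matching field; a non-zero "non_finite" or "huge" count would point at the bad values.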

grenville · 18 October 2021 15:35

Masaru

I took a copy of your suite - it ran OK for 24 hrs. Maybe try a rose suite-run --new?

Grenville

MYoshioka · 22 October 2021 14:43

Hi Grenville,

I tried that and all previous results were deleted, including regional dump files that were to be used in this continuation run… Anyway that’s something I can take care of next time I run the suite.

But when you ran the suite didn’t you get this error in the regional model?

I have had so many problems running these suites and many of them have not been resolved but only somehow avoided. So I have been haunted by them repeatedly…

Masaru

grenville · 22 October 2021 16:20

Masaru

I may have misunderstood your message of Oct 15th - I thought you were getting the EG_BICGSTAB error in the global run. I’d not paid attention to the regional model. But having run the suite longer, I did get Regn1_Brit2Port_RA2M_um_fcst_000 failing with the error:
Error message: Boundary data starts after start of current boundary data interval

and the Regn1_wPorto_RA2M_um_recon task fails (I don’t understand why.)

Grenville

MYoshioka · 22 October 2021 17:03

No, Grenville. I don’t think you misunderstood the problem.
I just had another problem, and it is also a very familiar one…
I’ve been haunted by these problems and others…

But I didn’t get the problem with recon.

Masaru

grenville · 25 October 2021 10:09

Hi Masaru

FYI - we do have ANTS v13 working on Monsoon. It just requires a small change to the python_env file. Let me know if that’d be any use.

Grenville

MYoshioka · 25 October 2021 10:59

Hi Grenville,

Thank you for letting me know! I tried running u-ci559 with ANTS turned on and ANCIL_ANTS_VEGFRAC failed.

Where is the python_env file? I found ones in the cylc-run folder for a couple of suites (like ~/cylc-run/u-ci463/bin/python_env) but not for this one. And how should I change it?

Masaru

grenville · 25 October 2021 12:38

Hi Masaru

u-ci559 is configured a little differently to the suite I was testing ANTS on. Please look at /home/d03/gmslis/roses/u-ci559, where I have made changes so that it uses ANTS to calculate Regn1_Brit2Port_ancil_ants_vegfrac.

To see what I added/changed:

gmslis@xcslc0:~/roses/u-ci559>  grep -r python_env *
app/ants_vegfrac/rose-app.conf:default=python_env ancil_lct.py ${source} \
site/monsoon-cray-xc40/python_env:# Usage python_env CMD_WITHOPTS
suite-runtime/ancils.rc:        init-script=mkdir -p ${CYLC_SUITE_DEF_PATH}/bin; cp -f ${CYLC_SUITE_DEF_PATH}/site/{{ SITE }}/python_env ${CYLC_SUITE_DEF_PATH}/bin/

To use ANTS for other ancillary files, you will need to copy what’s done in app/ants_vegfrac/rose-app.conf for other ancillary types.

Grenville

MYoshioka · 28 October 2021 09:28

Hi Grenville,

Thank you for this.
ants_general_aero is the only other directory I can see in app/ants*. Should I change this line in app/ants_general_aero/rose-app.conf

default=${ANTS_PYTHON_PATH}/bin/python -s ${ANTS_PYTHON_PATH}/bin/ancil_general_regrid.py ${source} --target-grid ${target_grid} \

into this?

default=python_env ancil_general_regrid.py ${source} --target-grid ${target_grid} \

Masaru

grenville · 28 October 2021 09:43

Hi Masaru

That looks correct (I didn’t test that, though)

Grenville

MYoshioka · 28 October 2021 10:36

Thanks, Grenville.

It looks like the ANCIL and ANTS processes went fine, except that qrclim.sulpdms has value zero everywhere. This can be replaced with the file I created from the global data using Python and xancil. So this is progress.

But Regn1_Brit2Port_RA2M_um_fcst_000 fails with the same error as above…
INBOUNDA: Boundary data starts after start of current boundary data interval

Masaru

grenville · 28 October 2021 10:58

I got a sulpdms file - see /home/d03/gmslis/cylc-run/u-ci559/share/data/ancils/Regn1/Brit2Port/qrclim.sulpdms

MYoshioka · 28 October 2021 11:19

It works fine for the larger nest. For wPorto it is all zero. But that’s not the major problem any more.

MYoshioka · 28 October 2021 12:33

I forgot to set I_override_date_time back to 0 for the regional model. I think this was causing the INBOUNDA problem.
Now I have a different problem. I’ll ask you about this after I make some checks and tests.

Masaru

MYoshioka · 28 October 2021 14:33

I was having a similar problem to the one at the top of this page:

?  Error from routine: WGDOS Packing (f_shum_wgdos_pack)
?  Error message: Problem packing field...
?        STASH:  2380
?        Accuracy: -24
?        Minimum:    0.5540517937E-20
?        Maximum:    0.4970327279E+04
?        Message:  Unable to WGDOS pack to this accuracy

The difference is that I do request STASH item 2-380 and want to have this output.
But ‘Accuracy: -24’ sounds strange.
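
To put "Accuracy: -24" in context, here is a minimal Python sketch. It assumes, as a guess that this thread does not confirm, that WGDOS packing needs the field range divided by 2**accuracy to fit in a signed 32-bit integer. The function names are made up for the example.

```python
import math

# Assumed limit on the scaled range (signed 32-bit integer); not confirmed by the thread.
INT32_MAX = 2**31 - 1


def max_packable_range(accuracy, limit=INT32_MAX):
    """Largest (maximum - minimum) that fits at a given accuracy exponent."""
    return limit * 2.0 ** accuracy


def finest_packable_accuracy(minimum, maximum, limit=INT32_MAX):
    """Smallest accuracy exponent n such that (maximum - minimum) / 2**n <= limit."""
    span = maximum - minimum
    if not span > 0:
        raise ValueError("maximum must be greater than minimum")
    n = math.ceil(math.log2(span / limit))
    # Guard against floating-point rounding in log2.
    while span / 2.0 ** n > limit:
        n += 1
    while span / 2.0 ** (n - 1) <= limit:
        n -= 1
    return n


if __name__ == "__main__":
    # Values copied from the error message for STASH 2380.
    print(max_packable_range(-24))
    print(finest_packable_accuracy(0.5540517937e-20, 0.4970327279e4))
```

Under that assumption an accuracy of -24 can only cover a range of just under 128, while this field spans about 4970, so the finest accuracy that would fit is -18. That would be consistent with the error going away once packing is switched off.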

I set packing=0 in /home/d03/myosh/roses/u-ci960/app/glm_um/opt/rose-app-lbc.conf and packing=0 for pp2 used by upc through which I request item 2-380. I don’t know why but it seems to be running fine now.

MYoshioka · 29 October 2021 16:12

u-ci960 ran fine for two 24-hour simulations, so I think it is mostly fine (except for some minor issues).

u-ci985 is a copy of u-ci960 but I made both nests a little larger. (I initially tried including the third nest near SW of Britain but it wasn’t created correctly.)

In this suite Regn1_Brit2PortL_RA2M_um_fcst_000 fails with the error message “Convergence failure in BiCGstab after 1 iterations: omg is too small”. This is the same error message as before (above), but that time it occurred in the global model.

I compared u-ci960 and u-ci985 and now I can’t see a difference that might cause the problem…

This problem does not go away even if I run the suite as new.

Masaru
