Unable to WGDOS pack to this accuracy (2)

I opened a ticket with the same title before and that one was resolved, but this one is still unresolved. Here is the error message I’m getting from glm_um_fcst_000 in u-ci559:

?  Error from routine: WGDOS Packing (f_shum_wgdos_pack)
?  Error message: Problem packing field...
?        STASH:   431
?        Accuracy: -28
?        Minimum:   -0.1157114003E+39
?        Maximum:    0.6946223300E+38
?        Message:  Unable to WGDOS pack to this accuracy

I don’t think I’m requesting STASH 431.

Looking at the pe output, I thought this error might be related to unit 15 (pp5), so I increased the reserved header and set its packing option to unpacked, but the task still fails with the same error. Maybe the problem is caused somewhere else.

The minimum and maximum values above appear to be wrong, but I’m not sure what values these are meant to be. (Maybe they are in /home/d03/myosh/cylc-run/u-ci559/share/cycle/20190711T0000Z/glm/um/umglaa_cb000 ? but then not sure why that data is wrong or corrupted…)

This suite finished running the first 24 hours (20190710) and this is the run for the second day. I’m using the dump file created by the first day’s run as the UKCA initial file and the operational forecast dump (20190711T0000Z_glm_t+0) for dm_ic_file. Maybe I’m missing something else. Or is it necessary to use the same dump for dm_ic_file as well?

Please could anyone help? Thanks.
Masaru

Hi Masaru

STASH 431 is probably the section 0 LBC item (look in /home/d03/myosh/roses/u-ci559/app/glm_um/opt/rose-app-lbc.conf), which has operational packing.
The error usually means that the model has produced a number that the packing cannot handle, which in turn usually points to a problem in the model fields. You could try not packing the LBCs to see what value the model has created: try setting packing=0 in /home/d03/myosh/roses/u-ci559/app/glm_um/opt/rose-app-lbc.conf

Grenville
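
A rough way to see why the field in the first error cannot be packed: WGDOS packing stores values as integer multiples of 2**accuracy, so the range divided by that step has to fit in an integer. The sketch below assumes the packed integers are limited to a signed 32-bit range (a guess, not checked against the shumlib source); steps_needed and INT_LIMIT are names chosen only for this illustration.

```python
import struct

# Assumed limit on the packed integer range (signed 32-bit); not confirmed
# against the shumlib source.
INT_LIMIT = 2**31 - 1

# Largest finite IEEE single-precision value, for comparison.
FLT_MAX = struct.unpack(">f", b"\x7f\x7f\xff\xff")[0]


def steps_needed(minimum, maximum, accuracy):
    """Number of 2**accuracy steps between minimum and maximum."""
    return (maximum - minimum) / 2.0**accuracy


# Values copied from the STASH 431 error message above.
fmin, fmax, acc = -0.1157114003e39, 0.6946223300e38, -28

steps = steps_needed(fmin, fmax, acc)
print(f"steps needed: {steps:.3e} (assumed limit {INT_LIMIT:.3e})")
print(f"packable under this assumption: {steps <= INT_LIMIT}")
print(f"|minimum| / largest float32: {abs(fmin) / FLT_MAX:.2f}")
```

Magnitudes around 1e38 are within an order of magnitude of the single-precision limit, which suggests uninitialised or corrupted data rather than a physical value.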

Thank you, Grenville. I did that and got this:

?  Error from routine: EG_BICGSTAB
?  Error message: Convergence failure in BiCGstab after      1 iterations: omg is too small

This is a problem I have encountered before and asked about once (http://cms.ncas.ac.uk/ticket/3561), but it was never really resolved. I just found a grid setting that did not cause the problem, and now it has come back.

I got this just before the error message in /home/d03/myosh/cylc-run/u-ci559/log/job/20190711T0000Z/glm_um_fcst_000/NN/job.out:

Grid orientation in degrees -  -0.1074E+10
Year        Day       Hour      Minute     Second
Atmosphere time =  -0.1074E+10 -0.1074E+10 -0.1074E+10 -0.1074E+10 -0.1074E+10
Mass, energy, energy drift =  -0.1074E+10 -0.1074E+10 -0.1074E+10

Is this the problem? Any idea how to fix it?
Masaru
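
A note on the repeated value -0.1074E+10: it matches the UM real missing-data indicator, -32768.0 * 32768.0 = -1073741824.0, printed to four significant figures. If that reading is right, these header and diagnostic values were never set, rather than being computed wrongly. A minimal check follows; the fortran_e helper is written here only to mimic the log formatting and is not part of any UM tool.

```python
import math

# UM real missing-data indicator (RMDI): -32768.0 * 32768.0.
RMDI = -32768.0 * 32768.0


def fortran_e(value, digits=4):
    """Format value like Fortran's E descriptor, e.g. -0.1074E+10."""
    if value == 0.0:
        return "0." + "0" * digits + "E+00"
    exponent = math.floor(math.log10(abs(value))) + 1
    mantissa = round(abs(value) / 10.0**exponent, digits)
    if mantissa >= 1.0:  # rounding carried over, e.g. 0.99996 -> 1.0
        mantissa /= 10.0
        exponent += 1
    sign = "-" if value < 0 else ""
    return f"{sign}{mantissa:.{digits}f}E{exponent:+03d}"


print(RMDI)             # -1073741824.0
print(fortran_e(RMDI))  # -0.1074E+10
```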

Masaru

What are the values for the field 431?

Grenville

How do I check?

Masaru

I took a copy of your suite - it ran OK for 24 hrs. Maybe try a rose suite-run --new?

Grenville

Hi Grenville,

I tried that and all previous results were deleted, including regional dump files that were to be used in this continuation run… Anyway that’s something I can take care of next time I run the suite.

But when you ran the suite didn’t you get this error in the regional model?

I have had so many problems running these suites and many of them have not been resolved but only somehow avoided. So I have been haunted by them repeatedly… :sob:

Masaru

Masaru

I may have misunderstood your message of Oct 15th - I thought you were getting the EG_BICGSTAB error in the global run. I’d not paid attention to the regional model. But having run the suite longer, I did get Regn1_Brit2Port_RA2M_um_fcst_000 failing with the error:
Error message: Boundary data starts after start of current boundary data interval

and the Regn1_wPorto_RA2M_um_recon task fails (I don’t understand why.)

Grenville

No, Grenville. I don’t think you misunderstood the problem.
I just had another problem, and it is also a very familiar one…
I’ve been haunted by these problems and others…

But I didn’t get the problem with recon.

Masaru

Hi Masaru

FYI - we do have ANTS v13 working on Monsoon. It just requires a small change to the python_env file. Let me know if that’d be any use.

Grenville

Hi Grenville,

Thank you for letting me know! I tried running u-ci559 with ANTS turned on and ANCIL_ANTS_VEGFRAC failed.

Where is the python_env file? I found copies in the cylc-run folder for a couple of suites (like ~/cylc-run/u-ci463/bin/python_env) but not for this one. And how should I change it?

Masaru

Hi Masaru

u-ci559 is configured a little differently to the suite I was testing ANTS on. Please look at /home/d03/gmslis/roses/u-ci559, where I have made changes so that it uses ANTS to calculate Regn1_Brit2Port_ancil_ants_vegfrac.

To see what I added/changed:

gmslis@xcslc0:~/roses/u-ci559>  grep -r python_env *
app/ants_vegfrac/rose-app.conf:default=python_env ancil_lct.py ${source} \
site/monsoon-cray-xc40/python_env:# Usage python_env CMD_WITHOPTS
suite-runtime/ancils.rc:        init-script=mkdir -p ${CYLC_SUITE_DEF_PATH}/bin; cp -f ${CYLC_SUITE_DEF_PATH}/site/{{ SITE }}/python_env ${CYLC_SUITE_DEF_PATH}/bin/

To use ANTS for other ancillary files, you will need to copy what’s done in app/ants_vegfrac/rose-app.conf for other ancillary types.

Grenville

Hi Grenville,

Thank you for this.
ants_general_aero is the only other directory I can see in app/ants*. Should I change this line in app/ants_general_aero/rose-app.conf

default=${ANTS_PYTHON_PATH}/bin/python -s ${ANTS_PYTHON_PATH}/bin/ancil_general_regrid.py ${source} --target-grid ${target_grid} \

into this?

default=python_env ancil_general_regrid.py ${source} --target-grid ${target_grid} \

Masaru

Hi Masaru

That looks correct (I didn’t test it, though).

Grenville
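
For other ANTS apps, the same substitution could be scripted. The sketch below is untested against a real suite and assumes every affected default= line starts with the ${ANTS_PYTHON_PATH}/bin/python -s ${ANTS_PYTHON_PATH}/bin/ prefix shown above; use_python_env is a hypothetical helper name.

```python
OLD_PREFIX = "${ANTS_PYTHON_PATH}/bin/python -s ${ANTS_PYTHON_PATH}/bin/"
NEW_PREFIX = "python_env "


def use_python_env(line):
    """Rewrite a 'default=' command line to go through the python_env wrapper."""
    key, sep, command = line.partition("=")
    if key.strip() == "default" and command.startswith(OLD_PREFIX):
        return key + sep + NEW_PREFIX + command[len(OLD_PREFIX):]
    return line  # leave every other line untouched


# The line quoted from app/ants_general_aero/rose-app.conf above.
old = ("default=${ANTS_PYTHON_PATH}/bin/python -s "
       "${ANTS_PYTHON_PATH}/bin/ancil_general_regrid.py ${source} "
       "--target-grid ${target_grid} \\")
print(use_python_env(old))
```

Lines that do not match the assumed prefix are returned unchanged, so the helper can be applied to every line of a rose-app.conf file.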

Thanks, Grenville.

It looks like the ANCIL and ANTS processes went fine, except that qrclim.sulpdms has value zero everywhere. This can be replaced with the file I created from the global data using Python and xancil. So this is progress.

But Regn1_Brit2Port_RA2M_um_fcst_000 fails with the same error as above…
INBOUNDA: Boundary data starts after start of current boundary data interval

Masaru

I got a sulpdms file - see /home/d03/gmslis/cylc-run/u-ci559/share/data/ancils/Regn1/Brit2Port/qrclim.sulpdms

It works fine for the larger nest; for wPorto it is all zero. But that’s not the major problem any more.

I forgot to set I_override_date_time back to 0 for the regional model. I think this was causing the INBOUNDA problem.
Now I have a different problem. I’ll ask you about this after I make some checks and tests.

Masaru

I was having a problem similar to the one at the top of this page:

?  Error from routine: WGDOS Packing (f_shum_wgdos_pack)
?  Error message: Problem packing field...
?        STASH:  2380
?        Accuracy: -24
?        Minimum:    0.5540517937E-20
?        Maximum:    0.4970327279E+04
?        Message:  Unable to WGDOS pack to this accuracy

The difference is that I do request STASH item 2-380 and want to have this output.
But ‘Accuracy: -24’ sounds strange.

I set packing=0 in /home/d03/myosh/roses/u-ci960/app/glm_um/opt/rose-app-lbc.conf and packing=0 for pp2, which is used by upc, through which I request item 2-380. I don’t know why, but it seems to be running fine now.
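
One possible reading of the ‘Accuracy: -24’ failure, assuming (as a guess, not checked against shumlib) that the packed integers must fit in a signed 32-bit range: the field spans about 4970 units, and in steps of 2**-24 that is roughly 8e10 steps, which is too many. WGDOS packing works row by row as far as I know, so the field-wide minimum and maximum only give an upper bound on what any single row needs. finest_accuracy is an illustrative name.

```python
import math

INT_LIMIT = 2**31 - 1  # assumed limit on packed integers; an assumption here


def finest_accuracy(minimum, maximum, limit=INT_LIMIT):
    """Smallest exponent n such that (maximum - minimum) / 2**n <= limit."""
    span = maximum - minimum
    n = math.ceil(math.log2(span / limit))
    # Guard against floating-point rounding in log2.
    while span / 2.0**n > limit:
        n += 1
    return n


# Values copied from the STASH 2380 error message above.
fmin, fmax = 0.5540517937e-20, 0.4970327279e04

print(f"steps at accuracy -24: {(fmax - fmin) / 2.0**-24:.3e}")
print(f"finest accuracy that fits: {finest_accuracy(fmin, fmax)}")
```

Under that assumption, an accuracy of -24 is too fine for this range, which would be consistent with the unpacked output (packing=0) running without the error.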

u-ci960 ran fine for two 24-hour simulations, so I think it is mostly fine (except for some minor issues).

u-ci985 is a copy of u-ci960 but I made both nests a little larger. (I initially tried including the third nest near SW of Britain but it wasn’t created correctly.)

In this suite Regn1_Brit2PortL_RA2M_um_fcst_000 fails with the error message “Convergence failure in BiCGstab after 1 iterations: omg is too small”. This is the same error message as before (above), but that time it occurred in the global model.

I compared u-ci960 and u-ci985 and now I can’t see a difference that might cause the problem…

This problem does not go away even if I run the suite as new.

Masaru