Nudging run missing STASH

Hi,

I keep getting this error relating to missing STASH field 30-451:

Pressure at Tropopause Level needs to be requested in STASH with TALLTS profile

???
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 451
? Error from routine: NUDGING_MAIN
? Error message: Required Diagnostic missing from STASH
? Error from processor: 0
? Error number: 53
???

I am confused, as this was working previously. I have it being output on UP4 and UPM as TMONMN, so I thought this would be OK?

Thanks,

Hannah

Hi Hannah,

As far as this error message is concerned, the nudging code checks on the 1st timestep whether this field is active in STASH, and requesting with TALLTS ensures that.
I cannot say at this stage why requesting with TMONMN does not work in this case but worked earlier, unless there has been a change in the way 'sampling' is specified (it should be something like 'sample every timestep and average over one month').

Also check in pe_output or job.err if there are any warnings from PRELIM about this diagnostic.
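
Something along these lines should surface any PRELIM messages (the paths are a sketch based on the usual cylc-run layout; substitute your suite id and cycle, and the pe_output location your suite actually uses):

# job.err for the atmosphere task of a given cycle
grep -in prelim ~/cylc-run/<suite-id>/log/job/<cycle>/atmos_main/NN/job.err

# or search the pe_output files, wherever your suite writes them
grep -in prelim /path/to/pe_output/*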

Mohit

Hi Mohit,

No changes to those samplings! I had turned off one of the output streams and then turned it back on again (and recommitted), but I cannot get it working again. There is nothing in the PRELIM output saying that the STASH is denied, etc. Is it reliant on any other diagnostics?

Thanks,

Hannah

Hi Hannah,

No, it is not dependent on any other diagnostic or setting. If the output stream has been turned off previously, maybe STASH still thinks it is not needed and hence deactivates all diagnostics going to that pp stream.

If the 30-451 diagnostic is not needed in the output, you can change the request to use TALLTS and UPUKCA.
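
For reference, the edited request in app/um/rose-app.conf would look roughly like this (the index string and domain profile name below are illustrative placeholders; copy them from your existing 30-451 request):

[namelist:umstash_streq(30_451_example)]
dom_name='DIAG'
isec=30
item=451
package=''
tim_name='TALLTS'
use_name='UPUKCA'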

Mohit

Hi Mohit,

Sorry, that was not clear from me: I meant one of the output streams for this diagnostic (i.e. the UP4 one). I'll try changing to UPUKCA as I do not need two copies of it.

I was reading the documentation on nudged UKESM1-AMIP and noticed the recommendation to stop using TMONMN and to use TDMPMN and UPMEAN output instead. Is this needed? I inherited this job from a previous project and it still uses TMONMN.

Thanks,

Hannah

The Nudged UKESM1-AMIP documentation is for UKESM1.0 (vn11.x), where climate meaning for the Gregorian calendar was not working properly. Most of the issues have been fixed for UKESM1.1 (vn13.x), which is most probably the version of the suite you have inherited.

Mohit

Hi Mohit,

My suite is vn12.0 so a little older! The parent job was u-dk793.

Hannah

Hi Hannah,

As long as the ‘branches/dev/mohitdalvi/vn12.0_gregorian_fixes’ branch from u-dk793 has not been removed then the meaning should work fine. This gathers the relevant fixes as they were being applied to the UM, up to vn13.x.
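
If you want to double-check that the branch is still in the build, a quick grep in your local suite copy should show it (searching the whole app/ directory here, since the name of the fcm_make app varies between suites):

grep -rn 'vn12.0_gregorian_fixes' app/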

Mohit

Hi Mohit,

Yes that branch is applied! So all monthly should go to UPM? (sorry just want to make sure this is all correct!)

Thanks,

Hannah

Yes, that should be working fine, assuming the predecessor suite had been working as expected.

Hi Mohit,

The UPUKCA change allowed the model to run for three months but then it failed with the error:

???
[0] ???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
[0] ? Error code: 5
[0] ? Error from routine: READFLDS_CORE
[0] ? Error message: Readflds Address error for field 40231:
[0] ? Calculated (address)/(field_length): (2712621)/(48)
[0] ? Maximum D1 address: 2712620
[0] ? Error from processor: 0
[0] ? Error number: 52
[0] ???

I’m struggling to decipher this, as 40231 is not a STASH code but a field number.

Thanks,

Hannah

Hi Hannah,

That sounds like either a corrupted file / disk issue, or a corrupted diagnostic.

Have you been able to run a copy of the predecessor suite without any changes? This is especially useful when STASH/output stream changes are proposed.
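
For example (rosie commands; this assumes you have Met Office Science Repository Service access configured):

rosie checkout u-dk793    # work on a local copy of the predecessor
rosie copy u-dk793        # or make your own new suite copied from it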

Also, could you let me know the suite and the location of the runs (ARCHER/Monsoon, user-id), so I can try to take a look?

Mohit

Hi Mohit,

Yeah, I have been able to run it for a month, but this is the first time I have tried to run longer! The job id is u-dq440 and I am running on ARCHER2 (it should be in my work folder, /work/n02/n02/s2261584).

Thanks!

Hannah

From looking at the restart dump header, field 40231 appears to be level 28 of 52-228 (Photolysis Rate JO1D on PRES LEVS).

Unfortunately, xconv does not list any pressure-level fields in the dump or the monthly mean file, so I cannot check whether this field itself is corrupted.
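
If you want to look at the lookup headers yourself, the mule utilities can print them as text (this assumes mule/um-tools are available in your environment, and the dump filename here is just an example):

mule-pumf /path/to/atmosa.da20010401_00 > headers.txt

You can then search headers.txt for field entry 40231.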

Hi Mohit,

I have checked both the job and the branch, and I did not make any changes to 52-228. The last cycle (20010101T0000Z) failed in pptransfer with the error: [ERROR] Archive directory /work/n02/n02/s2261584/cylc-run/u-dq440/share/cycle/20010101T0000Z/u-dq440/20010101T0000Z doesn’t exist. This is strange, as this cycle covers three months (I was getting this previously when I had it set to 1M, so I increased the cycle length). Could this be what is causing the error in the diagnostic?

Sorry I am very confused I haven’t run into any errors like this before!

Thanks,

Hannah

Hi Hannah,

As you can see, the path in the error message seems to be constructed incorrectly: u-dq440/share/cycle/20010101T0000Z/u-dq440/20010101T0000Z

Also, Gregorian calendar runs can only use 1-month segments as STASH cannot support climate meaning for longer periods (yet - likely to be fixed at vn14.0).

So, it is worth resetting RESUB to 1M and trying to address this error first. If you are re-running the suite, make sure to empty share/data/History_Data/ to avoid conflicts with earlier files.
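
Concretely, that means something like this (the RESUB variable name and value format follow this thread; the path is taken from your earlier message, so double-check it before deleting anything):

# in rose-suite.conf
RESUB='1M'

# clear out earlier partial output before re-running
rm -rf /work/n02/n02/s2261584/cylc-run/u-dq440/share/data/History_Data/*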

Mohit

Hi Hannah,

You can ignore the Archive directory error in pptransfer for the 20010101T0000Z cycle. This is because it is the first cycle and there is no completed data ready to be archived yet. Please just set the pptransfer task to succeeded.
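
For reference, on Cylc 7 that would be something like this (suite id and cycle taken from this thread):

cylc reset --state=succeeded u-dq440 pptransfer.20010101T0000Z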

The /work/n02/n02/s2261584/cylc-run/u-dq440/share/cycle/20010101T0000Z/u-dq440/20010101T0000Z directory naming is fine. This is just a result of how the postproc app is set up to archive to the $ROSE_DATAC directory. This is fixed in a later postproc release but doesn’t impact the running of the suite.

Cheers,
Ros.

Hi Ros, Mohit,

Thank you both. I think I previously had this working, and the only change I can think of having made is that in the app/housekeeping/rose-app.conf file I added the line “prune{share/cycle}=-P3M”, following the advice given in this ticket: Switch off ARCHER2 automatical archiving - #2 by RosalynHatcher

Could this be causing any of these errors?

Thanks,

Hannah

Hi Hannah,

All the prune{share/cycle}=-P3M line does is delete the staged archive data directory from the /work/n02/n02/s2261584/cylc-run/u-dq440/share/cycle directory once it has been successfully transferred to JASMIN.
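
For context, that setting lives in the rose_prune built-in app; a minimal app/housekeeping/rose-app.conf sketch looks something like this (the prune{share/cycle} line is the one you added; treat the rest as illustrative, as other keys may be present in your suite):

mode=rose_prune

[prune]
prune{share/cycle}=-P3M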

Regards,
Ros.

Hi Ros,

OK, great, thank you. It has now run a full year successfully! However, I can’t see any data in the /work/n02/n02/s2261584/archive/u-dq440 directory at all (it is all on JASMIN). Without the restart file there, I’m not sure how the job will keep running? Does the prune also delete this data?

Thanks,

Hannah