Postproc MASS put issues: umpp

Hello, I’ve been trying to run some simulations on monsoon3 for a few days now and keep getting the same errors. I have run this suite before with no issues; I’m re-running it with very slight changes to a science namelist, and the postproc settings are unchanged.

The model runs fine, but the postproc step fails when trying to write to MASS. The failure is inconsistent: the run may get 9 months in and then stop. The error message I get in the postproc_archive log.err is:

put: option local-file-format failed: No known converter or specification file on your $PATH corresponding to: umpp

The suite is u-du630.

Thanks, Ross

I am also getting a moo get error: ‘unable to prepare a single copy…’.

Are suite outputs being moved around MASS at the moment to give that error? Or are there general issues with MASS that would be messing with my moo put and moo get commands?

This is the exchange I had with Mohit on Teams:

Mohit →

  • ‘get’ errors: SINGLE_COPY_UNAVAILABLE is an expected error indicating that the tape with your data has been physically removed from MASS to upload to new MASS. It takes about 10 days to complete the process, so there is no option but to wait. As new MASS catches up on the archived data, more recent data/tapes will be removed for migration.
  • ‘put’ errors: It looks like the archiving has got itself muddled up after a failure and re-triggering. The first ‘postproc_archive’ task for 201509 failed because the command for archiving ‘p22015.aug’ was being duplicated. However, it may have succeeded subsequently in the background, with all .pp files then deleted. A subsequent re-triggering, possibly without repeating postproc_transfer first, meant that moose was trying to archive the fields files with ‘on-the-fly’ pp conversion (with the -c=umpp option, which has been removed).
    I am not sure what to suggest here, unless you want to try converting and archiving any remaining files up to September manually.
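
To tell the two failure modes above apart when scanning a postproc_archive log.err, a minimal sketch along these lines may help. The matched substrings are taken from the messages quoted in this thread; the function name, the category labels and the idea of scanning the log this way are assumptions for illustration, not part of postproc or moose.

```python
# Substrings copied from the error messages quoted in this thread.
# The category labels on the right are hypothetical, chosen for illustration.
KNOWN_ERRORS = {
    "SINGLE_COPY_UNAVAILABLE": "get_tape_migration",
    "unable to prepare a single copy": "get_tape_migration",
    "No known converter or specification file": "put_umpp_conversion",
}


def classify_log(text):
    """Return (line_number, category) pairs for lines matching a known error."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for needle, category in KNOWN_ERRORS.items():
            if needle in line:
                hits.append((lineno, category))
                break  # one category per line is enough
    return hits


sample = (
    "some earlier output\n"
    "put: option local-file-format failed: No known converter or "
    "specification file on your $PATH corresponding to: umpp\n"
)
print(classify_log(sample))
```

A "get_tape_migration" hit suggests waiting for the tape migration to finish, while a "put_umpp_conversion" hit suggests the archive step was handed fields files rather than already-converted .pp files.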

Ross → Thanks Mohit. Re the put errors: this has happened on about 5 to 10 attempts, including after a cylc clean of the workspace. If my atmos_main task for the next month is completing before the archiving from the previous month has finished, could that be causing the duplicated archiving command? That might explain why it’s not always the same month that it starts to trip up.

Mohit → Could it be that, due to the Gregorian calendar and the frequency of re-initialisation, files with identical names are being created in certain months? Files with sub-monthly re-initialisation should ideally not have the 3-letter month (aug, sep) in the name, but preferably the YYYYMMDD format, which reflects the file creation date (so 20150901, 20150911 and so on). See the file naming convention: https://code.metoffice.gov.uk/doc/um/latest/papers/umdp_007.pdf. I believe using ‘p%C’ in the filename_base (nlstcall_pp / output_streams namelist) should automatically lead to the appropriate file naming.
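
The name clash described here can be illustrated with a short sketch. The two naming functions below are hypothetical stand-ins, not the UM's actual filename logic, and the 10-day re-initialisation interval and the "p2" stream prefix are assumed purely for the example.

```python
from collections import Counter
from datetime import date, timedelta

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def month_style_name(stream, d):
    # Hypothetical month-based name: stream + year + 3-letter month.
    return f"{stream}{d.year}{MONTHS[d.month - 1]}"


def date_style_name(stream, d):
    # Hypothetical date-based name: stream + YYYYMMDD.
    return f"{stream}{d:%Y%m%d}"


def duplicates(names):
    """Return the sorted names that occur more than once."""
    return sorted(n for n, c in Counter(names).items() if c > 1)


# Assumed example: files re-initialised every 10 days from 1 August 2015.
dates = [date(2015, 8, 1) + timedelta(days=10 * i) for i in range(7)]

month_names = [month_style_name("p2", d) for d in dates]
date_names = [date_style_name("p2", d) for d in dates]

print(duplicates(month_names))  # month-based names repeat within a month
print(duplicates(date_names))   # date-based names stay unique
```

With a sub-monthly interval, several files fall in the same month and so share a month-based name, whereas each date-based name is distinct.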

Ross → No, it ran fine before.

Mohit → Ah OK, it could just be that the put command does not complete in time (at the server end) and the subsequent re-triggers cause the archiving to get muddled up.

Ross → I might try extending the cycling frequency from 1 month to 3 months. However, I’m pretty sure I remember someone recently mentioning that it needs to be 1 month for some reason on nudged configurations. Is this true?

Mohit → Yes, that is a current limitation of the STASH processing with the Gregorian calendar.