Upgrading from cylc7 to 8 on Monsoon3

Hi Ros,

Sorry to bother you again, but now that the workflow has been upgraded, when I try to look at it (using rose config-edit) I get:

And all of the section names within the GUI are messed up.

Are you able to point me to the equivalent metadata?

Charlie

Hi Charlie,

I’ve emailed Jeremy to see if he has that directory on Monsoon3 (that’s a Monsoon2 path). I can’t see into his home directory to check. If he hasn’t, you’ll need to check it out from the moci MOSRS repository yourself.

Cheers,
Ros

Thanks very much Ros.

Firstly, can you remind me how I do that i.e. check out the moci repository?

Secondly, despite the metadata not being quite right, I tried running the workflow anyway (using cylc vip). The good news is that it did at least begin running. The bad news is that it seems to have failed at the fcm_make2_ocean task (fcm_make_ocean seems to have succeeded), meaning it didn’t even get to the nemo_cice task, which is the one I really care about. But I don’t know how, using the new system, to check where the error lies.

On that note: is there any documentation for the new system, over and above the “Summary Of Major Changes” page of the Cylc 8.6.3 documentation? Because, dare I say it, it (and the Cylc 8 Migration Guide in general) is not particularly user friendly, at least not for a dummy like me. It assumes a lot of background knowledge.

Looking at the new GUI (see below), I can see that purple now = submit failed and that green = succeeded. But where are the various error and log files?

I also note that the structure of the output has changed, such that there is now a run1 and run2. What do these mean? Are these literally the number of times I have run this workflow, e.g. run1 was my attempt several weeks ago before I had upgraded, and run2 is what I did yesterday? Certainly ~/cylc-run/u-dg710/run1 appears to contain empty directories. Likewise, ~/cylc-run/u-dg710/share and /work now appear to be ~/cylc-run/u-dg710/run2/share and /work. Is that correct or am I looking in the wrong place?

Sorry for the questions.

Charlie

Hi Charlie,

Jeremy is currently out of the office. I’ve had a quick look in the moci repository and it isn’t his branch, so I’ll need to do a bit more investigation to track it down, or wait for Jeremy to come back. Not having the metadata doesn’t affect the running of the job in any way; it just makes the GUI less nice.

You can see what all the coloured and grey blobs mean here:

In the Cylc GUI, if you click on the purple square, a menu will pop up with an option to take you to the log files within the GUI.

Alternatively you can find them on PUMA2/ARCHER2 under ~/cylc-run/<suiteid>/<runX>/log
More details here:

Yes, with Cylc 8, every time you run the workflow with cylc vip <id> it will start a completely new run in a new run[1,2,…] directory.
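
As a rough illustration of that layout, the sketch below finds the highest-numbered run directory for a workflow and builds the path to a task’s job log directory. The function names are made up for this example, and the log/job/<cycle>/<task>/<submit> layout is an assumption based on the standard Cylc 8 run directory structure, so check it against what you actually see on disk.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not part of Cylc.

# Print the highest-numbered runN directory under a workflow directory,
# e.g. ~/cylc-run/u-dg710 -> ~/cylc-run/u-dg710/run2
latest_run_dir() {
    local wf_dir="$1"
    # sort -V (GNU sort) orders run2 before run10.
    ls -d "$wf_dir"/run[0-9]* 2>/dev/null | sort -V | tail -n 1
}

# Print the directory expected to hold the job logs for one submission
# of a task (assumed layout: log/job/<cycle>/<task>/<submit>).
job_log_dir() {
    local run_dir="$1" cycle="$2" task="$3" submit="${4:-01}"
    printf '%s/log/job/%s/%s/%s\n' "$run_dir" "$cycle" "$task" "$submit"
}

# Example (not run here):
#   ls "$(job_log_dir "$(latest_run_dir ~/cylc-run/u-dg710)" 19760101T0000Z fcm_make2_ocean)"
```

For a submit-failed task, job-activity.log in that directory is usually the file to read, since job.out and job.err are only written once the job actually starts.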

Cheers,
Ros

Okay, thank you very much for the extra information.

I have now checked the log files, and it would appear (as I suspected) that the fcm_make2_ocean task has failed at the submission stage:

2026-05-18T14:40:43Z [STDERR] qsub: error: [PBSSitePolicy] export of complete environment with -V is no longer supported
[(('event-handler-00', 'submission failed'), 1) cmd] rose suite-hook 'submission failed' 'u-dg710/run2' '19760101T0000Z/fcm_make2_ocean' 'job submission failed'
[(('event-handler-00', 'submission failed'), 1) ret_code] 1
[(('event-handler-00', 'submission failed'), 1) err] Command obsolete, use Cylc event handlers

So I’m guessing this is almost certainly something I did wrong in my new flow.cylc? It has to be, because this workflow ran perfectly well back in November, before we switched to Monsoon3.

Charlie

Hi Charlie,

In [[HPC_UM]] remove the -V = line. PBS on Monsoon3 does not support this option.
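
If it helps to locate the line, here is a small sketch that lists every -V directive in a flow.cylc together with its line number. The function name is hypothetical, and it assumes the directive sits on its own line in the form -V = (with any indentation).

```shell
# Hypothetical helper: list PBS "-V" directive lines in a workflow definition.
find_export_env_directive() {
    local flow_file="$1"
    # Match lines such as "        -V =" regardless of indentation.
    grep -nE -e '^[[:space:]]*-V[[:space:]]*=' "$flow_file"
}

# Example (not run here):
#   find_export_env_directive ~/roses/u-dg710/flow.cylc
```

grep exits with a non-zero status when nothing matches, which is the result you want to see once the line has been removed.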

Cheers,
Ros

Hi Charlie,

I’ve also found the metadata branch. You can check it out from the moci repository with:

fcm co fcm:moci.x-br/dev/ericaneininger/r177_metatdata_coupled_apps

Then edit app/nemo_cice/rose-app.conf to point your suite at it.

Cheers,
Ros

Sorry Ros, I have tried doing that, i.e. I changed the first line in my ~/roses/u-dg710/app/nemo_cice/rose-app.conf to

meta=/home/users/charles.williams.ext/r177_metatdata_coupled_apps

(I wasn’t sure if it needed to be checked out to any particular place, but my home directory seemed as good as any), but I get the same error - see below. Worse, now the rose config-edit won’t even open the GUI, it just hangs with the below. I definitely saved the file, so why is it still looking for Jeremy’s path?

Charlie

Further to this… Sorry, the GUI does open, but only when I close down the error box. And the metadata is still not right once it is open (unsurprisingly, as it is still looking for the wrong version rather than mine).

Charlie

And sorry, I realise we are dealing with 2 separate threads here, but in the meantime I tried running it again after making that change (i.e. removing -V), but same error message as before.

[jobs-submit cmd] ssh -oBatchMode=yes -oConnectTimeout=8 -oStrictHostKeyChecking=no login12 env CYLC_VERSION=8.6.3 CYLC_ENV_NAME=cylc-8.6.3-2 bash --login -c ''"'"'exec "$0" "$@"'"'"'' cylc jobs-submit --utc-mode --remote-mode --path=/bin --path=/usr/bin --path=/usr/local/bin --path=/sbin --path=/usr/sbin --path=/usr/local/sbin -- '$HOME/cylc-run/u-dg710/run1/log/job' 19760101T0000Z/fcm_make2_ocean/01
[jobs-submit ret_code] 159
[jobs-submit out] 2026-05-20T15:07:38Z|19760101T0000Z/fcm_make2_ocean/01|159|None
2026-05-20T15:07:38Z [STDERR] qsub: Unauthorized Request
[(('event-handler-00', 'submission failed'), 1) cmd] rose suite-hook 'submission failed' 'u-dg710/run1' '19760101T0000Z/fcm_make2_ocean' 'job submission failed'
[(('event-handler-00', 'submission failed'), 1) ret_code] 1
[(('event-handler-00', 'submission failed'), 1) err] Command obsolete, use Cylc event handlers

Given that both of these problems seem to be similar i.e. it is not picking up a change I have just made, have I missed something silly e.g. I need to install it or validate it again before running? So cylc install or cylc validate, before cylc vip?

Charlie

Ok. We’ll go with one thing at a time.

Using grep we can find more occurrences that need changing.

cazccylc1> pwd
/home/users/charles.williams.ext/roses/u-dg710
cazccylc1> grep -r r177_metatdata *
app/nemo_cice/meta/rose-meta.conf:import=/home/d04/jwalton/repository/moci/r177_metatdata_coupled_apps/rose-meta/ocean_ice/cice/v5.1.2_GSI8
app/nemo_cice/meta/rose-meta.conf: =/home/d04/jwalton/repository/moci/r177_metatdata_coupled_apps/rose-meta/ocean_ice/nemo/v3.6_GO6_CO6
app/nemo_cice/rose-app.conf:meta=/home/users/charles.williams.ext/r177_metatdata_coupled_apps
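
One way to change all of those in one go is a sed substitution, sketched below. The function name and the .bak backup suffix are my own choices, and the new prefix is whatever directory you checked the branch out to. Note that sed treats the old path as a regular expression, so a dot matches any character; that is harmless here but worth knowing.

```shell
# Hypothetical helper: replace one path prefix with another in a Rose
# config file, keeping the original as <file>.bak (GNU sed syntax).
repoint_meta_paths() {
    local old_prefix="$1" new_prefix="$2" conf_file="$3"
    # "|" is used as the delimiter so the slashes in the paths need no escaping.
    sed -i.bak "s|${old_prefix}|${new_prefix}|g" "$conf_file"
}

# Example (not run here):
#   repoint_meta_paths /home/d04/jwalton/repository/moci \
#       /home/users/charles.williams.ext \
#       app/nemo_cice/meta/rose-meta.conf
```

Re-running the grep afterwards should then show only paths under your own checkout.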

Following that there is another metadata branch you’ll need for the fcm_make_ocean app if you want that to render nicely. Up to you. I think it’s probably fcm:moci.x-br/dev/timgraham/r209_GO6_fcm_configs

Hi Charlie

The PBS submission issue is due to using the wrong queue names.
See https://code.metoffice.gov.uk/doc/monsoon3/pbs.html for all the valid Monsoon3 queues.

In flow.cylc:
In the [[*make2*]] tasks, change -q = shared to -q = collabshared
and
in the [[nemo_cice]] task, change -q = normal to -q = collab

That should at least get the jobs to submit successfully.
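
If you would rather script those edits than make them by hand, the sketch below renames a queue only inside task sections whose name matches a pattern. It is an illustration under assumptions: the function name is hypothetical, each task header is assumed to sit on its own line as [[name]], and the directive is assumed to be written as -q = value with spaces around the equals sign.

```shell
# Hypothetical helper: print flow.cylc with "-q = OLD" changed to "-q = NEW",
# but only inside [[task]] sections whose header matches TASK_PATTERN.
rename_queue_in_tasks() {
    local task_pattern="$1" old="$2" new="$3" flow_file="$4"
    awk -v pat="$task_pattern" -v old="$old" -v new="$new" '
        # A header with exactly two brackets, e.g. "[[fcm_make2_ocean]]",
        # starts a new task section; "[[[directives]]]" does not match.
        /^[ \t]*\[\[[^][]+\]\][ \t]*$/ { in_task = ($0 ~ pat) }
        # Rewrite the queue only while inside a matching task section.
        in_task && $1 == "-q" && $2 == "=" && $3 == old { sub(old "[ \t]*$", new) }
        { print }
    ' "$flow_file"
}

# Example (not run here):
#   rename_queue_in_tasks make2 shared collabshared flow.cylc > flow.cylc.new
```

It writes the result to standard output rather than editing in place, so you can diff it against the original before replacing anything.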

Cheers,
Ros

Hi Ros,

Okay, I think we are finally getting somewhere.

Firstly, I have changed all of those directories for the metadata, and have checked out the other one you mentioned. When I open the workflow GUI, I don’t get any errors. It still looks different to how it used to, however; for example, what used to be a bunch of switches (in suite conf > Build and run) is now a bunch of entries (in suite conf > template variables). Is this just something we have to live with?

Secondly, having changed the various queue names, fcm_make2_ocean has now at least submitted. But then instantly failed (not a submit-fail this time, but a proper fail), saying that it could not find the module “XiOS-PrgEnv/1.0”. Where is this particular module specified, and what is the Monsoon3 equivalent?

Charlie

Hi Charlie,

I’m not sure where that path has moved to. I’m just checking with the Monsoon team.

Cheers,
Ros

Thanks very much Ros, any news?

Just so you are fully aware: it might be that this particular module isn’t actually needed for my purposes. This workflow is not a standard simulation; rather, it is a way of making the mesh_mask files for any given bathymetry. In other words, it is a stand-alone NEMO workflow, where I give it a bathymetry I have made. It then runs for one timestep only, before immediately crashing. But this is fine, because I don’t need it to run any longer. One timestep is enough to create the 192 mesh_mask files, which is all I need. I was told there is a better way of doing this, using domain.cfg, but I have never tried this and don’t really know where to start.

But obviously in order to get the nemo_cice task to run at all, we need to build the others. Might there be a way/fudge around this, if we cannot locate that module?

Charlie

Hi Charlie,

The module will be needed in order to build the nemo_cice executable.

I’ve found the equivalent of the Monsoon2 path /common/moci/modules/modules, which is /projects/metoff/moci.mon/modules/modules on Monsoon3, but that module doesn’t exist there. Not surprising, because it is pretty old.

I’ll get back to you when I’ve had more of a dig around.

Cheers,
Ros

Oh, why can it never be simple? Very many apologies, really hope it is not too difficult to find.

Charlie

Hi Charlie,

Looking at the CMIP6 suite on ARCHER2, which uses the same NEMO version as yours (NEMO 3.6), I can see which XIOS version we need. It is definitely not on Monsoon3, as it is ancient. Simon is currently in the process of getting access to Monsoon3 to do some other NEMO work and has offered to look at making a compatible XIOS module for you. Leave it with us and we’ll get back to you when we have an update. In the meantime, if Robin comes back with a better and easier solution, do let us know.

Cheers,
Ros

Thank you very much indeed, massively appreciated as always. And sorry this is yet again proving to be way more complicated than it should be. Why did the Met Office not just port everything over from Monsoon2 to Monsoon3, rather than leaving certain things behind on the incorrect assumption that nobody used them? Sorry, I appreciate this is not your fault. I’m just a bit frustrated that a methodology that worked completely fine on Monsoon2 is now completely different, and almost unusable, just because of a system upgrade. And that’s not even talking about the migration from cylc7 to 8, which has caused this entire headache. I am a firm believer in the phrase “if it ain’t broke, don’t fix it”!

Charlie

Hi Charlie,

I’ve had a look on Monsoon3 and cannot find XIOS-PrgEnv/1.0 on the system, so I’ve tried to build an equivalent myself, using the PrgEnvs on ARCHER2 as a base. However, XIOS-PrgEnv/1.0 doesn’t exist on ARCHER2 either, so I’ve used GC3-PrgEnv/v1 as a base instead. To use this, you’ll need to set MOCI_MODULE_PATH to /data/users/simon.wilson.ext/moci/modules/modules and GC3_MODULE_NAME to GC3-PrgEnv/v1 in the suite conf > template variables section of the rose GUI.

Simon.