Problem submitting v8.4 GA4 UM-UKCA job to ARCHER2 from PUMA2

Dear NCAS CMS helpdesk team,

In August 2024 I ran the interactive volcanic aerosol configuration of GA4 UM-UKCA (Dhomse et al., 2020) for the main set of simulations for the Hunga Tonga MIP organised by the University of Colorado.

There I was able to essentially re-run the model configuration we had run for Pinatubo, El Chichon and Agung, and then also run the Hunga case in the same way.

The main simulations there were nudged to ERA5 re-analysis, but there are 2 remaining simulations that the models are now running for this MIP, in which the volcanic-aerosol stratospheric heating and the cooling effect from the emitted volcanic H2O are allowed to act freely, set up consistently with the initial nudged simulations.

It’s a relatively straightforward re-run of the nudged case: the plan is basically to re-run the Gregorian-calendar case, keeping full alignment with the nudged runs and using the same initialisations etc. as the nudged Tonga-MIP runs, but with the nudging switched off.

It was ~18 months ago that I ran these simulations (August 2024), when PUMA2 was already fully integrated with ARCHER2, so this time I have simply followed the same process as before, using the NCAS-CMS instructions at https://cms.ncas.ac.uk/puma2/umui for running the UMUI on PUMA2 for ARCHER2.

That worked just fine last time, including being able to do the FCM commit back to the trunk with the advice you gave back then.

As I say, this time it’s actually really simple: I just need to re-submit the same runs, but with the nudging switched off (retaining the Gregorian calendar and the same restart files etc.).

I’ve been able to open the UMUI OK, and creating a new UMUI experiment and copying previous jobs both work fine: I have copied my xphy-i and xpzg-x runs into the new UMUI experiment xqjh, keeping the same last character of the jobid in each case, i.e. xqjh-i and xqjh-x.

That all seemed to work fine, and when I clicked “PROCESS” from the UMUI job it proceeded in the same way it always has, generating the files within the umui_jobs/${JOBID}/ directory.

The problem came when I then ran the UMSUBMIT_ARCHER2 script to do the “manual submit” method; back in August 2024 that triggered the usual submission steps, copying the job across to the umui_runs directory on ARCHER2.

This time, running ./UMSUBMIT_ARCHER2 didn’t do anything at all: it returned straight to the prompt with no output.

I saw within the script that it is ksh, and I tried ksh ./UMSUBMIT_ARCHER2, but had the same problem.

I also tried running “exec ssh-agent $SHELL” prior to running the script, as I’ve had a similar issue when connecting to JASMIN etc. that is resolved by this precursor command, which puts the shell environment into the required form.

When I did that, the behaviour was slightly different: it prompted me for the passphrase for my id_rsa_archer2 SSH key.

So I’m guessing it might be something to do with this, i.e. the SSH connection back across to ARCHER2 from PUMA2.

Just wondering whether there may have been a change in the protocols for connecting back to ARCHER2 from PUMA2 that needs to be adjusted within the UMSUBMIT_ARCHER2 script?

Thanks a lot for any help or guidance you can provide here.

For info, see below for the Linux transcript of the commands. You can see that the “PROCESS” stage has worked OK, and that UMSUBMIT_ARCHER2 and the other required script files are all there in my ~/umui_jobs/${JOBID}/ directory.

But then when I type “./UMSUBMIT_ARCHER2” (or “ksh ./UMSUBMIT_ARCHER2”) it simply goes straight to the next prompt, and never does the submission.

You can also see below where I then tried “exec ssh-agent $SHELL”; after that it gives the prompt for the passphrase for connecting back to ARCHER2.

So the script must at least be starting to run, but I’m guessing it is simply hitting an “exit” or “done” (or some other terminating command) early on, before it gets as far as rsync-ing the files across in the way it usually does.

Best regards,

Cheers
Graham

Dr. Graham Mann
Lecturer in Atmospheric Science,
School of Earth & Environment
University of Leeds.

gmann@ln04:~> ssh -Y puma2
######################################################################################
---------------------------------Welcome to PUMA2-----------------------------------
######################################################################################
Enter passphrase for key '/home/n02/n02/gmann/.ssh/id_rsa_puma2': 
Register this system with Red Hat Insights: insights-client --register
Create an account or view all your systems at https://red.ht/insights-dashboard
Last login: Mon Feb 16 17:24:06 2026 from 172.24.75.54
[gmann@puma2 ~]$ umui&
[1] 3673979
[gmann@puma2 ~]$ ls -lrt
total 1792
-rwxr-xr-x.    1 gmann n02  28027 Jan  3  2012 STASHC_xguki_orig
-rwxr-xr-x.    1 gmann n02  21508 Nov 16  2012 bVOC_diffs.txt
-rwxr-xr-x.    1 gmann n02   2719 Jan 14  2013 fcm_spells_original.txt
-rwxr-xr-x.    1 gmann n02  25883 Aug 22  2013 differences.txt
drwxr-xr-x.    3 gmann n02   4096 Mar  6  2014 um_nesting
-rwxr-xr-x.    1 gmann n02  35968 Sep  9  2015 STASHC_xltpv_nohandedit
-rwxr-xr-x.    1 gmann n02    370 Nov 25  2016 items_to_remove_from_12ensSMURPHS.txt
-rwxr-xr-x.    1 gmann n02   4448 Dec 29  2016 fcm_spells_v103.txt
-rwxr-xr-x.    1 gmann n02    717 Dec 30  2016 version104_105_ifneeded.py
-rwxr-xr-x.    1 gmann n02    640 Dec 30  2016 version104_105_orig.py
drwxr-xr-x.    4 gmann n02   4096 Jan  5  2017 meta
-rwxr-xr-x.    1 gmann n02  12733 Jan 11  2017 Rose_reminder.txt
drwxr-xr-x.    2 gmann n02   4096 Apr 10  2017 test
-rwxr-xr-x.    1 gmann n02 623777 Jan 24  2019 ukca_mode_v84xlazb_24Jan2019.F90
-rwxr-xr-x.    1 gmann n02 619475 Jan 25  2019 ukca_mode_v84xnelp_25Jan2019.F90
-rwxr-xr-x.    1 gmann n02  16485 Jun 16  2019 ukca_option_mod_xolddBuiltCode_ForAshEmission.F90
-rwxr-xr-x.    1 gmann n02  31413 Jun 16  2019 ukca_mode_ems_um_mod_xolddBuiltCode_ForAshEmission.F90
-rwxr-xr-x.    1 gmann n02   1245 Aug  7  2020 bla.txt
drwxr-xr-x.    2 gmann n02   4096 Apr 19  2021 overrides
drwxr-xr-x.    3 gmann n02   4096 Apr 20  2021 um
drwxr-xr-x.    6 gmann n02   4096 Feb 21  2022 hand_edits_from_ukca_PUMAdir
drwxr-xr-x.    2 gmann n02   4096 Aug 28  2022 Code_Merging_v84_extendNudgingTropopauseFixBranch_forERA5nudgingL137option
-rwxr-xr-x.    1 gmann n02   3441 Oct 27  2022 fcm_spells.txt
-rwxr-xr-x.    1 gmann n02  13426 Oct 27  2022 ukca_volcanic_so2.F90
drwxr-xr-x.    3 gmann n02   4096 Nov 10  2023 bin
drwxr-xr-x.   77 gmann n02  16384 Jul  5  2024 FCM_7.1
drwxr-sr-x.    2 gmann n02   8192 Sep 28  2024 CodeMerging_AddH2OcoEmission_to_updGLOMAPtoDhomsev3withSchSO2r22850_and_ptbACIDPRUFRJ4volPPEupdr22849
drwxr-xr-x.    2 gmann n02 106496 Jan  2  2025 stashfiles
drwxr-xr-x.   29 gmann n02   4096 Oct 16 15:04 roses
drwxr-xr-x.    8 gmann n02   4096 Nov 21 15:02 FCM_MOSRS
drwxr-xr-x. 1262 gmann n02 139264 Feb 16 17:28 umui_jobs
[gmann@puma2 ~]$ cd umui_jobs
[gmann@puma2 ~]$ ls -lrt
drwxr-xr-x. 2 gmann n02   8192 May 26  2024 xpwvh
drwxr-xr-x. 2 gmann n02   8192 May 26  2024 xpwvi
drwxr-xr-x. 2 gmann n02   8192 May 26  2024 xpwvk
drwxr-xr-x. 2 gmann n02   8192 May 26  2024 xpwvl
-rw-r--r--. 1 gmann n02    436 May 28  2024 diff.xpwvh.xpwvk
-rw-r--r--. 1 gmann n02    436 May 28  2024 diff.xpwve.xpwvk
-rw-r--r--. 1 gmann n02  29068 Jul  4  2024 diff.xoldb.xoldy
-rw-r--r--. 1 gmann n02  36169 Jul  4  2024 diff.xovra.xovrc
-rw-r--r--. 1 gmann n02    837 Jul  8  2024 diff.xpvwu.xpvwv
drwxr-xr-x. 2 gmann n02   8192 Jul  8  2024 xpzga
-rw-r--r--. 1 gmann n02   1234 Jul 18  2024 diff.xpvwu.xpvwx
-rw-r--r--. 1 gmann n02    661 Jul 18  2024 diff.xpzgt.xpzgu
-rw-r--r--. 1 gmann n02    644 Jul 18  2024 diff.xpzgu.xpzgv
drwxr-xr-x. 2 gmann n02   8192 Jul 18  2024 xpzgt
drwxr-xr-x. 2 gmann n02   8192 Jul 18  2024 xpzgv
drwxr-xr-x. 2 gmann n02   8192 Jul 18  2024 xpzgu
-rw-r--r--. 1 gmann n02    839 Jul 18  2024 diff.xpvwv.xpvww
-rw-r--r--. 1 gmann n02    682 Aug 10  2024 diff.xpzgt.xpzgw
-rw-r--r--. 1 gmann n02   1689 Aug 10  2024 diff.xpvww.xpzgw
-rw-r--r--. 1 gmann n02    671 Aug 10  2024 diff.xpzgw.xpzgx
-rw-r--r--. 1 gmann n02    671 Aug 10  2024 diff.xpzgw.xpzgy
drwxr-xr-x. 2 gmann n02   8192 Aug 11  2024 xpzgw
drwxr-xr-x. 2 gmann n02  81920 Aug 14  2024 job_hist
drwxr-xr-x. 2 gmann n02   8192 Aug 14  2024 xpzgy
-rw-r--r--. 1 gmann n02    920 Aug 14  2024 diff.xpzgx.xpzgy
drwxr-xr-x. 2 gmann n02   8192 Dec 31  2024 xpzgx
-rw-r--r--. 1 gmann n02  38760 Jan 18  2025 xpvwv.A.diags
drwxr-xr-x. 2 gmann n02   4096 Feb 16 16:31 xqjhi
-rw-r--r--. 1 gmann n02  51022 Feb 16 16:32 diff.xpzgx.xqjhi
drwxr-xr-x. 2 gmann n02   4096 Feb 16 17:28 xqjhx
[gmann@puma2 umui_jobs]$ cd xqjhx
[gmann@puma2 xqjhx]$ ls -lrt
total 388
-rw-r--r--. 1 gmann n02      0 Feb 16 17:28 USR_MACH_OVRDS
-rw-r--r--. 1 gmann n02     87 Feb 16 17:28 USR_FILE_OVRDS
-rw-r--r--. 1 gmann n02     43 Feb 16 17:28 USR_PATHS_OVRDS
-rw-r--r--. 1 gmann n02    175 Feb 16 17:28 UAFLDS_A
-rw-r--r--. 1 gmann n02    156 Feb 16 17:28 UAFILES_A
-rw-r--r--. 1 gmann n02   7553 Feb 16 17:28 CNTLALL
-rw-r--r--. 1 gmann n02 108830 Feb 16 17:28 PRESM_A
-rw-r--r--. 1 gmann n02     99 Feb 16 17:28 INITHIS
-rw-r--r--. 1 gmann n02     77 Feb 16 17:28 IOSCNTL
-rw-r--r--. 1 gmann n02   7723 Feb 16 17:28 RECONA
-rw-r--r--. 1 gmann n02    120 Feb 16 17:28 PPCNTL
-rw-r--r--. 1 gmann n02  13199 Feb 16 17:28 SUBMIT
-rw-r--r--. 1 gmann n02   8712 Feb 16 17:28 SCRIPT
-rw-r--r--. 1 gmann n02   5761 Feb 16 17:28 FCM_UMSCRIPTS_CFG
-rw-r--r--. 1 gmann n02   7547 Feb 16 17:28 FCM_UMRECON_CFG
-rw-r--r--. 1 gmann n02   8288 Feb 16 17:28 FCM_UMATMOS_CFG
-rwxr-xr-x. 1 gmann n02   4194 Feb 16 17:28 EXTR_SCR
-rwxr-xr-x. 1 gmann n02    289 Feb 16 17:28 COMP_SWITCHES
-rw-r--r--. 1 gmann n02   3427 Feb 16 17:28 FCM_BLD_COMMAND
-rwxr-xr-x. 1 gmann n02   2983 Feb 16 17:28 MAIN_SCR
-rwxr-xr-x. 1 gmann n02   6373 Feb 16 17:28 UMSUBMIT
-rwxr-xr-x. 1 gmann n02   4193 Feb 16 17:28 UMSUBMIT_ARCHER2
-rw-r--r--. 1 gmann n02    714 Feb 16 17:28 CNTLGEN
-rw-r--r--. 1 gmann n02   8028 Feb 16 17:28 SHARED
-rw-r--r--. 1 gmann n02   4484 Feb 16 17:28 INITFILEENV
-rw-r--r--. 1 gmann n02   2690 Feb 16 17:28 SIZES
-rw-r--r--. 1 gmann n02  21975 Feb 16 17:28 CNTLATM
-rw-r--r--. 1 gmann n02  78218 Feb 16 17:28 STASHC
-rw-r--r--. 1 gmann n02   5468 Feb 16 17:28 EXT_SCRIPT_LOG
[gmann@puma2 xqjhx]$ ./UMSUBMIT_ARCHER2
[gmann@puma2 xqjhx]$ exec ssh-agent $SHELL
bash-4.4$ ./UMSUBMIT_ARCHER2
Enter passphrase for key '/home/n02/n02/gmann/.ssh/id_rsa_archer2': 
bash-4.4$ pwd
/home/n02/n02/gmann/umui_jobs/xqjhx
bash-4.4$ 

I did also just try doing an ssh-add for the id_rsa_archer2 SSH key, and then running the script.
But it does the same thing, behaving identically to when running ./UMSUBMIT_ARCHER2 from the usual shell:

bash-4.4$ ssh-add /home/n02/n02/gmann/.ssh/id_rsa_archer2
Enter passphrase for /home/n02/n02/gmann/.ssh/id_rsa_archer2:
Identity added: /home/n02/n02/gmann/.ssh/id_rsa_archer2 (/home/n02/n02/gmann/.ssh/id_rsa_archer2)
bash-4.4$ ./UMSUBMIT_ARCHER2
.

Just to add also, I checked the ssh keys are set-up and working correctly, with the “rose host-select archer2” test command giving “ln02” indicating passwordless ssh back to ARCHER2 is set-up OK on PUMA2.
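As an extra sanity check, since the submit scripts rely on non-interactive ssh, I could also try something along these lines from PUMA2 (just a standard ssh test rather than anything from the UM scripts, and assuming ln02 is one of the ARCHER2 login-node aliases set up in my ~/.ssh/config in the same way ln01 is):

# confirm non-interactive ssh to an ARCHER2 login node works with no passphrase prompt
# (BatchMode makes ssh fail straight away rather than prompt if the key/agent isn't available)
ssh -o BatchMode=yes ln02 hostname

That should just print the login node's hostname and return, if the keys are set up for non-interactive use.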

I also saw the info in the NCAS-CMS training that on ARCHER2 the PUMA2 home directories are now visible via the cross-mounted “/home/n02/n02-puma/${USER}” directory.

And then I also tried running the “./UMSUBMIT_ARCHER2” script directly on ARCHER2, from my /home/n02/n02-puma/gmann/ directory.

But that didn’t work either, as it was looking for the umui_jobs directory within the “/home/n02/n02/gmann/” directory.

I was just thinking that maybe it makes sense to do the manual submit from ARCHER2 itself, now that the PUMA2 filesystem is cross-mounted onto ARCHER2.

That would potentially mean modifying the script to operate that first stage from /home/n02/n02-puma/${USER}/umui_jobs/
(or I wonder whether it should in any case work out that it is in that directory from a pwd within the script?).
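Just to illustrate what I mean by that last point, a minimal sketch (not the actual UMSUBMIT_ARCHER2 contents, and JOBDIR is just a name I’ve invented here) would be something like:

# derive the job directory from wherever the script is invoked,
# rather than assuming a fixed /home/n02/n02/${USER}/umui_jobs/ path
JOBDIR=$(pwd)
echo "Submitting job files from: $JOBDIR"

so the same first stage would work whether it is run from the PUMA2 home path or from the cross-mounted /home/n02/n02-puma/${USER} path on ARCHER2.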

Anyway, just thought I’d send this update here.

The query seems to be greyed-out within the main screen, and I’m not sure why that is
(I may not have set up the initial query correctly).

And then I’ll try emailing directly tomorrow evening if I’ve not heard back by then
(in case this doesn’t get seen, given the settings on the CMS query post).

Hi Graham,

It’s because you have the job set to submit to ln01 and that login node is out of circulation for maintenance. Change the host to one of the other login nodes and try again.

You can add set -x at the beginning of the UMSUBMIT_ARCHER2 to see where the script is falling over.
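For example, something like this just after the header comments (only a sketch of where it goes; the exact header lines in your copy of the script will differ):

# top of UMSUBMIT_ARCHER2, after the comment block
set -x    # echo each command as it is executed, so the last line printed shows where it stops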

Cheers,
Ros

Hi Ros,

Ah, OK. Thanks for this.

I edited the UMSUBMIT_ARCHER2 file on PUMA2, changing “host=ln01” to “host=ln02”.
And I also added “set -vx” after the comment headings at the start of the script, as you suggested.

When I then ran ./UMSUBMIT_ARCHER2, it successfully copied the files across
to ARCHER2, and created the xqjhx-049004554 directory on ARCHER2 in the umui_runs directory.

However, the UMSUBMIT_ARCHER2 script is giving an FCM extract error for the UMATMOS base repo:

MAIN_SCR: Calling Extract ...
Extracting UMATMOS base repository...
UMATMOS base repository extract failed
See extract output file /home/n02/n02/gmann/um/um_extracts/xqjhx/baserepos/UMATMOS/ext.out
MAIN_SCR: Extract failed
MAIN_SCR stopped with return code 255
+ CC=255
exit $CC

+ exit 255

I checked the ext.out file, as it suggests, and the error there seems to occur
when it’s trying to run the mkdir command for the um/xqjhx/baserepos/UMATMOS/cfg directory.

Please can you advise: does this signify a particular issue with the FCM extract itself?
Or is it simply a permissions or disk-space issue within my /work/n02/n02/gmann/ area on ARCHER2?
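For reference, one thing I thought I could try by hand is to reproduce the failing mirror step directly, using the same ssh/mkdir command that appears in the ext.out excerpt below, i.e.:

# re-run the non-interactive ssh + mkdir that the FCM mirror stage uses,
# to see whether it is the ssh connection or the directory creation that fails
ssh -n -oBatchMode=yes ln01 mkdir -p /work/n02/n02/gmann/um/xqjhx/baserepos/UMATMOS/cfg
echo "exit status: $?"

But I haven’t done that yet, so I may well be on the wrong track.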

Thanks a lot for your help with this,

Best regards,

Cheers
Graham

----

Column 1: Destination status summary:
No of files [A] added: 2988
Column 2: Source status summary:
No of files [B] from the base: 2988
Generated cfg: /home/n02/n02/gmann/um/um_extracts/xqjhx/baserepos/UMATMOS/.cache/.ext/.config_file_src
Generated ext cfg: /home/n02/n02/gmann/um/um_extracts/xqjhx/baserepos/UMATMOS/cfg/ext.cfg
Generated bld cfg: /home/n02/n02/gmann/um/um_extracts/xqjhx/baserepos/UMATMOS/cfg/bld.cfg
->Extract: 51 seconds
->Mirror: start
Destination: ln01:/work/n02/n02/gmann/um/xqjhx/baserepos/UMATMOS

Start: 2026-02-18 00:46:54=> ssh -n -oBatchMode=yes ln01 mkdir -p /work/n02/n02/gmann/um/xqjhx/baserepos/UMATMOS/cfg

Connection closed by 172.24.75.51 port 22^M

Time taken: 0 s=> ssh -n -oBatchMode=yes ln01 mkdir -p /work/n02/n02/gmann/um/xqjhx/baserepos/UMATMOS/cfg

[FAIL] ssh -n -oBatchMode=yes ln01 mkdir -p /work/n02/n02/gmann/um/xqjhx/baserepos/UMATMOS/cfg failed (255) at /home4/home/n02-puma/fcm/metomi/fcm-2021.05.0/bin/../lib/FCM1/Dest.pm line 755.

At first I wondered whether that was simply “as far as it gets” from this initial manual submit on PUMA2.

After typing in the above, I realised I’d only changed “host=ln01” to “host=ln02” within the UMSUBMIT_ARCHER2 script file.

By doing a grep on the character string “ln01” I realised there are also separate instances of “host=ln01” within the SUBMIT, MAIN_SCR and EXTR_SCR files.

Once I had edited those in the same way, changing “host=ln01” to “host=ln02” in these additional 3 script files as well, the FCM extract then works fine (see the successful output below).
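For anyone hitting the same thing, the equivalent commands run from the umui_jobs/xqjhx directory would be roughly:

# see which of the processed job files still refer to ln01
grep -l 'ln01' *

# then point the submit/extract scripts at ln02 instead
sed -i 's/host=ln01/host=ln02/' UMSUBMIT_ARCHER2 SUBMIT MAIN_SCR EXTR_SCR

(though I actually made the edits by hand in an editor, so treat the sed line as the one-liner equivalent rather than exactly what I ran).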

+ echo 'Calling MAIN_SCR - local...'
Calling MAIN_SCR - local...
+ echo '(This may take several minutes)'
(This may take several minutes)
+ /home/n02/n02/gmann/umui_jobs/xqjhx/MAIN_SCR 049011951

MAIN_SCR: Calling Extract ...
Extracting UMATMOS base repository...
UMATMOS base repository extract is OK
Extracting JULES base repository...
JULES base repository extract is OK
created umscripts sub-directory.
Extracting UMSCRIPTS including any branches...
UMSCRIPTS extract is OK
created umatmos sub-directory.
Extracting UMATMOS including any branches...
UMATMOS extract is OK
created umrecon sub-directory.
Extracting UMRECON including any branches...
UMRECON extract is OK
MAIN_SCR: Extract OK
Submitted batch job 12535418
MAIN_SCR: Submit OK
+ CC=0
exit $CC

+ exit 0