I’m trying to run a GAL9 job using vn13.7 on Monsoon and I’m running into issues with the fcm_make_pp job.
[FAIL] mirror.target = : incorrect value in declaration
[FAIL] config-file=/scratch/d00/rwaters/cylc-run/u-dm191/run1/work/19880901T0000Z/fcm_make_pp/fcm-make.cfg:4
[FAIL] config-file= - file:///home/d04/fcm/srv/svn/moci.xm/main/trunk/Postprocessing/fcm_make/postproc.cfg@4419:12
[FAIL] config-file= - - file:///home/d04/fcm/srv/svn/moci.xm/main/trunk/Postprocessing/fcm_make/inc/remote.cfg@4419:6
[FAIL] fcm make -f /scratch/d00/rwaters/cylc-run/u-dm191/run1/work/19880901T0000Z/fcm_make_pp/fcm-make.cfg -C /home/d00/rwaters/cylc-run/u-dm191/run1/share/fcm_make_pp -j 4 # return-code=9
2025-01-07T10:07:05Z CRITICAL - failed/ERR
Here is the ./app/fcm_make_pp/rose-app.conf
meta=archive_and_meaning/fcm_make/postproc_2.4
[env]
config_base=fcm:moci.xm_tr
config_rev=@postproc_2.4
extract=extract
install=build
install_host=remote.cfg
model_config=um-atmos.cfg
pp_rev=postproc_2.4
pp_sources=
and the ./app/fcm_make_pp/file/fcm-make.cfg
$config_base{?}=fcm:moci.xm_tr
$config_rev{?}=
include = $config_base/Postprocessing/fcm_make/postproc.cfg$config_rev
extract.location{diff}[pp] = $pp_sources
For reference the suite is u-dm191.
Any help would be appreciated.
Thanks in advance.
Hi Rob,
It looks like the run copy (~/cylc-run/) of u-dm191 has been removed, so I am unable to check the runtime settings. However, I suspect the issue is at this line in Postprocessing/fcm_make/inc/remote.cfg:
mirror.target = ${ROSE_TASK_MIRROR_TARGET}
and cylc-8 might not be exporting this value (this will have to be verified in the actual job script).
I am not sure what the value of this variable is in cylc-8, but I will check in cylc-7 suites and see whether exporting it explicitly in the [environment] settings helps.
Just to add: I doubt that the original developer of the GALx suites has access to Monsoon, so even though the option is available, the suites are not regularly tested on other systems and rely on users to feed back any portability changes.
Hi Mohit,
Thanks for your help.
I’ve just re-run the suite so there should be a cylc-run contents available now.
Where exactly is the runtime setting I’m looking for?
I didn’t realise that the GAL9 team didn’t have access to Monsoon. I’ve been providing feedback to Paul Earnshaw as I’ve had to make a few other changes to get it working on Monsoon.
Thanks again,
Rob
Hi Rob,
The runtime settings (and the environment inherited by the job) can be seen in ~/cylc-run/suite-id/runX/log/job/date-time/task-name/job and job.out. In this case:
~rwaters/cylc-run/u-dm191/run1/log/job/19880901T0000Z/fcm_make_pp/NN/job.out
and there is no ROSE_TASK_MIRROR_TARGET setting exported.
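A quick way to repeat this check after each re-run is sketched below. It assumes that exported variables show up in job.out as lines containing `export NAME=value`, as in the job.out excerpts quoted in this thread; the function name `check_exported` is made up for illustration.

```shell
# Report whether a variable appears as "export VAR=..." in a job file.
# Usage: check_exported VAR FILE
# (check_exported is a hypothetical helper, not part of Cylc or Rose.)
check_exported() {
    var="$1"
    file="$2"
    # -F: match the text literally; -q: no output, exit status only.
    if grep -qF "export ${var}=" "$file"; then
        echo "${var} is exported in ${file}"
    else
        echo "${var} is NOT exported in ${file}"
    fi
}
```

For example: `check_exported ROSE_TASK_MIRROR_TARGET ~rwaters/cylc-run/u-dm191/run1/log/job/19880901T0000Z/fcm_make_pp/NN/job.out`.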
Surprisingly, this setting is exported in the cylc-8 suites I have on our internal HPC, so I am not sure what is going on here. Note that the Monsoon set-up is unique in that the ‘launch’ and ‘execute’ systems are the same (whereas, for example, we have Puma for launch and ARCHER2 for execution), so I wonder if there are additional settings required here.
In site/monsoon.cylc, under the EXTRACT_RESOURCE block can you try replacing the
platform = xcsc
with (my Monsoon cylc-7 definition)
[[[remote]]]
    host = $ROSE_ORIG_HOST
Thanks
Ah, interesting. I removed the use of $ROSE_ORIG_HOST because I was getting an undefined platform error.
When I use your suggestion I first get the following warning:
WARNING - deprecated settings found (please replace with [runtime][EXTRACT_RESOURCE]platform):
[runtime][EXTRACT_RESOURCE][remote]host = $ROSE_ORIG_HOST
Then cylc fails to play the workflow
message: A mixture of Cylc 7 (host) and Cylc 8 (platform) logic should not be used. In this case for the task "19880901T0000Z/fcm_make_pp" the following are not compatible:
workflow: u-dm191/run1
host: xcslc0
port: 43132
owner: rwaters
If I put $ROSE_ORIG_HOST back as the platform (what it was originally), the job fails to submit with the following message:
rwaters@xcslc0:~/cylc-run/u-dm191/run2/log/job/19880901T0000Z/fcm_make_pp/01> cat job-activity.log
[jobs-submit cmd] (platform not defined)
[jobs-submit ret_code] 1
[jobs-submit err] No matching platform "xcslc0" found
[(('event-mail', 'submission failed'), 1) ret_code] 0
Because the only available platforms are:
rwaters@xcslc0:~/cylc-run/u-dm191/run2/log/job/19880901T0000Z/fcm_make_pp/01> cylc config --platforms
[platforms]
[[localhost]]
install target = localhost
ssh command = ssh -oBatchMode=yes -oConnectTimeout=8 -oStrictHostKeyChecking=no
copyable environment variables = FCM_VERSION
submission polling intervals = PT30M
execution polling intervals = PT30M
execution time limit polling intervals = PT5M, PT10M
clean job submission environment = True
[[xcsc]]
install target = localhost
ssh command = ssh -oBatchMode=yes -oConnectTimeout=8 -oStrictHostKeyChecking=no
copyable environment variables = FCM_VERSION
submission polling intervals = PT30M
execution polling intervals = PT30M
execution time limit polling intervals = PT5M, PT10M
clean job submission environment = False
hosts = localhost
job runner = pbs
err tailer = qcat -f -e %(job_id)s
out tailer = qcat -f -o %(job_id)s
err viewer = qcat -e %(job_id)s
out viewer = qcat -o %(job_id)s
job name length maximum = 236
[[[meta]]]
description = HPC PBS job
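To compare the platform name being looked up (xcslc0 in the job-activity.log above) against what is actually defined, the names can be pulled out of the listing. This is only a sketch: it assumes platform headers are printed as `[[name]]` on their own line, as in the output above, and `list_platform_names` is a made-up helper name.

```shell
# Print the platform names found in `cylc config --platforms` output.
# Reads the listing on stdin; matches lines of the form [[name]] only,
# so [platforms] and nested sections such as [[[meta]]] are skipped.
# (list_platform_names is a hypothetical helper.)
list_platform_names() {
    sed -n 's/^[[:space:]]*\[\[\([^][]\{1,\}\)]][[:space:]]*$/\1/p'
}
```

Usage: `cylc config --platforms | list_platform_names`. For the listing above this would give localhost and xcsc, neither of which is xcslc0.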
Yes, I suspected the [[[remote]]] would be cylc-8 incompatible.
In the cylc-8 suite I am running, the EXTRACT_RESOURCE has no platform = setting, and in the job.out I see
…
[INFO] export config_base=fcm:moci.xm_tr
[INFO] export config_rev=@postproc_2.4
[INFO] export extract=extract
[INFO] export install=build
[INFO] export install_host=remote.cfg
[INFO] export model_config=um-nemocice.cfg
[INFO] export nemo_tools=fcm:nemo.xm/utils/tools_r4.0-HEAD@16016
[INFO] export pp_rev=postproc_2.4
[INFO] export pp_sources=branches/dev/ericaneininger/postproc_2.4_restrict_atmos_process_methods
[INFO] export verify_config=verify.cfg
[INFO] source: $HOME/cylc-run/suite-id/app/fcm_make_pp/file/fcm-make.cfg
[INFO] install: fcm-make.cfg
[INFO] source: $HOME/cylc-run/suite-id/app/fcm_make_pp/file/fcm-make.cfg
[INFO] export ROSE_TASK_MIRROR_TARGET=(hostname):cylc-run/suite-id/share/fcm_make_pp
[INFO] export MIRROR_TARGET=(hostname):cylc-run/suite-id/share/fcm_make_pp
[init] make # 2025-01-06T14:57:22
…
Is it possible to share the suite ID for your working cylc8 run?
Is ROSE_TASK_MIRROR_TARGET specified anywhere in a .cylc file? I can’t find any reference to it within my suites.
Hi Rob,
The suite is u-dm037, but as mentioned this is not run on Monsoon; I was only trying to relate the site/x settings to what appears in the job.out. It is also a different (UKESM) configuration that has more complicated pp tasks.
The ROSE_TASK_MIRROR_TARGET appears to be something added by Cylc in the background, hence I am trying to find out what settings are needed for it to appear.
Hi Mohit,
Is it perhaps configured for the platforms?
Is there any mention of it when cylc config is run?
Cheers,
Rob
[[fcm_make_pp]]
    inherit = RUN_MAIN, EXTRACT_RESOURCE
    [[[environment]]]
        ROSE_TASK_MIRROR_TARGET = xcs-c:cylc-run/u-dm191/share/fcm_make_pp
Hard coding the env variable did make fcm_make_pp run but it now fails at fcm_make2_pp (so frustrating!)
Anyone have any ideas on:
job.error:
[FAIL] no configuration specified or found
[FAIL] fcm make -C /home/d00/rwaters/cylc-run/u-dm191/run1/share/fcm_make_pp -n 2 -j 1 # return-code=2
2025-01-08T11:39:57Z CRITICAL - failed/ERR
job.out:
Workflow : u-dm191/run1
Job : 19880901T0000Z/fcm_make2_pp/01 (try 1)
User@Host: rwaters@shared100
2025-01-08T11:39:54Z INFO - started
[INFO] Configuration: /home/d00/rwaters/cylc-run/u-dm191/run1/app/fcm_make_pp/
[INFO] file: rose-app.conf
[INFO] optional key: (monsoon)
[INFO] export PATH=/opt/cray/netcdf/4.3.2/bin:/home/d00/rwaters/cylc-run/u-dm191/run1/share/bin:/home/d00/rwaters/cylc-run/u-dm191/run1/bin:/opt/ukmo/subversion/1.8.19/bin:/opt/python/gnu/2.7.9/bin/:/opt/ukmo/mass/moose-monsoon-client-latest/bin:/opt/cray/mpt/7.0.4/gni/bin:/opt/cray/atp/1.7.5/bin:/opt/cray/rca/1.0.0-2.0502.60530.1.62.ari/bin:/opt/cray/pmi/5.0.5-1.0000.10300.134.8.ari/bin:/opt/cray/craype/2.2.1/bin:/opt/cray/cce/8.3.4/cray-binutils/x86_64-unknown-linux-gnu/bin:/opt/cray/cce/8.3.4/craylibs/x86-64/bin:/opt/cray/cce/8.3.4/cftn/bin:/opt/cray/cce/8.3.4/CC/bin:/opt/cray/llm/default/bin:/opt/cray/llm/default/etc:/opt/cray/xpmem/0.1-2.0502.64982.7.29.ari/bin:/opt/cray/ugni/6.0-1.0502.10863.8.29.ari/bin:/opt/cray/udreg/2.3.2-1.0502.10518.2.17.ari/bin:/opt/cray/lustre-cray_ari_s/2.5_3.0.101_0.46.1_1.0502.8871.45.1-1.0502.21728.75.4/sbin:/opt/cray/lustre-cray_ari_s/2.5_3.0.101_0.46.1_1.0502.8871.45.1-1.0502.21728.75.4/bin:/opt/cray/alps/5.2.5-2.0502.9955.44.1.ari/sbin:/opt/cray/alps/5.2.5-2.0502.9955.44.1.ari/bin:/opt/cray/sdb/1.1-1.0502.63652.4.25.ari/bin:/opt/cray/nodestat/2.2-1.0502.60539.1.31.ari/bin:/opt/modules/3.2.10.3/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/pbs/bin:/usr/lib/qt3/bin:/opt/ukmo/supported/bin/:/opt/cray/bin:/opt/ukmo/supported/bin/:/opt/ukmo/supported/bin/:/home/d04/fcm/bin:/opt/ukmo/supported/bin/:/opt/ukmo/supported/bin/:/home/d04/fcm/bin:/opt/ukmo/supported/bin/:/opt/ukmo/supported/bin/:/home/d04/fcm/bin
[INFO] export config_base=fcm:moci.xm_tr
[INFO] export config_rev=@postproc_2.4
[INFO] export extract=extract
[INFO] export install=build
[INFO] export install_host=remote.cfg
[INFO] export model_config=um-atmos.cfg
[INFO] export pp_rev=postproc_2.4
[INFO] export pp_sources=branches/dev/ericaneininger/postproc_2.4_restrict_atmos_process_methods
[INFO] source: /home/d00/rwaters/cylc-run/u-dm191/run1/app/fcm_make_pp/file/fcm-make.cfg
[INFO] install: fcm-make.cfg
[INFO] source: /home/d00/rwaters/cylc-run/u-dm191/run1/app/fcm_make_pp/file/fcm-make.cfg
[init] make 2 # 2025-01-08T11:39:57Z
[info] FCM 2021.05.0 (/common/fcm/fcm-2021.05.0)
[init] make 2 config-parse # 2025-01-08T11:39:57Z
[FAIL] make 2 config-parse # 0.0s
[FAIL] make 2 # 0.0s
============================= PBS epilogue =============================
file %r not found...
End of Job Report
Run at 2025-01-08 11:39:58 for job 4455987.xcs00
Submitted : 2025-01-08 11:39:40
Queued : 2025-01-08 11:39:40
Started : 2025-01-08 11:39:52
Completed : 2025-01-08 11:39:58
Queued Time : 0:00:12 (12 seconds)
Elapsed Time : 0:00:06 (6 seconds, 2% of limit)
Walltime Limit : 0:05:00 (300 seconds)
Node Time Limit : 0:00:09 (9 seconds)
Node Time : 0:00:00 (0 seconds, 2% of limit)
Job Name : fcm_make2_pp.19880901T0000Z.u-dm191-run1
Queue : shared
Owner : rwaters
Group : mo_users
Project : ukca-cam
Subproject :
Funding :
Trustzone : collaboration
STDOUT : /home/d00/rwaters/cylc-run/u-dm191/run1/log/job/19880901T0000Z/fcm_make2_pp/01/job.out
STDERR : /home/d00/rwaters/cylc-run/u-dm191/run1/log/job/19880901T0000Z/fcm_make2_pp/01/job.err
Job Directory : /scratch/jtmp/pbs.4455987.xcs00.x8z
Job Arch :
CPU Core Type : broadwell
Total Nodes : 1
Total Tasks : 1
Parent Node : shared100
Parent Node Memory :
Parent Node CPU Time :
Compute Nodes :
Electrical Groups :
Run Version : 1
CPU Wallclock Wallclock RSS Memory Memory Memory
Node ID Count Used Requested Used Used Requested CPU Time
========== ========== ========== ========== ========== ========== ========== ==========
633 2 0 0% 0.0 49.4M 2.3% 0.0
========== ========== ========== ========== ========== ========== ========== ==========
For more information see documentation
flow.cylc section:
[[fcm_make2_pp]]
inherit = RUN_MAIN, PPBUILD_RESOURCE
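One thing that may be worth checking: the hard-coded ROSE_TASK_MIRROR_TARGET earlier in the thread points at cylc-run/u-dm191/share/fcm_make_pp, whereas fcm_make2_pp is invoked with -C /home/d00/rwaters/cylc-run/u-dm191/run1/share/fcm_make_pp, so the two paths differ by the run1 component. The sketch below reports which of several candidate directories holds the mirrored configuration. It assumes the file that `fcm make -n 2` looks for is named fcm-make2.cfg, which has not been confirmed in this thread; `find_make2_cfg` is a made-up helper name.

```shell
# For each directory given, report whether it contains fcm-make2.cfg.
# ASSUMPTION: fcm-make2.cfg is the file name used for the "make 2" step.
# (find_make2_cfg is a hypothetical helper, not part of FCM.)
find_make2_cfg() {
    for dir in "$@"; do
        if [ -f "${dir}/fcm-make2.cfg" ]; then
            echo "found: ${dir}/fcm-make2.cfg"
        else
            echo "missing: ${dir}/fcm-make2.cfg"
        fi
    done
}
```

For example: `find_make2_cfg ~/cylc-run/u-dm191/share/fcm_make_pp ~/cylc-run/u-dm191/run1/share/fcm_make_pp`. If the file turns up only in the first directory, the mirror target and the -C directory are out of step.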