Pptransfer fail from ARCHER to JASMIN

Hello,

I’m struggling to get my model output to transfer properly from ARCHER to JASMIN. The suites keep archiving to work/archive rather than cylc-run/, even after I made the changes from this ticket: Corrupted ARCHER-JASMIN workflow - #4 by RosalynHatcher. The simulations now fail at the pptransfer task every time. Sometimes it says “there is no job.err”, so I can’t see why it’s having problems, but I did get this error from one of my suites (I’ve checked, and both are getting this error now):

%------------------------------------------------

Lmod is automatically replacing "cce/15.0.0" with "gcc/11.2.0".

Due to MODULEPATH changes, the following have been reloaded:

  1) cray-mpich/8.1.23

Currently Loaded Modules:

  1) craype-x86-rome                          12) load-epcc-module
  2) libfabric/1.12.1.2.2.0.0                 13) gcc/11.2.0
  3) craype-network-ofi                       14) netcdf4/1.6.4
  4) perftools-base/22.12.0                   15) cray-hdf5/1.12.2.1
  5) xpmem/2.5.2-2.4_3.30__gd0f7936.shasta    16) cray-netcdf/4.9.0.1
  6) craype/2.7.19                            17) nco/5.1.6
  7) cray-dsmml/0.2.2                         18) gct/v6.2.20220524
  8) cray-libsci/22.12.1.1                    19) cray-python/3.9.13.1
  9) PrgEnv-cray/8.3.3                        20) globus-cli/3.32.0
 10) bolt/0.8                                 21) postproc/2024.11
 11) epcc-setup-env                           22) cray-mpich/8.1.23

[WARN] file:atmospp.nl: skip missing optional source: namelist:moose_arch
[WARN] file:atmospp.nl: skip missing optional source: namelist:script_arch
Traceback (most recent call last):
  File "/work/n02/n02/m_brown2/cylc-run/u-dk793/share/fcm_make_pptransfer/build/bin/transfer.py", line 414, in <module>
    main()
  File "/work/n02/n02/m_brown2/cylc-run/u-dk793/share/fcm_make_pptransfer/build/bin/transfer.py", line 396, in main
    transfer = Transfer()
  File "/work/n02/n02/m_brown2/cylc-run/u-dk793/share/fcm_make_pptransfer/build/bin/transfer.py", line 48, in __init__
    self._globus_cli = nl_transfer.globus_cli
AttributeError: 'ReadNamelist' object has no attribute 'globus_cli'
[FAIL] transfer.py <<'__STDIN__'
[FAIL]
[FAIL] '__STDIN__' # return-code=1
2024-12-06T16:25:46Z CRITICAL - failed/EXIT

%------------------------------------------------

My suites are u-dk793 and u-de764, and both are having the same issue. They’ve been running fine up until the last few days, so I’m confused as to why they’re now struggling.

Many thanks for your time.

Kind regards,
Megan

Hi Megan,

They are failing with the globus_cli message because your suites unfortunately don’t specify a revision number for the postproc_2.3_pptransfer_gridftp_nopw fcm_make_pp branch. I updated that branch earlier this week in preparation for the move to using globus, so your suites have automatically picked up the code changes. The suites don’t have the extra variables in yet, so pptransfer is failing.
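The failure mode can be illustrated with a minimal Python sketch. This is not the actual postproc code: `NamelistStub` is a hypothetical stand-in for the `ReadNamelist` object named in the traceback, assumed here to expose each namelist variable as an attribute, and `read_globus_cli` is a hypothetical helper that mimics the attribute access on line 48 of transfer.py.

```python
class NamelistStub:
    """Hypothetical stand-in for ReadNamelist: one attribute per namelist variable."""

    def __init__(self, variables):
        for name, value in variables.items():
            setattr(self, name, value)


def read_globus_cli(namelist):
    """Mimic the updated code, which reads globus_cli unconditionally."""
    return namelist.globus_cli


# A suite configured before the branch update: no globus_cli variable.
old_suite = NamelistStub({"gridftp": True})
# A suite with the extra variable added.
new_suite = NamelistStub({"gridftp": False, "globus_cli": True})

try:
    read_globus_cli(old_suite)
    old_suite_error = None
except AttributeError as exc:
    old_suite_error = str(exc)

print("old suite:", old_suite_error)
print("new suite:", read_globus_cli(new_suite))
```

The old-style configuration raises the same kind of AttributeError as in the log, because the new code asks for a variable the suite never defined.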

I will be making an announcement on Monday to ask everyone currently using gridftp to transfer files from ARCHER2 to JASMIN to move to using globus ahead of JASMIN retiring the gridftp1/slcs1 servers.

Please see instructions here: https://cms.ncas.ac.uk/unified-model/pptransfer-globus/ which include how to add in the globus_cli variable plus a couple of others.

To get your suite to archive to the cylc-run/<suite-id>/share/cycle directory, you need to set the variable archive_root_path to $ROSE_DATAC in
postproc → post postprocessing → archer archiving.

u-de764 is archiving OK to the cylc-run/u-de764/share/cycle directory. u-dk793 is still set to your old archive location rather than $ROSE_DATAC.
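As a rough illustration of what setting archive_root_path to $ROSE_DATAC means, here is a small Python sketch. It assumes that Rose sets ROSE_DATAC to the suite's share/cycle directory for the current cycle point. The path and cycle point below are made-up examples, and `resolve_archive_root` is a hypothetical helper, not part of postproc.

```python
from string import Template


def resolve_archive_root(archive_root_path, env):
    """Expand $VARIABLES in archive_root_path using the given environment mapping."""
    return Template(archive_root_path).substitute(env)


# Hypothetical environment for one task in one cycle (values are illustrative only).
example_env = {
    "ROSE_DATAC": "/path/to/cylc-run/u-de764/share/cycle/19880901T0000Z",
}

resolved = resolve_archive_root("$ROSE_DATAC", example_env)
print(resolved)
```

With a hard-coded old archive location instead of $ROSE_DATAC, the value would not change with the suite or the cycle, which is why the output ends up outside cylc-run.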

Regards,
Ros.

Hi Ros,

Thanks for this - it’s a relief to know it wasn’t me doing something strange!

I’ve followed through the globus instructions, but got held up when it required a change for the suite and recommended asking the CMS helpdesk (I put a ticket in here: https://cms.ncas.ac.uk/news/gridftp/).

Many thanks,
Megan

Hi Megan,

It’s fine, please carry on. You’ve already picked up the code changes in that branch.

Cheers,
Ros.

Hi Ros,

I tried again with my suite u-de764, and I’m still getting the same error message for the pptransfer. Is there something I need to change in the suite itself under pp_sources, as they’re still using the branch fcm:moci.xm-br/dev/rosalynhatcher/postproc_2.3_pptransfer_gridftp_nopw?

I realised I didn’t link my previous ticket before - this is one I opened the other day: https://cms-helpdesk.ncas.ac.uk/t/archer2-setting-up-globus-transfer-for-um-suites/1577/3

Cheers,
Megan

Hi Megan,

You appear to have missed this step to add in the extra namelist variables:

  • In app/postproc/rose-app.conf add the following variables to the [namelist:pptransfer] section and set gridftp=false:

    globus_cli=true
    globus_default_colls=true
    globus_notify='off'
    

Then do a rose suite-run --reload and retrigger the failed pptransfer task.
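Before reloading, it can help to confirm that the settings are really in place. The following is a minimal Python sketch of such a check, not an official tool: `read_section` and `find_problems` are hypothetical helpers, and the parser only handles simple `key=value` lines (it ignores Rose features such as `!`-ignored settings and continuation lines).

```python
# Settings listed in the instructions above, with the values they should have.
REQUIRED = {
    "gridftp": "false",
    "globus_cli": "true",
    "globus_default_colls": "true",
    "globus_notify": "'off'",
}


def read_section(conf_text, section):
    """Return the key/value pairs found in one [section] of Rose-style config text."""
    settings = {}
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
        elif current == section and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def find_problems(conf_text):
    """List required pptransfer settings that are missing or have another value."""
    found = read_section(conf_text, "namelist:pptransfer")
    problems = []
    for key, expected in REQUIRED.items():
        if key not in found:
            problems.append(f"{key} is missing")
        elif found[key] != expected:
            problems.append(f"{key}={found[key]} (expected {expected})")
    return problems


# Illustrative config text; a real check would read app/postproc/rose-app.conf.
sample = """\
[namelist:pptransfer]
gridftp=true
globus_cli=true
"""
for problem in find_problems(sample):
    print(problem)
```

To check a real suite, the contents of app/postproc/rose-app.conf could be read with `open(...).read()` and passed to `find_problems`; an empty list means all four settings match.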

Cheers,
Ros.

Hi Ros,

I’ve made this change on both suites and everything appears to be working now!

Thank you very much,

Megan

Great. Thanks for letting us know.

Cheers,
Ros.
