Vn13.3 AMIP suite on Archer2

Hi James,

You need to add @5092 to the postproc_2.4_archer2_jasmin_rewrite branch in pp_sources too.

Regards,
Ros.

Thanks, Ros, I’ve edited pp_sources to read:

pp_sources=fcm:moci.xm-br/dev/rosalynhatcher/postproc_2.4_archer2_jasmin_rewrite@5092

After I retriggered fcm_make_pp and fcm_make2_pp (without rerunning atmos_main or postproc), pptransfer failed with a different error message, which seems to pertain to the archive directory.

[WARN] file:atmospp.nl: skip missing optional source: namelist:moose_arch
[WARN] file:atmospp.nl: skip missing optional source: namelist:script_arch
[WARN] [SUBPROCESS]: Command: ls -A /work/n02/n02/jweber/cylc-run/u-dk655/share/cycle/19890701T0000Z/u-dk655/19890701T0000Z
[SUBPROCESS]: Error = 2:
ls: cannot access '/work/n02/n02/jweber/cylc-run/u-dk655/share/cycle/19890701T0000Z/u-dk655/19890701T0000Z': No such file or directory

[FAIL] Error checking files to transfer
[FAIL] Terminating PostProc...
[FAIL] transfer.py <<'STDIN
[FAIL]
[FAIL] 'STDIN' # return-code=1
2024-11-21T17:46:49Z CRITICAL - failed/EXIT

Does rev 5092 require a different setup for the archiving on ARCHER2 before it is sent to JASMIN?

Cheers,

James

Hi James,

It’s because the model has run ahead using the new pp and put the data in a different location to what the old pptransfer is expecting. There isn’t an easy way to resolve that using the old pptransfer without rerunning.

I think the best thing is for you to move to the new pptransfer using Globus now, which should then cause pptransfer to pick the data up from the correct location. JASMIN have told us in the last couple of days that everyone will need to switch before Christmas anyway, so it makes sense to do it now rather than hack around with your suite and then have to switch in a few weeks. Please hold your suite - make sure it doesn’t run any more model/postproc tasks.
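As a sketch, assuming this is still a Cylc 7 suite (under Cylc 8 the equivalent is cylc pause), holding the whole suite from ARCHER2 is just:

ARCHER2> cylc hold u-dk655    # stop any further model/postproc tasks being submitted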

There is a bit of setup to do before you can use Globus: registering for the service, authenticating to the ARCHER2 & JASMIN endpoints, and some setup on ARCHER2. Please follow the instructions here: Configuring PPTransfer using Globus
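For reference, the checks on that page boil down to something like the following on an ARCHER2 login node. This is only a sketch of the linked instructions; the two collection UUIDs are the ARCHER2 and JASMIN endpoints this suite’s pptransfer uses.

ARCHER2> module load globus-cli
ARCHER2> globus login                                         # opens a URL to authenticate in a browser
ARCHER2> globus whoami --linked-identities                    # should list your ARCHER2 and JASMIN identities
ARCHER2> globus ls 3e90d018-0d05-461a-bbaf-aab605283d21:/~/   # ARCHER2 collection
ARCHER2> globus ls a2f53b7f-1b4e-4dce-9b7c-349ae760fee0:/~/   # JASMIN collection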

Once you’ve done that, in your suite:

  • Remove the @5092 from both the config_rev & pp_sources branches.

  • In app/postproc/rose-app.conf add the following to the [namelist:pptransfer] section:

    globus_cli=true
    globus_default_colls=true
    globus_notify='off'

  • Reload the suite

  • Then re-run fcm_make_pp & fcm_make2_pp (see the command sketch below)
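A rough sketch of those last two steps as commands, assuming a Cylc 7 / Rose 2019 suite (the commands differ under Cylc 8, and the cycle point placeholder below needs to be the actual cycle shown for the tasks in your suite):

ARCHER2> rose suite-run --reload --name=u-dk655         # pick up the config changes in the running suite
ARCHER2> cylc trigger u-dk655 'fcm_make_pp.<cycle>'     # <cycle> is a placeholder for the real cycle point
ARCHER2> cylc trigger u-dk655 'fcm_make2_pp.<cycle>'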

Hopefully that should work. :crossed_fingers:
Regards,
Ros

Thanks, Ros. I’ve made the changes to u-dk655 and followed the steps in the linked webpage.

The globus whoami and globus whoami --linked-identities commands return the expected response.

jweber@ln04:/work/n02/n02/jweber> globus whoami --linked-identities
For information on which identities are in session see
  globus session show
0000-0003-0643-2026@orcid.org
jweber@safe.archer2.ac.uk
jmw240@accounts.jasmin.ac.uk

I’ve also followed the

However, pptransfer is still failing with

[WARN] file:atmospp.nl: skip missing optional source: namelist:moose_arch
[WARN] file:atmospp.nl: skip missing optional source: namelist:script_arch
[WARN] [SUBPROCESS]: Command: globus transfer --format unix --jmespath task_id --recursive --fail-on-quota-errors --sync-level checksum --label u-dk655/19890701T0000Z --verify-checksum --notify off 3e90d018-0d05-461a-bbaf-aab605283d21:/work/n02/n02/jweber/cylc-run/u-dk655/share/cycle/19890701T0000Z a2f53b7f-1b4e-4dce-9b7c-349ae760fee0:/gws/nopw/j04/sheffield/jweber/archive/u-dk655/19890701T0000Z
[SUBPROCESS]: Error = 4:
MissingLoginError: Missing login for Globus Transfer.
Please run:

globus login

[WARN] Transfer command failed: globus transfer --format unix --jmespath 'task_id' --recursive --fail-on-quota-errors --sync-level checksum --label u-dk655/19890701T0000Z --verify-checksum --notify off 3e90d018-0d05-461a-bbaf-aab605283d21:/work/n02/n02/jweber/cylc-run/u-dk655/share/cycle/19890701T0000Z a2f53b7f-1b4e-4dce-9b7c-349ae760fee0:/gws/nopw/j04/sheffield/jweber/archive/u-dk655/19890701T0000Z
[ERROR] transfer.py: Globus Error: Failed authentication or authorization (Globus ReturnCode=4)
[FAIL] Command Terminated
[FAIL] Terminating PostProc...
[FAIL] transfer.py <<'STDIN
[FAIL]
[FAIL] 'STDIN' # return-code=1
2024-11-22T15:44:12Z CRITICAL - failed/EXIT

Any thoughts?

James

James

Did you do this bit:

  1. Authenticate to Globus using CLI

Grenville

Hi Grenville,

I think so. I just tried the below on Archer2

jweber@ln01:/work/n02/n02/jweber> module load globus-cli
jweber@ln01:/work/n02/n02/jweber> globus login
You are already logged in!

You may force a new login with
globus login --force

You can check your primary identity with
globus whoami

For information on which of your identities are in session use
globus session show

Logout of the Globus CLI with
globus logout

Hi James,

Does the step 5 Globus CLI check work:

ARCHER2> globus ls 3e90d018-0d05-461a-bbaf-aab605283d21:/~/
and
ARCHER2> globus ls a2f53b7f-1b4e-4dce-9b7c-349ae760fee0:/~/

??

I also note that something has gone wrong with your symlink in step 6. It is not a symlink to your ~/.globus directory??

Cheers,
Ros

Hi Ros,

When I ran globus ls 3e90d018-0d05-461a-bbaf-aab605283d21:/~/ it listed my ARCHER2 home directory contents.

When I ran globus ls a2f53b7f-1b4e-4dce-9b7c-349ae760fee0:/~/ it listed the contents of my JASMIN home directory.

I take your point re the symlink. Is there a way to clear it and remake it?

Cheers,

James

Hi James,

Just move the directory /work/n02/n02/jweber/.globus out of the way and redo step 6.
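Something along these lines, assuming step 6 is just recreating the /work .globus entry as a symlink to the .globus directory in your home space (the globus_old name is only an example):

ARCHER2> cd /work/n02/n02/jweber
ARCHER2> mv .globus /home/n02/n02/jweber/globus_old      # move the real directory out of the way
ARCHER2> ln -s /home/n02/n02/jweber/.globus .globus      # recreate the symlink (assumed content of step 6)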

Cheers,
Ros.

Thanks, Ros. I’ve moved /work/n02/n02/jweber/.globus to /home/n02/n02/jweber/globus_old and re-run step 6.

Could you check the symlink now?

James

Hi James,

ARCHER2-23cab> ls -ld .globus
lrwxrwxrwx 1 jweber n02 28 Nov 25 10:27 .globus -> /home/n02/n02/jweber/.globus/
:+1:
Cheers,
Ros

Brilliant, thanks Ros. I did a 1-month test and it looks to have done the transfer successfully. I’ll set off a longer run and let you know if there are any problems.

Cheers,

James

Hi Ros,

The model is running well and doing the pp transfer. The only issue is that postproc is taking well over 3 hours and sometimes fails on the wall clock (now set to 3H55M). I’ve cut down the stash output considerably and increased the postproc memory. Is there anything else I could do?

Cheers,

James

Hi James,

I wonder if the cause of the slowness is that a lot of the pp streams are being reinitialised every day, so you are creating lots and lots of little files for postproc to deal with. The Lustre filesystem is also not great with lots of small files. I’d suggest changing the re-initialisation period to be longer and see if that helps.
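By way of illustration only, this is the kind of setting involved in the UM app. The namelist and item names below (nlstcall_pp, reinit_step, reinit_unit) are assumptions from memory rather than taken from your suite; the exact names and unit codes should be checked in rose edit for your UM version.

# app/um/rose-app.conf -- hypothetical output stream entry
[namelist:nlstcall_pp(pc)]
reinit_step=30     # e.g. reinitialise every 30 days instead of every day
reinit_unit=3      # assumed unit code for days -- check the rose edit metadata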

Regards,
Ros.

Thanks, Ros, that appears to have worked. I had assumed those streams with daily reinitialisation weren’t the problem because I didn’t think I was outputting anything to them, but having removed those streams entirely, postproc is much faster!

James