Hi James,
You need to add @5092
to the postproc_2.4_archer2_jasmin_rewrite
branch in pp_sources
too.
Regards,
Ros.
Thanks, Ros. I’ve edited pp_sources to read:
pp_sources=fcm:moci.xm-br/dev/rosalynhatcher/postproc_2.4_archer2_jasmin_rewrite@5092
After retriggering fcm_make_pp and fcm_make2_pp (without rerunning atmos_main or postproc), pptransfer failed, but with a different error message that seems to pertain to the archive directory.
[WARN] file:atmospp.nl: skip missing optional source: namelist:moose_arch
[WARN] file:atmospp.nl: skip missing optional source: namelist:script_arch
[WARN] [SUBPROCESS]: Command: ls -A /work/n02/n02/jweber/cylc-run/u-dk655/share/cycle/19890701T0000Z/u-dk655/19890701T0000Z
[SUBPROCESS]: Error = 2:
ls: cannot access '/work/n02/n02/jweber/cylc-run/u-dk655/share/cycle/19890701T0000Z/u-dk655/19890701T0000Z': No such file or directory
[FAIL] Error checking files to transfer
[FAIL] Terminating PostProc…
[FAIL] transfer.py <<'STDIN'
[FAIL]
[FAIL] 'STDIN' # return-code=1
2024-11-21T17:46:49Z CRITICAL - failed/EXIT
Does rev 5092 require a different setup for the archiving on ARCHER2 before the data is sent to JASMIN?
Cheers,
James
Hi James,
It’s because the model has run ahead using the new pp and put the data in a different location to what the old pptransfer is expecting. There isn’t an easy way to resolve that using the old pptransfer without rerunning.
I think the best thing is for you to move to the new pptransfer using Globus now, which should then cause pptransfer to pick the data up from the correct location. JASMIN have told us in the last couple of days that everyone will need to switch before Christmas anyway, so it makes sense to do it now rather than hack around with your suite and then have to do it again in a few weeks. Please hold your suite - make sure it doesn’t run any more model/postproc tasks.
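As a sketch of what holding the suite looks like from the command line (which command applies depends on whether the suite runs under Cylc 7 or Cylc 8, which isn’t stated in this thread):
cylc hold u-dk655     # Cylc 7: hold the suite so no new tasks are submitted
cylc pause u-dk655    # Cylc 8 equivalent: pause the workflow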
There is a bit of setup to do before you can use Globus: registering for the service, authenticating to the ARCHER2 & JASMIN endpoints, and some setup on ARCHER2. Please follow the instructions here: Configuring PPTransfer using Globus
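On the ARCHER2 side, the command-line part of that setup amounts to roughly the following (these commands also appear later in this thread; the Globus registration and endpoint authentication steps are done in a web browser):
ARCHER2> module load globus-cli    # make the Globus CLI available on ARCHER2
ARCHER2> globus login              # prints a URL to open in a browser to authenticate
ARCHER2> globus whoami             # confirm which identity the CLI is logged in as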
Once you’ve done that, in your suite remove the @5092 from both the config_rev and pp_sources branch settings.
In app/postproc/rose-app.conf, add the following to the [namelist:pptransfer] section:
globus_cli=true
globus_default_colls=true
globus_notify='off'
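So after the edit, that section of app/postproc/rose-app.conf should end up looking something like this, with the three new entries sitting alongside whatever settings are already there:
[namelist:pptransfer]
globus_cli=true
globus_default_colls=true
globus_notify='off'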
Reload the suite, then re-run fcm_make_pp and fcm_make2_pp.
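As a rough sketch of those last two steps under Rose/Cylc 7 (Cylc 8 commands differ; <cycle> is a placeholder for the cycle point your fcm_make tasks run at):
rose suite-run --reload                         # from the suite directory: push the config change to the running suite
cylc trigger u-dk655 'fcm_make_pp.<cycle>'      # re-run the pp extract task
cylc trigger u-dk655 'fcm_make2_pp.<cycle>'     # re-run the remote build task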
Hopefully that should work.
Regards,
Ros
Thanks, Ros. I’ve made the changes to u-dk655 and followed the steps in the linked webpage.
The globus whoami and globus whoami --linked-identities commands return the expected response.
jweber@ln04:/work/n02/n02/jweber> globus whoami --linked-identities
For information on which identities are in session see
globus session show
I’ve also followed the
However, pptransfer is still failing with
[WARN] file:atmospp.nl: skip missing optional source: namelist:moose_arch
[WARN] file:atmospp.nl: skip missing optional source: namelist:script_arch
[WARN] [SUBPROCESS]: Command: globus transfer --format unix --jmespath task_id --recursive --fail-on-quota-errors --sync-level checksum --label u-dk655/19890701T0000Z --verify-checksum --notify off 3e90d018-0d05-461a-bbaf-aab605283d21:/work/n02/n02/jweber/cylc-run/u-dk655/share/cycle/19890701T0000Z a2f53b7f-1b4e-4dce-9b7c-349ae760fee0:/gws/nopw/j04/sheffield/jweber/archive/u-dk655/19890701T0000Z
[SUBPROCESS]: Error = 4:
MissingLoginError: Missing login for Globus Transfer.
Please run:
globus login
[WARN] Transfer command failed: globus transfer --format unix --jmespath 'task_id' --recursive --fail-on-quota-errors --sync-level checksum --label u-dk655/19890701T0000Z --verify-checksum --notify off 3e90d018-0d05-461a-bbaf-aab605283d21:/work/n02/n02/jweber/cylc-run/u-dk655/share/cycle/19890701T0000Z a2f53b7f-1b4e-4dce-9b7c-349ae760fee0:/gws/nopw/j04/sheffield/jweber/archive/u-dk655/19890701T0000Z
[ERROR] transfer.py: Globus Error: Failed authentication or authorization (Globus ReturnCode=4)
[FAIL] Command Terminated
[FAIL] Terminating PostProc…
[FAIL] transfer.py <<'STDIN'
[FAIL]
[FAIL] 'STDIN' # return-code=1
2024-11-22T15:44:12Z CRITICAL - failed/EXIT
Any thoughts?
James
James
Did you do this bit:
Grenville
Hi Grenville,
I think so. I just tried the below on ARCHER2:
jweber@ln01:/work/n02/n02/jweber> module load globus-cli
jweber@ln01:/work/n02/n02/jweber> globus login
You are already logged in!
You may force a new login with
globus login --force
You can check your primary identity with
globus whoami
For information on which of your identities are in session use
globus session show
Logout of the Globus CLI with
globus logout
Hi James,
Does the step 5 Globus CLI check work:
ARCHER2> globus ls 3e90d018-0d05-461a-bbaf-aab605283d21:/~/
and
ARCHER2> globus ls a2f53b7f-1b4e-4dce-9b7c-349ae760fee0:/~/
??
I also note that something has gone wrong with your symlink in step 6. It is not a symlink to your ~/.globus directory??
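A quick way to check what the symlink currently points at is:
ARCHER2> ls -ld /work/n02/n02/jweber/.globus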
Cheers,
Ros
Hi Ros,
When I ran globus ls 3e90d018-0d05-461a-bbaf-aab605283d21:/~/ it listed my ARCHER2 home directory contents.
When I ran globus ls a2f53b7f-1b4e-4dce-9b7c-349ae760fee0:/~/ it listed the contents of my JASMIN home directory.
I take your point re the symlink. Is there a way to clear it and remake it?
Cheers,
James
Hi James,
Just move the directory /work/n02/n02/jweber/.globus out of the way and redo step 6.
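Roughly, that means something like the following (the exact step 6 commands are in the linked instructions rather than this thread, so treat this as a sketch; the name for the moved directory is arbitrary):
ARCHER2> mv /work/n02/n02/jweber/.globus /work/n02/n02/jweber/.globus_old
ARCHER2> ln -s /home/n02/n02/jweber/.globus /work/n02/n02/jweber/.globus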
Cheers,
Ros.
Thanks, Ros. I’ve moved /work/n02/n02/jweber/.globus to /home/n02/n02/jweber/globus_old and rerun step 6.
Could you check the symlink now?
James
Hi James,
ARCHER2-23cab> ls -ld .globus
lrwxrwxrwx 1 jweber n02 28 Nov 25 10:27 .globus -> /home/n02/n02/jweber/.globus/
Cheers,
Ros
Brilliant, thanks Ros. I did a 1-month test and it looks to have completed the transfer successfully. I will set off a longer run and let you know if there are any problems.
Cheers,
James
Hi Ros,
The model is running well and doing the pptransfer. The only issue is that postproc is taking well over 3 hours and sometimes fails on the wall-clock limit (currently set to 3H55M). I’ve cut down the STASH output considerably and increased the postproc memory. Is there anything else I could do?
Cheers,
James
Hi James,
I wonder if the slowness is because a lot of the pp streams are being reinitialised every day, so you are creating lots and lots of little files for postproc to deal with. The Lustre filesystem is also not great with lots of small files. I’d suggest changing the re-initialisation period to something longer and seeing if that helps.
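If you want to confirm that this is the issue, a quick file count in the cycle’s share directory on ARCHER2 (the path comes from your earlier pptransfer log; substitute the relevant cycle point) shows how many files postproc has to deal with per cycle:
ARCHER2> ls /work/n02/n02/jweber/cylc-run/u-dk655/share/cycle/19890701T0000Z | wc -l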
Regards,
Ros.
Thanks, Ros, that appears to have worked. I had assumed the streams with daily reinitialisation weren’t the problem because I didn’t think I was outputting anything to them, but having removed those streams entirely, postproc is much faster!
James