Adding fcm_make_pp/postproc step to Nesting Suite

Hello

I’ve previously had a custom app set up for archiving to JASMIN that used the gridftp server, because I was using a version of the nesting suite that didn’t have an fcm_make_pp or postproc app (see e.g. the ncas_archive app in suite u-di808, which was copied ages ago from suite u-cy045, which I believe belongs to Helen Burns). I also have a version of the nesting suite adapted from the GAL9 branch, which I ported to ARCHER2 with a bit of trial and error; it now seems to run, at least (u-dk846).

With the closure of the gridftp server, however, I don’t think I can adapt my existing app to work in the same way with the new globus-cli system. So I thought I would try to copy in versions of the fcm_make_pp and postproc apps and add them to my suite graphs in place of the ncas_archive app, but it hasn’t worked very well. My attempts can be found in u-dl827. Currently the fcm_make_pp step is failing with errors about host selection on ARCHER2, which is confusing since I thought host selection happened at job submission, and that step seems to be working. (That said, in the past few weeks I’ve had continual problems with host selection failing and timing out at job submission, forcing me to retrigger jobs manually, which is annoying. Not sure if that’s a me thing or ARCHER-wide…)

I know it’s a bit of a Frankenstein of a suite, but do you have any tips for adding the fcm_make_pp and postproc steps to a version of the nesting suite? Or is there anything I’ve missed in trying to make this work?

Thanks!

Fran

Hi Fran,

Probably the easiest approach is to replace the gridftp command in the ncas_archive app with a simple script that issues the globus commands.

  • In the [[NCAS_ARCHIVE]] family, add a pre-script to load the globus-cli module. I think that’s the correct place.
  [[NCAS_ARCHIVE]]
    inherit = HOST_HPC
    pre-script = """
                 module load globus-cli
                 module list
                 """
     ....
  • In ncas_archive/rose-app.conf, set the app command to run the script:

    [command]
    default = transfer.sh

  • Create a file ncas_archive/bin/transfer.sh with execute permissions containing for example:

#!/bin/bash
  
SRC_COLL='3e90d018-0d05-461a-bbaf-aab605283d21'
DEST_COLL='a2f53b7f-1b4e-4dce-9b7c-349ae760fee0'
LABEL='FranTest'

echo "globus transfer --format unix --jmespath 'task_id' --recursive --fail-on-quota-errors --sync-level checksum --verify-checksum --label ${LABEL} ${SRC_COLL}:${DATA_DIRECTORY} ${DEST_COLL}:${ROOT_PATH}/${FINAL_DIRECTORY}"

id=$(globus transfer --format unix --jmespath 'task_id' --recursive --fail-on-quota-errors --sync-level checksum --verify-checksum --label ${LABEL} ${SRC_COLL}:${DATA_DIRECTORY} ${DEST_COLL}:${ROOT_PATH}/${FINAL_DIRECTORY})

echo "Waiting on 'globus transfer' task: $id"

globus task wait -H "$id"
if [ $? -eq 0 ]; then
  echo "Task $id completed successfully";
else
  echo "Task $id failed";
  exit 1
fi

# Can then do a further check on the status if you want with `status=$(globus task show --jq 'status' --format unix "$id")`
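If you do want that further status check, it can be wrapped like this (a sketch; it assumes `globus task show --jq 'status' --format unix` returns a plain status string such as SUCCEEDED or FAILED, and the `check_task_status` helper is mine, not part of globus-cli):

```shell
#!/bin/bash
# Sketch: interpret the status string returned by
#   status=$(globus task show --jq 'status' --format unix "$id")
# Globus Transfer task statuses include ACTIVE, INACTIVE, SUCCEEDED, FAILED.

check_task_status() {
  local status="$1"
  case "$status" in
    SUCCEEDED) echo "Transfer completed successfully"; return 0 ;;
    FAILED)    echo "Transfer failed"; return 1 ;;
    *)         echo "Transfer not finished yet: $status"; return 2 ;;
  esac
}

# Example with a hard-coded status rather than a live globus call:
check_task_status "SUCCEEDED"
```

This is only worth doing if you want to distinguish a finished-but-failed task from one that is still running; for a simple pass/fail, the `globus task wait` exit code above is enough.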

Caveat: The above script doesn’t quite work, in that it doesn’t capture the transfer task id which is essential for the wait command. At the moment I can’t quite see what I’ve done wrong.

Hope that helps.

Regards,
Ros.

Just edited to fix the script name to transfer.sh not .py. :grimacing:

Ignore my caveat above - the script does work and capture the task id, I was doing something stupid with my test data directory. :frowning_face:

Hi Ros

Thank you so much for this!

I have tried to make these changes and while it seems like it should work, I’m having some trouble with the globus cli login - the step now throws the following error…

MissingLoginError: Missing login for Globus Transfer.
Please run:

  globus login

Usage: globus task wait [OPTIONS] TASK_ID

Error: Missing argument 'TASK_ID'.
[FAIL] transfer.sh <<'__STDIN__'
[FAIL] 
[FAIL] '__STDIN__' # return-code=1
2024-12-19T15:25:55Z CRITICAL - failed/EXIT

(I assume the second set of errors is because globus hasn’t run the first command, so $id doesn’t exist.)
However, I have already run globus login on the login nodes (when I run it again, it tells me I’m already logged in). So I tried adding it to the [[NCAS_ARCHIVE]] pre-script, but it timed out after an hour because it requires interacting with the web interface. Then I tried adding it to the transfer.sh file, which also didn’t work. (Both gave Login timed out. Please try again.)

Do you know how I can make sure I’m logged in on the compute nodes?

Best,
Fran

PS I’m also conscious that it’s probably approaching Christmas annual leave times so thank you for helping me out so late in the day!

Hi Fran,

I think the problem is that /work/n02/n02/franmorr/.globus is a directory rather than a symlink to your /home/n02/n02/franmorr/.globus directory.

Please move or remove the /work/n02/n02/franmorr/.globus directory and then redo the symlink in step 6:

ARCHER2> cd /work/n02/n02/<archer2_username>
ARCHER2> ln -s ~/.globus .globus
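The failure mode (a real directory where a symlink should be) can be checked for with something like this (a sketch; `check_globus_link` is an illustrative helper, and the demo uses a throwaway temporary directory rather than the real ARCHER2 paths):

```shell
#!/bin/bash
# Sketch: check whether a .globus path is a symlink (good), a real
# directory (the failure mode described above), or missing entirely.

check_globus_link() {
  local path="$1"
  if [ -L "$path" ]; then
    echo "OK: $path -> $(readlink "$path")"
    return 0
  elif [ -d "$path" ]; then
    echo "PROBLEM: $path is a real directory; move it aside and re-make the symlink"
    return 1
  else
    echo "MISSING: $path does not exist"
    return 2
  fi
}

# Demo against a throwaway directory structure, not the real paths:
demo=$(mktemp -d)
mkdir "$demo/home_globus"
ln -s "$demo/home_globus" "$demo/work_globus"
check_globus_link "$demo/work_globus"
rm -rf "$demo"
```

On ARCHER2 you would point it at /work/n02/n02/&lt;archer2_username&gt;/.globus instead of the demo path.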

That should fix it.
Cheers,
Ros.

That’s worked! Thanks so much Ros! Merry Christmas :slight_smile:

Fran

Glad that fixed it. Merry Christmas.
