I previously had a custom app set up for archiving to JASMIN that used the gridftp server, because I was using a version of the nesting suite that didn’t have an fcm_make_pp or postproc app (see e.g. the ncas_archive app in suite u-di808, which was copied ages ago from suite u-cy045, which I believe belongs to Helen Burns). I also have a version of the nesting suite adapted from the GAL9 branch, which I ported to ARCHER2 with a fair amount of trial and error, and which now at least seems to run (u-dk846).
With the closure of the gridftp server, however, I don’t think I can adapt my existing app to work in the same way with the new globus-cli system. So I thought I would copy in a version of the fcm_make_pp and postproc apps and add them to my suite graphs in place of the ncas_archive app, but it hasn’t worked very well so far. My attempts can be found in u-dl827. Currently the fcm_make_pp step is failing with host-selection errors on ARCHER2, which is confusing, since I thought host selection happened at job submission, and job submission seems to be working. (That said, over the past few weeks I’ve had continual problems with host selection failing and timing out at job submission, which means I have to retrigger jobs manually, which is annoying. I’m not sure whether that’s a me thing or ARCHER2-wide…)
I know it’s a bit of a Frankenstein of a suite, but do you have any tips for adding the fcm_make_pp and postproc steps to a version of the nesting suite? Or is there anything I’ve missed in trying to make this work?
Create a file ncas_archive/bin/transfer.sh with execute permissions containing for example:
#!/bin/bash
# Globus collection UUIDs for the source (ARCHER2) and destination (JASMIN) endpoints
SRC_COLL='3e90d018-0d05-461a-bbaf-aab605283d21'
DEST_COLL='a2f53b7f-1b4e-4dce-9b7c-349ae760fee0'
LABEL='FranTest'

# DATA_DIRECTORY, ROOT_PATH and FINAL_DIRECTORY are expected in the environment (set by the suite).
# Echo the transfer command for the job log, then run it and capture the task id.
echo "globus transfer --format unix --jmespath 'task_id' --recursive --fail-on-quota-errors --sync-level checksum --verify-checksum --label ${LABEL} ${SRC_COLL}:${DATA_DIRECTORY} ${DEST_COLL}:${ROOT_PATH}/${FINAL_DIRECTORY}"
id=$(globus transfer --format unix --jmespath 'task_id' --recursive --fail-on-quota-errors --sync-level checksum --verify-checksum --label "${LABEL}" "${SRC_COLL}:${DATA_DIRECTORY}" "${DEST_COLL}:${ROOT_PATH}/${FINAL_DIRECTORY}")

# Block until the transfer task finishes
echo "Waiting on 'globus transfer' task: $id"
globus task wait -H "$id"
if [ $? -eq 0 ]; then
    echo "$id completed successfully"
else
    echo "$id failed"
    exit 1
fi
# Can then do a further check on the status if you want with `status=$(globus task show --jq "status" --format=UNIX $id)`
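If you want the task to act on that status rather than just report it, something along these lines could go after the wait. This is only a sketch: it mirrors the task-show command in the comment above and assumes the status field of the task document reads SUCCEEDED for a completed transfer.

# Optional extra check: fail the task unless the transfer ended in SUCCEEDED
status=$(globus task show --format unix --jmespath 'status' "$id")
if [ "$status" != "SUCCEEDED" ]; then
    echo "Transfer $id finished with status: $status"
    exit 1
fi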
Caveat: the above script doesn’t quite work, in that it doesn’t capture the transfer task id, which is essential for the wait command. At the moment I can’t quite see what I’ve done wrong.
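To help narrow down where it goes wrong, you can also run the script by hand on a login node, supplying the environment variables the suite would normally set. The values below are placeholders only: use your own source path as the ARCHER2 collection sees it and your own destination path on the JASMIN side.

# Placeholder values - substitute paths as the two collections see them
export DATA_DIRECTORY=/path/to/model/output/on/archer2
export ROOT_PATH=/path/to/group/workspace/on/jasmin
export FINAL_DIRECTORY=my_suite_archive_dir
./ncas_archive/bin/transfer.sh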
I have tried to make these changes, and while it seems like it should work, I’m having some trouble with the globus CLI login - the step now throws the following error…
(I assume the second set of errors is because globus hasn’t run the first command, so $id doesn’t exist.)
However, I have already run globus login on the login nodes (when I run it again, it tells me I’m already logged in). So I tried adding it to the [[NCAS_ARCHIVE]] pre-script, but it just timed out after an hour because it requires interacting with the web interface. Then I tried adding it to the transfer.sh file, which also didn’t work. (Both of those gave “Login timed out. Please try again.”)
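For reference, the kind of non-interactive check I could add instead is just globus whoami, which (as far as I understand) doesn’t prompt and should simply exit non-zero if no credentials are visible on the node:

# Non-interactive check: does this node see my Globus credentials?
if ! globus whoami > /dev/null 2>&1; then
    echo "globus CLI is not logged in on $(hostname)" >&2
    exit 1
fi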
Do you know how I can make sure I’m logged in on the compute nodes?
Best,
Fran
PS I’m also conscious that it’s probably approaching Christmas annual leave time, so thank you for helping me out so late in the day!