Trouble with pptransfer

c.j.r.williams · 27 January 2026 15:00

Hi,

Sorry to bother you, but I am having trouble with my pptransfer. I get the following error message in my job.err:

[WARN] Transfer command failed: globus transfer --format unix --jmespath 'task_id' --recursive --fail-on-quota-errors --sync-level checksum --label u-df570/18500101T0000Z --verify-checksum --notify off 3e90d018-0d05-461a-bbaf-aab605283d21:/work/n02/n02/cjrw09/cylc-run/u-df570/share/cycle/18500101T0000Z/u-df570/18500101T0000Z a2f53b7f-1b4e-4dce-9b7c-349ae760fee0:/gws/nopw/j04/pmip4_vol2/users/cwilliams2011/archer2.d/output.d/pi.d/u-df570/18500101T0000Z
[ERROR] transfer.py: Transfer Error: Checksum validation failed (ReturnCode=4)
[FAIL] Command Terminated
[FAIL] Terminating PostProc...
[FAIL] transfer.py <<'STDIN'
[FAIL]
[FAIL] 'STDIN' # return-code=1
2026-01-27T03:12:29Z CRITICAL - failed/EXIT

Just above this, the log tells me to run "globus login", but when I do, it says "command not found".

What have I done wrong? My suite is u-df570.

Thanks a lot,

Charlie

AnnetteOsprey · 27 January 2026 16:23

Hi Charlie,

It’s a JASMIN maintenance day today, so all services are at risk. I wonder if that is the source of your issue?

I have had pptransfer failures with my runs as well, although with different error messages. I would recommend retrying the transfer tomorrow.

Annette

RosalynHatcher · 27 January 2026 17:11

Hi Charlie,

Once JASMIN is back from maintenance, if you still get the same issue, run the globus login command as instructed. Before you can run globus commands on the ARCHER2 command line, however, you need to load the globus-cli module:

module load globus-cli/3.35.2

Regards,
Ros.

c.j.r.williams · 27 January 2026 17:27

Thanks both, I will try again tomorrow.

c.j.r.williams · 27 January 2026 17:38

Bizarrely, when I checked a couple of hours ago, all of my output had correctly gone to JASMIN, even though pptransfer is still failing, or rather retrying.

RosalynHatcher · 27 January 2026 18:38

Hi Charlie,

Globus transfers are asynchronous: once a task has been submitted, Globus will by default keep trying for 24 hours or so. PPTransfer has its own timeout, usually 3 hours, so if Globus is down for a while and the transfer takes 5 hours, PPTransfer will show as failed even though the actual Globus task may have succeeded. I'd recommend not having retries on the pptransfer task. If it fails, check manually through the Globus web app whether the transfer has failed or not, and then either set the task to succeeded or retrigger it. Most of the time Globus will complete within the 3-hour limit of pptransfer, so pptransfer can detect its successful completion, but during JASMIN maintenance transfers can and will take longer.
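The decision described above (check the Globus task, then either mark pptransfer succeeded or retrigger it) can be written down as a small shell function. This is a minimal sketch, not part of the UM postproc app: the function name `pptransfer_next_step` is hypothetical, and it assumes the status has been read with `globus task show TASK_ID --format unix --jmespath status` (after loading the globus-cli module), which reports one of the Globus task states ACTIVE, INACTIVE, SUCCEEDED or FAILED. TASK_ID is a placeholder for the task ID shown in the Globus web app.

```shell
# Hypothetical helper: suggest what to do with a failed pptransfer task,
# given the status of its Globus task. The status can be obtained with, e.g.:
#   module load globus-cli/3.35.2
#   globus task show "$TASK_ID" --format unix --jmespath status
pptransfer_next_step() {
  case "$1" in
    SUCCEEDED) echo "set-succeeded" ;;  # data is on JASMIN: mark pptransfer succeeded
    ACTIVE)    echo "wait" ;;           # Globus is still retrying: check again later
    INACTIVE)  echo "renew-login" ;;    # paused, e.g. expired credentials: run globus login, then re-check
    FAILED)    echo "retrigger" ;;      # Globus gave up: retrigger pptransfer to resubmit
    *)         echo "unknown"; return 1 ;;  # unexpected status: inspect by hand
  esac
}
```

For example, `pptransfer_next_step SUCCEEDED` prints `set-succeeded`. Only the SUCCEEDED case makes it safe to mark pptransfer as succeeded; anything else leaves the data still to be moved.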

Cheers,
Ros

c.j.r.williams · 28 January 2026 13:17

Hi Ros,

Sorry for the delay. Having checked my run this morning, I can see that pptransfer has indeed failed, so I did what you suggested, i.e.

module load globus-cli/3.35.2
globus login

and entered the authorisation code provided, and it now says I am logged in. So should I now retrigger pptransfer? Or do I not need to, given that all of my data transferred correctly yesterday, despite the JASMIN maintenance? I have just checked, and everything is there as expected.

I confess I am slightly confused here, as to what is actually doing the transfer? Is it Globus, or pptransfer? Or does pptransfer specify the location of where the data should be transferred to, whereas Globus does the actual transferring? If so, I don’t understand how Globus can make the transfer if pptransfer has failed? You say that Globus will keep trying for 24 hours whereas pptransfer times out after usually 3 hours or so. But if it is not necessary for pptransfer to have succeeded in order for Globus to do the transfer, why do we run pptransfer at all? Sorry, these are probably daft questions, but I would like to understand exactly what is going on here.

Either way, how often should I run the above Globus commands? Just when pptransfer fails, or more regularly? And how do I turn off automatic retries for pptransfer? Maybe in ~/roses/u-df570/app/postproc/rose-app.conf?

Charlie

RosalynHatcher · 28 January 2026 14:01

Hi Charlie,

pptransfer uses the globus-cli to submit the data transfer request to Globus.
Globus is an asynchronous service so the data transfer may not happen immediately.

pptransfer waits to hear back from the Globus service that the transfer has succeeded. As you know, with Cylc each task has a timeout; for pptransfer that is usually 3 hours.

Globus will usually complete within this time, but it may not. If JASMIN is down, for instance, Globus will be unable to complete in time, so the Cylc pptransfer task will indicate failure, BUT the Globus task will continue trying. You then simply need to check on the Globus task on the app.globus.com website and, if it was successful, manually set the pptransfer task to succeeded.
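Marking the task as succeeded is a single Cylc command. The sketch below assumes Cylc 7 syntax (which the use of `rose sgc` and `rose suite-restart` in this thread suggests) and a task named `pptransfer`; the helper name `mark_pptransfer_succeeded_cmd` is hypothetical, and it only prints the command so that it can be checked by eye before being run.

```shell
# Hypothetical helper: print (not run) the Cylc 7 command that marks the
# pptransfer task for one cycle point as succeeded. Only run the printed
# command after the Globus task has been confirmed as SUCCEEDED.
mark_pptransfer_succeeded_cmd() {
  local suite="$1" cycle="$2"
  printf 'cylc reset --state=succeeded %s pptransfer.%s\n' "$suite" "$cycle"
}
```

For example, `mark_pptransfer_succeeded_cmd u-df570 18500101T0000Z` prints `cylc reset --state=succeeded u-df570 pptransfer.18500101T0000Z`. Printing rather than executing keeps a human in the loop, which matters because resetting the task before Globus has finished can let housekeeping delete data that has not yet reached JASMIN.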

The globus authentication needs renewing every month.
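Because the login lapses after about a month, it can be worth checking it before starting a long run rather than waiting for pptransfer to fail. A minimal sketch, assuming the globus-cli module is already loaded and that `globus whoami` exits non-zero when there is no valid login; the function name `ensure_globus_login` is hypothetical.

```shell
# Sketch: renew the Globus login only when it has lapsed.
# Assumes the globus-cli module is already loaded, e.g.
#   module load globus-cli/3.35.2
ensure_globus_login() {
  # "globus whoami" is expected to exit non-zero when there is no valid login
  if globus whoami >/dev/null 2>&1; then
    echo "Globus login is still valid"
  else
    echo "Globus login missing or expired; running globus login"
    globus login  # prompts for the authorisation code, as described above
  fi
}
```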

Looking at your suite, the way it is set up makes it difficult to turn off the retries just for pptransfer.
You will need to edit the file site/archer2.rc and, in the [[POSTPROC_RESOURCE]] section, remove the line:

execution retry delays = PT10M, PT1H, PT3H, P1D

NOTE, however, that this will also turn off the automatic retries for the postproc tasks.

Cheers,
Ros

c.j.r.williams · 29 January 2026 14:34

Thanks very much Ros, I now completely understand. I think I will leave the automatic retries on, i.e. as is, just to avoid any further complications. It just means that a little babysitting will be required when I start running proper simulations (today, in fact): presumably, if JASMIN is down for whatever reason, pptransfer will fail (even if Globus subsequently transfers the data), which will hold up the next cycle. So I will need to manually reset it to “Succeeded” in order to move on to my next cycle (in my case, the next year).

Charlie

SimonTett · 12 February 2026 15:25

I’d be careful doing that, as I think “housekeeping” will then run and remove your data before it gets transferred. The best thing is to let it run: the runahead limit is, by default, 3, so the suite will keep running the model for 3 cycles. Then, when JASMIN comes back up, the pptransfer job will either run or you will need to trigger it, after which all the held pptransfers will run fairly quickly.

Simon

c.j.r.williams · 12 February 2026 16:00

Thanks Simon. But when I checked it a couple of days ago, it had already stopped, i.e. rose sgc told me it wasn’t running. I never stopped it myself, so I assumed it had completely failed.

Charlie

RosalynHatcher · 13 February 2026 09:55

Hi Charlie,

You should only set the failed pptransfer task to succeeded once you have checked the Globus task on the Globus web page and confirmed that it has succeeded. As Simon says, if you simply set the failed pptransfer task to succeeded BEFORE Globus has transferred the data, you run the risk that housekeeping will delete the data before it is transferred to JASMIN.

In response to your last comment: a suite will shut down completely (i.e. no longer be visible with rose sgc) when it has either completed successfully or been inactive for a period, usually a couple of days (e.g. when a task has failed and the suite can’t run on until you have fixed the problem).

You can restart the suite with rose suite-restart and then resolve any issues and retrigger the failed task(s).
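For the case where the Globus task turns out to have failed, the restart-and-retrigger sequence can be sketched as below. This assumes Rose 2019 / Cylc 7 command names, that `rose suite-restart` is run from the suite's directory under ~/roses (the location mentioned earlier in this thread), and a task named `pptransfer`; the helper `recover_pptransfer_cmds` is hypothetical and only prints the commands in order.

```shell
# Hypothetical helper: print (not run) the commands for recovering a suite
# that shut down after a failed pptransfer whose Globus task also FAILED.
# Assumes Rose 2019 / Cylc 7 and a task named "pptransfer".
recover_pptransfer_cmds() {
  local suite="$1" cycle="$2"
  echo "cd ~/roses/$suite && rose suite-restart"  # bring the suite back up
  echo "cylc trigger $suite pptransfer.$cycle"    # resubmit the transfer
}
```

For example, `recover_pptransfer_cmds u-df570 18500101T0000Z` prints the restart command followed by `cylc trigger u-df570 pptransfer.18500101T0000Z`.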

Cheers,
Ros

c.j.r.williams · 13 February 2026 11:39

Ok, thank you. So just to be clear: I restart the suite with rose suite-restart, then check the Globus app online to see whether the last transfer did actually work. If it did, I can set the pptransfer task to succeeded and the suite should carry on. If it didn’t, I retrigger pptransfer so that it starts transferring again before the suite continues. Is that right?
