Globus transfer failures

I’m getting a bunch of pptransfer failures. Looking at the Globus errors, they are all of the same form – see below.

Is this an ARCHER2 or JASMIN failure? Asking so I can work out which help desk to complain to…

Simon

Error (transfer)

Endpoint: Archer2 file systems (3e90d018-0d05-461a-bbaf-aab605283d21)
Server: 193.62.216.42:443
File: /work/n02/n02/tetts/cylc-run/u-dr157/run7/share/cycle/19891001T0000Z/dr157a.pd1989nov.pp
Command: RETR /work/n02/n02/tetts/cylc-run/u-dr157/run7/share/cycle/19891001T0000Z/dr157a.pd1989nov.pp
Message: Data channel authentication failed

Details: 500-Command failed. : globus_xio: The GSI XIO driver failed to establish a secure connection. The failure occured during a handshake read.\r\n500-globus_xio: Operation was canceled\r\n500-globus_xio: Operation timed out\r\n500 End.\r\n

Hi Simon,

If you are still getting this error, please contact the ARCHER2 helpdesk and send them the error above, which tells them which server is having problems.

Regards,
Ros.

Hi Ros,

I had a look, with xconv, at the files that had made it to JASMIN. This suggested they had got corrupted. So, after checking that the data was still on ARCHER2, I removed the data from JASMIN, retriggered the pptransfer, and this time it went through. As I don’t really understand why it worked, I don’t think this is a solution… unless I am the only person having Globus transfer problems…

Simon

I got the below from the JASMIN help desk (with edits to remove people’s names). Is a solution to change pptransfer to give Globus a short deadline – say an hour – and then have pptransfer retry (if it failed) after an hour or so? That way it might get another node…

Simon

Hi
This report outlines the current performance issues you might be experiencing with Globus transfers and explains the underlying causes.

Root Cause of Performance Problems

The performance challenges you’re observing with Globus transfers aren’t due to Globus itself. Instead, they stem from intermittent issues with the ability of our Globus transfer nodes to read and write to the QuoByte filesystem.

We operate a pool of five transfer nodes, which are automatically assigned to your transfers by a load balancer. If your transfer happens to be picked up by a node currently experiencing QuoByte problems, it’s likely that the transfer will undergo numerous retries before it eventually succeeds.

We are actively working to identify these problematic nodes as they occur and temporarily remove them from the pool for rebooting. However, this process is currently manual.

Data Integrity and Transfer Retries

To ensure data integrity, a Globus transfer typically involves a checksum validation unless specifically disabled by you or your workflow. If this integrity check fails, the transfer automatically retries. This mechanism accounts for both the slow overall performance and the intermittent nature of the issue: if your transfer lands on a healthy node, it will proceed quickly as expected.

Similarly, unless a transfer is explicitly canceled by a user, it is designed to continue retrying until it succeeds (up to a very high limit that is rarely encountered). Many of the “errors” you might see are actually just informative messages indicating that the task is being retried, rather than a definitive failure. While this ensures eventual completion, it can manifest as a slow overall transfer speed.

If you require a transfer to “bail out” earlier than this automatic retry limit, you can set an earlier deadline using the transfer command.
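As a hedged sketch of that suggestion: the Globus CLI’s `globus transfer` command accepts a `--deadline` option (a timestamp after which the task gives up rather than retrying indefinitely) and a `--verify-checksum` flag for the integrity check discussed above. The one-hour window, the endpoint IDs, and the paths below are illustrative placeholders, not values from this thread:

```python
# Sketch (illustrative, not the pptransfer implementation): give a Globus
# transfer a one-hour deadline so a task stuck on a bad node fails fast,
# letting the calling workflow retry and hopefully land on a healthy node.
from datetime import datetime, timedelta, timezone


def deadline_in(hours: int) -> str:
    """Return a UTC deadline string in a format the Globus CLI accepts."""
    t = datetime.now(timezone.utc) + timedelta(hours=hours)
    return t.strftime("%Y-%m-%d %H:%M:%S")


def transfer_command(src: str, dst: str, hours: int = 1) -> list[str]:
    # src/dst are "ENDPOINT_ID:/path" strings; both are placeholders here.
    return [
        "globus", "transfer",
        "--deadline", deadline_in(hours),  # bail out after this time
        "--verify-checksum",               # keep the integrity check on
        src, dst,
    ]


if __name__ == "__main__":
    # Hypothetical endpoint IDs and paths, for illustration only.
    cmd = transfer_command(
        "SRC_ENDPOINT_ID:/work/example/file.pp",
        "DST_ENDPOINT_ID:/gws/example/file.pp",
    )
    print(" ".join(cmd))
```

The command list could be passed to `subprocess.run` from a retry wrapper; on failure the wrapper would simply resubmit, which is roughly the behaviour being proposed for pptransfer above.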

Please let us know if you have any further questions or require additional assistance.


Hi Simon,

Yes, Matt has been in touch with me and that is one workaround we have been looking at.

Regards,
Ros.


Ros,

Over the last few days all transfers have been going through… so maybe nothing needs to be done!

Simon

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.