Issues with Globus Transfer to JASMIN

Hi CMS,

I’m encountering some issues with pptransfer to JASMIN via Globus. There are really two distinct problems.

Firstly, the transfer jobs aren’t completing on the first submission (job(01)). If I log in to Globus Online, the error that keeps cropping up is “endpoint too busy”. The following is copied from the Globus event log.

Error (transfer)
Endpoint: JASMIN Default Collection (a2f53b7f-1b4e-4dce-9b7c-349ae760fee0)
Server: 130.246.1.6:443
Command: PASV
Message: The server may be too busy
Explanation: The endpoint has reached its maximum number of connections available to transfer data.  This could be a transient problem which can be ignored.
---
Details: 500-Command failed.\r\r\n500- : globus_ftp_control_local_pasv failed.\r\n500-globus_xio: globus_l_xio_tcp_bind failed.\r\n500-globus_xio: System error in bind: Address already in use\r\n500-globus_xio: A system call failed: Address already in use\r\n500 End.\r\n

Is this an error that you are aware of or that other users are experiencing?

Secondly, when the first job times out on ARCHER due to the above, subsequent re-tries also fail, but for a different reason: “A transfer with identical paths has not yet completed”. The following is from pptransfer/NN/job.err:

[WARN] file:atmospp.nl: skip missing optional source: namelist:moose_arch
[WARN] file:atmospp.nl: skip missing optional source: namelist:script_arch
[WARN]  [SUBPROCESS]: Command: globus transfer --format unix --jmespath task_id --recursive --fail-on-quota-errors --sync-level checksum --label u-dp788/18740101T0000Z --verify-checksum --notify off 3e90d018-0d05-461a-bbaf-aab605283d21:/work/n02/n02/ajw1g19/archive/u-dp788/18740101T0000Z a2f53b7f-1b4e-4dce-9b7c-349ae760fee0:/gws/nopw/j04/mh_gsp/Model_Output/u-dp788/18740101T0000Z
[SUBPROCESS]: Error = 1:
	Globus CLI Error: A Transfer API Error Occurred.
HTTP status:      409
request_id:       UjMZgjDbH
code:             Conflict
message:          A transfer with identical paths has not yet completed

[WARN]  Transfer command failed: globus transfer --format unix --jmespath 'task_id' --recursive --fail-on-quota-errors --sync-level checksum --label u-dp788/18740101T0000Z --verify-checksum --notify off 3e90d018-0d05-461a-bbaf-aab605283d21:/work/n02/n02/ajw1g19/archive/u-dp788/18740101T0000Z a2f53b7f-1b4e-4dce-9b7c-349ae760fee0:/gws/nopw/j04/mh_gsp/Model_Output/u-dp788/18740101T0000Z
[ERROR]  transfer.py: Globus Error: Network or server error occurred (Globus ReturnCode=1)
[FAIL]  Command Terminated
[FAIL] Terminating PostProc...
[FAIL] transfer.py <<'__STDIN__'
[FAIL] 
[FAIL] '__STDIN__' # return-code=1
2025-06-05T14:04:55Z CRITICAL - failed/EXIT

Meanwhile, on Globus, the original transfer operation still shows as active.

Can you please advise?

Regards,
Alfred

Hi Alfred,

I’ve passed the endpoint busy problem to JASMIN to look into. We’ve seen it before; it might mean that one of the nodes is having issues, causing all the traffic to go through the remaining nodes.

With regard to your second question: when a Globus transfer request is submitted and errors, the request stays active in Globus for two days and is repeatedly retried until it either goes through or expires. If you try to resubmit the same request while one is already active, you get the “A transfer with identical paths has not yet completed” error. You’ll probably find the transfer will succeed overnight. Once it has gone through, could you please double-check that there are no zero-length files on JASMIN and then set the task to succeeded in the Cylc UI.
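The zero-length check can be done with a one-liner on JASMIN. A minimal sketch using `find` (the example path is the destination from the error log above; substitute your own suite and cycle directory):

```shell
# Print any zero-length files under a transferred cycle directory.
# Non-empty output means the transfer needs re-checking before you
# mark the pptransfer task as succeeded.
check_zero_length() {
    find "$1" -type f -size 0 -print
}

# Example (destination path from the logs above; adjust for your suite):
# check_zero_length /gws/nopw/j04/mh_gsp/Model_Output/u-dp788/18740101T0000Z
```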

Cheers,
Ros.

Hi Ros,

Thank you for passing things on to JASMIN, I’ll keep an eye on the transfers throughout the next day.

Regards,
Alfred

Hi Ros,

Thought I’d add to this thread as I’m getting Alfred’s second error (and I may have the first one too, but I’m not sure where to find the Globus log).

Things were running fine a couple of days ago. For one of the suites, I can see from the log that pptransfer has been retried numerous times overnight.

As an aside, I have several active suites which should be transferring data to the same directory on JASMIN (/gws/nopw/j04/inhale/jweber), but into separate sub-directories. Could multiple suites archiving to the same parent directory cause this issue?

All things considered, does this sound more like a JASMIN issue?

Cheers,

James

Hi James,

You can access Globus logs by signing in to your account at https://www.globus.org/. The Activity tab on the left should show all completed and active transfers from ARCHER2. If you’re experiencing the same issue I am, then some tasks should have an option to “view event log”, which is where you can see the error messages.

Regards,
Alfred

Thanks, Alfred. It looks like I am also getting an “endpoint too busy” message:

Error (transfer)
Endpoint: JASMIN Default Collection (a2f53b7f-1b4e-4dce-9b7c-349ae760fee0)
Server: 130.246.1.7:443
Command: PASV
Message: The server may be too busy
Explanation: The endpoint has reached its maximum number of connections available to transfer data. This could be a transient problem which can be ignored.

Details: 500-Command failed.\r\r\n500- : globus_ftp_control_local_pasv failed.\r\n500-globus_xio: globus_l_xio_tcp_bind failed.\r\n500-globus_xio: System error in bind: Address already in use\r\n500-globus_xio: A system call failed: Address already in use\r\n500 End.\r\n

Hi James,

Please report this to the JASMIN helpdesk. It sounds like an issue at their end.

Annette

Hi Annette,

I’ve emailed the JASMIN helpdesk.

James

FYI, Matt told me yesterday that they are investigating the “server too busy” problem.