Hi Ros,
Thanks so much. I’ve tried everything you suggested (except for switching the transfers to a different login node, as at the moment I have pptransfer tasks stuck on both login3 and login4). Killing the tasks in the cylc GUI does stop them on login3.archer2.ac.uk. If I resubmit any pptransfer tasks, they still just get stuck.
I think the problem is the rsync command:
When I run the following -
rsync -av --stats --rsync-path="mkdir -p /gws/nopw/j04/pmip4_vol1/users/rachel/ARCHER2_archive/u-ck765/18551001T0000Z && rsync" /work/n02/n02/radiam24/archive/u-ck765/18551001T0000Z/ hpxfer2.jasmin.ac.uk:/gws/nopw/j04/pmip4_vol1/users/rachel/ARCHER2_archive/u-ck765/18551001T0000Z
On JASMIN, directory 18551001T0000Z is created containing the checksum files. I get the following output on ARCHER2:
sending incremental file list
checksums
cice_ck765i_1d_18551001-18551101.nc
However, after this, no further filenames are printed on ARCHER2 and no more files are transferred to JASMIN (I waited for 20 mins, and I think the whole pptransfer task usually takes around 10 mins).
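A bounded variant of the same test, which would exit on its own if the transfer stalls instead of hanging, might look like the sketch below. The paths and host are copied from the command above. The `--timeout` and `--progress` options and the extra `-v` are standard rsync flags, but the 300-second value is an arbitrary assumption and `build_rsync_cmd` is a hypothetical helper name, not something taken from the pptransfer app.

```shell
#!/usr/bin/env bash
# Sketch only: paths are copied from the manual test above.
SRC=/work/n02/n02/radiam24/archive/u-ck765/18551001T0000Z/
DEST_DIR=/gws/nopw/j04/pmip4_vol1/users/rachel/ARCHER2_archive/u-ck765/18551001T0000Z
DEST_HOST=hpxfer2.jasmin.ac.uk
IO_TIMEOUT=300   # assumed value: seconds without any I/O before rsync gives up

# Hypothetical helper: print the command so it can be inspected before running.
build_rsync_cmd() {
    printf 'rsync -avv --stats --progress --timeout=%s --rsync-path="mkdir -p %s && rsync" %s %s:%s\n' \
        "$IO_TIMEOUT" "$DEST_DIR" "$SRC" "$DEST_HOST" "$DEST_DIR"
}

build_rsync_cmd
# To actually run it: eval "$(build_rsync_cmd)"
```

With `--timeout` set, a stalled transfer should end with an rsync I/O timeout error rather than sitting in `poll_s`, and `--progress` shows whether any bytes of the current file are moving.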
On login3 on ARCHER2, when I run ps -flu radiam24 | grep rsync after waiting on rsync for 20 mins, I get the following output:
radiam24@ln03:~> ps -flu radiam24 |grep rsync
0 S radiam24 35401 129415 0 80 0 - 11190 poll_s 11:34 pts/17 00:00:00 rsync -av --stats --rsync-path=mkdir -p /gws/nopw/j04/pmip4_vol1/users/rachel/ARCHER2_archive/u-ck765/18551001T0000Z && rsync /work/n02/n02/radiam24/archive/u-ck765/18551001T0000Z/ hpxfer2.jasmin.ac.uk:/gws/nopw/j04/pmip4_vol1/users/rachel/ARCHER2_archive/u-ck765/18551001T0000Z
0 S radiam24 35402 35401 0 80 0 - 13462 poll_s 11:34 pts/17 00:00:00 ssh hpxfer2.jasmin.ac.uk mkdir -p /gws/nopw/j04/pmip4_vol1/users/rachel/ARCHER2_archive/u-ck765/18551001T0000Z && rsync --server -vlogDtpre.iLsfxC --stats . /gws/nopw/j04/pmip4_vol1/users/rachel/ARCHER2_archive/u-ck765/18551001T0000Z
0 S radiam24 45047 151072 0 80 0 - 2177 pipe_w 11:35 pts/194 00:00:00 grep rsync
Do you have any more suggestions? The only things I may have changed since pptransfer was working normally last week are increasing the runtime and restarting the suite (for a few suites) and changing the transfer server from hpxfer1 to hpxfer2 (for a few other suites), so I’m not sure where this problem could have come from.
Best wishes,
Rachel