-bash-4.1$ ssh -i ~/.ssh/id_rsa_archerum jweber@login4.archer2.ac.uk
The authenticity of host ‘login4.archer2.ac.uk (193.62.216.45)’ can’t be established.
RSA key fingerprint is 1c:0f:77:c8:b0:b0:c9:8d:4a:90:cf:31:e2:a6:76:ae.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ‘login4.archer2.ac.uk’ (RSA) to the list of known hosts.
Enter passphrase for key ‘/home/jmw240/.ssh/id_rsa_archerum’:
PTY allocation request failed on channel 0
Comand rejected by policy. Not in authorised list
Connection to login4.archer2.ac.uk closed.
I think this is as it should be?
However, the error message above remains when I try to run the suite. Is there something else I should be doing?
You need to make sure the ssh-key is attached to your ssh-agent. It should not prompt for your passphrase when you ssh on the command line. Nor should you have to supply the option -i ~/.ssh/id_rsa_archerum.
Please try running ssh-add ~/.ssh/id_rsa_archerum and once you’ve added the key run ssh login.archer2.ac.uk to check you are not prompted for any input and get the expected PTY allocation.... response.
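The check described above can be sketched as a small shell function. This is only an illustration: the function name `check_agent_login` is hypothetical, and it assumes an ssh-agent is already running in the shell that launches the suite and that the host key for the login address has already been accepted. The `BatchMode=yes` option is a standard OpenSSH setting that makes ssh fail rather than prompt, so any remaining passphrase prompt shows up as an error.

```shell
# Sketch: load a key into ssh-agent and confirm a prompt-free login.
# check_agent_login is a hypothetical helper name, not part of any tool.
# If no agent is running, start one first with: eval "$(ssh-agent -s)"
check_agent_login() {
    key="$1"
    host="$2"
    # Add the key to the running agent (asks for the passphrase once).
    ssh-add "$key" || return 1
    # List the keys the agent currently holds, as a sanity check.
    ssh-add -l || return 1
    # BatchMode=yes disables interactive prompts, so ssh fails
    # instead of asking for a passphrase or password.
    ssh -o BatchMode=yes "$host"
}

# Usage, with the key and host named in this thread:
# check_agent_login ~/.ssh/id_rsa_archerum login.archer2.ac.uk
```

If the key is loaded correctly, the last step should give the same PTY allocation and policy messages as before, without asking for any input.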
Many thanks, I added my id_rsa_archerum and when I then ran ssh login.archer2.ac.uk I didn’t have to put in anything else.
However, my suite has failed on atmos_main with a large number of backtrace errors which I haven’t seen before and I’m not sure how to resolve them. The equivalent of this suite runs fine on Monsoon so I suspect it is an issue with how I have converted it to run on Archer2.
For example:
[380] exceptions: [backtrace]: ( 14) : _start in file /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/…/sysdeps/x86_64/start.S line 122
One immediate thing that I’ve seen is that you’ve asked for 8 nodes and 64 tasks per node, but then ntasks is 504 (8*64=512). This may not be the issue, but worth changing.
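The mismatch can be checked with a few lines of shell arithmetic. This is a minimal sketch using only the numbers quoted above; the variable names are illustrative and are not the suite's actual setting names.

```shell
# Values quoted in the thread (variable names are illustrative only).
nodes=8
tasks_per_node=64
ntasks=504

# Total task slots implied by the node request.
capacity=$((nodes * tasks_per_node))

if [ "$ntasks" -ne "$capacity" ]; then
    echo "mismatch: ntasks=$ntasks but nodes*tasks_per_node=$capacity (difference $((capacity - ntasks)))"
else
    echo "consistent: ntasks=$ntasks"
fi
```

With these values the request covers 512 task slots, 8 more than the 504 tasks asked for.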
Thank you for looking into this. I can’t think of a reason why this would be causing a problem (aside from compiler issues) as this branch works in Monsoon.
I’m a bit confused regarding the changes you have made. I can find the SUBROUTINE code block in ukca_main1-ukca_main1.F90 and I assume the lines starting with TYPE and USE in your response are additions. Are these additions added directly after the SUBROUTINE block? If you have a diff of your changes to the branch I could tell from that.
Thank you, I’ve copied over your ukca_main1-ukca_main1.F90 changes (except the WRITE statements) to my branch. However, when I run u-cl073 I’m afraid I now get a different error. This looks a bit like one of the “known failure point” errors but I’m not certain. Have you seen this before?
[1] ???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
[1] ? Error code: 2
[1] ? Error from routine: GLUE_CONV_6A
[1] ? Error message: Deep conv went to model top at point 20 in seg 2 on call 1
[1] ? Error from processor: 356
[1] ? Error number: 86
That was the error I got when I mangled the start file - please check that you haven’t done the same.
Output from my run is /home/n02/n02/grenvill/cylc-run/cl073.
I reran with a clean dump file (cc298a.da20100101_00_v3 copied over from Jasmin). I checked a few fields using xconv after the run and they look ok. I’m a bit confused: is the corruption of the dump file a separate issue from the one you solved with the modifications to ukca_main1-ukca_main1.F90? If so, are there additional changes I need to make to my suite or branch?
Sorry, I think I’m misunderstanding something. I also get the GLUE_CONV_6A error when I run with cc298a.da20100101_00_cp. Do I need to do something to the dump file in advance of running? Otherwise, I’m not certain what I’m doing wrong as I think I have the same branch changes and suite setup as you.