SSH was working and has since stopped

Hi,

I have not run on puma2 for about 2 months. When I tried to run a previously working suite, I got a series of errors. The first was similar to an error reported previously:
“modtime1 = os.stat(tmpfile).st_mtime
TypeError: coercing to Unicode: need string or buffer, int found”

So I started digging into it, and I can see my ssh-agent was not working. I followed the step for restarting the agent in "Appendix B: SSH FAQs" of the NCAS Unified Model Introduction, but that did not fix it.

When I looked in my .ssh on archer2, my ssh files were missing. I have added them again. No idea why they were not there. I checked the steps on setting up your ssh agent and everything looks okay.

When I run the suite on puma2 I get this error:

[FAIL] bash -ec H=$(rose\ host-select\ archer2);\ echo\ $H # return-code=1, stderr=
[FAIL] [WARN] ln02: (ssh failed)
[FAIL] [WARN] ln04: (ssh failed)
[FAIL] [WARN] ln01: (ssh failed)
[FAIL] [WARN] ln03: (ssh failed)
[FAIL] [FAIL] No hosts selected.

So I can tell that puma2 can’t connect to archer2 to submit the job. Any idea what I broke while trying to fix my initial problem?

Penny

Hi Penny,

What do you get when you run ssh ln03 on the PUMA2 command line?
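To test all four login nodes named in the `rose host-select` output in one go, something like the following sketch may help. It assumes bash; the `check_hosts` function and the `SSH_PROBE` variable are hypothetical names for illustration. `BatchMode=yes` makes ssh fail rather than prompt, which is roughly the situation a non-interactive tool is in.

```shell
# Hypothetical helper: probe each host non-interactively and report the result.
# BatchMode=yes makes ssh fail instead of prompting for a passphrase or password.
# SSH_PROBE is an illustrative variable so the probe command can be swapped out.
SSH_PROBE="${SSH_PROBE:-ssh -o BatchMode=yes -o ConnectTimeout=10}"

check_hosts() {
    for host in "$@"; do
        # Run the trivial remote command `true`; only the exit status matters.
        if $SSH_PROBE "$host" true >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: ssh failed"
        fi
    done
}
```

Usage on PUMA2 would be `check_hosts ln01 ln02 ln03 ln04`. For the detailed error message on a failing host, run plain `ssh ln03` interactively as asked above.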

Regards,
Ros.

Hi Ros,
Thanks for picking this up. Much appreciated!

[penmaher@puma2 u-dm681]$ ssh ln03
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: UNPROTECTED PRIVATE KEY FILE! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0640 for '/home/n02/n02/penmaher/.ssh/id_rsa_archer' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
Load key "/home/n02/n02/penmaher/.ssh/id_rsa_archer": bad permissions
penmaher@ln03’s password:

It then does not accept the password for the key file, even though I know it is correct.

Hi Penny,

It won’t accept the passphrase because the file permissions on your keys are too open, so ssh ignores the key and falls back to asking for your ARCHER2 password. Please change the permissions on your ~/.ssh directory and files:

puma2 $ chmod g-r ~/.ssh

Also make sure that the private keys within the .ssh directory have 600 permissions (read/write for you only).
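As a rough sketch of the permission fix, assuming bash and GNU coreutils: the `fix_ssh_perms` function name is hypothetical, and mode 700 on the directory is a commonly recommended setting that is stricter than the `g-r` change above. The key path is the one shown in the ssh warning.

```shell
# Hypothetical helper: tighten permissions on an ssh directory and one private key.
# $1 = ssh directory, $2 = private key file inside it.
fix_ssh_perms() {
    # Directory: read/write/execute for the owner only (assumed target mode).
    chmod 700 "$1"
    # Private key: read/write for the owner only, as ssh requires.
    if [ -f "$2" ]; then
        chmod 600 "$2"
        ls -l "$2"
    else
        echo "no such key file: $2" >&2
    fi
    # Show the resulting directory mode for confirmation.
    ls -ld "$1"
}
```

It could be called as `fix_ssh_perms ~/.ssh ~/.ssh/id_rsa_archer`. Afterwards, `ssh ln03` should no longer print the "UNPROTECTED PRIVATE KEY FILE" warning.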

Regards,
Ros

I have updated the permissions. When I now type ssh ln03 it asks for my password, accepts it and logs into archer2.

That’s great. I think all you need to do now is add the key to your ssh-agent.

puma2 $ ssh-add ~/.ssh/id_rsa_archer

Then you should be able to run ssh ln03 and not be prompted for either passphrase or password.
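To confirm the agent state before and after `ssh-add`, a small check along these lines may be useful. It is a sketch assuming bash and OpenSSH, and it relies on the exit status of `ssh-add -l` (0 when keys are listed, 1 when the agent holds no identities, 2 when the agent cannot be contacted). The `describe_agent_status` function is a hypothetical name.

```shell
# Hypothetical helper: translate the exit status of `ssh-add -l` into a message.
describe_agent_status() {
    case "$1" in
        0) echo "agent reachable, at least one key loaded" ;;
        1) echo "agent reachable, no keys loaded (run ssh-add)" ;;
        2) echo "cannot contact agent (restart ssh-agent)" ;;
        *) echo "unexpected status: $1" ;;
    esac
}

# Capture the status without aborting if the command fails.
rc=0
ssh-add -l >/dev/null 2>&1 || rc=$?
describe_agent_status "$rc"
```

If this reports that no keys are loaded, the `ssh-add ~/.ssh/id_rsa_archer` step above is the missing piece. If the agent cannot be contacted, the agent itself needs restarting first.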

Regards,
Ros.

Brilliant. Yes that fixed the first SSH issue.

I am now back to my original problem: the suite fails to run with the following error, which I believe is still an SSH error:

Traceback (most recent call last):
  File "/home4/home/n02-puma/fcm/metomi/cylc-7.8.12/bin/cylc-cat-log", line 439, in <module>
    main()
  File "/home4/home/n02-puma/fcm/metomi/cylc-7.8.12/bin/cylc-cat-log", line 435, in main
    tmpfile_edit(out, options.geditor)
  File "/home4/home/n02-puma/fcm/metomi/cylc-7.8.12/bin/cylc-cat-log", line 265, in tmpfile_edit
    modtime1 = os.stat(tmpfile).st_mtime
TypeError: coercing to Unicode: need string or buffer, int found

Hi Penny,

In, for example, ~/cylc-run/u-dm681/log/job/20150101T0000Z/fcm_make2_um/04/job-activity.log there is an error regarding the SLURM settings:

(ln03) 2025-05-02T13:55:58Z [STDERR] sbatch: error: AssocMaxCpuMinutesPerJobLimit
(ln03) 2025-05-02T13:55:58Z [STDERR] sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

The job is rejected because there is no HPC resource for n02-NEW005875. We moved into a new accounting year on 1st April 2025 and no HPC resource application was submitted for the GreenBlock project for Apr25-Mar26.
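For finding submission errors like this across a whole suite, a search along the following lines may help. It is a sketch assuming bash and that every job log sits at the same depth as the path above (log/job/CYCLE/TASK/SUBMIT/job-activity.log). The `find_submit_errors` function is a hypothetical name.

```shell
# Hypothetical helper: list sbatch errors recorded in a suite's job-activity logs.
# $1 = suite run directory, e.g. ~/cylc-run/u-dm681
find_submit_errors() {
    # -H prefixes each match with the log file it came from.
    grep -H 'sbatch: error' "$1"/log/job/*/*/*/job-activity.log 2>/dev/null
}
```

For this suite the call would be `find_submit_errors ~/cylc-run/u-dm681`. The function prints nothing and returns a non-zero status when no sbatch errors are found.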

Regards,
Ros.