Hi Leif,
Looking at the logs, I can see that atmos_main has been hit by the same problem on numerous occasions as well.
You may not be aware of it because the atmos_main task has automatic retries on submit-failure. e.g.
20030801T0000Z/atmos_main:
01/ 02/ 03/ 04/ 05/ NN@
When you run
rose host-select archer2
on the puma2 command line, does it select an archer2 node ok? Try it multiple times.
At the moment ln01 is down for maintenance, so if it picks that node it should try another. e.g.
ros@puma2$ rose host-select archer2
[WARN] ln01: (ssh failed)
ln04
ln02, ln03 and ln04 should all work. Make sure you can ssh into these 3 nodes ok.
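If it saves some typing, the two checks above could be wrapped up roughly like this. It is only a sketch: the function names try_host_select and check_nodes are made up for illustration, and the ssh options (BatchMode, ConnectTimeout) are my assumption for making the test non-interactive, so it will report a failure rather than prompt if your ssh-agent isn't set up.

```shell
# Hypothetical helper: run host selection N times (default 5) and print
# whatever node (and any warnings) rose reports each time.
try_host_select() {
    count="${1:-5}"
    i=1
    while [ "$i" -le "$count" ]; do
        rose host-select archer2
        i=$((i + 1))
    done
}

# Hypothetical helper: report whether each named node accepts a
# non-interactive ssh login. BatchMode=yes stops ssh from prompting for a
# passphrase; ConnectTimeout=10 gives up after 10 seconds.
check_nodes() {
    for node in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=10 "$node" true </dev/null; then
            echo "$node: ok"
        else
            echo "$node: ssh failed"
        fi
    done
}
```

After pasting those into your puma2 shell, run "try_host_select 5" and then "check_nodes ln02 ln03 ln04".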
Is this something that has only started happening since PUMA2 was rebooted last week?
Regards,
Ros