Trying to get a login-4c job to work on the new login nodes on ARCHER2

I am trying to change a suite which was previously working on ARCHER2 on the login-4c nodes (u-ch469) to work on the new full system.
I changed the host name in site/archer2.rc to login.archer2.ac.uk
and changed the .ssh/config file to
Host login*.archer2.ac.uk

I had some issues with the ssh connection from puma to ARCHER2 but they now seem resolved.

I then submitted the suite and got several error messages like:
Lmod has detected the following error: The following module(s) are unknown:
"epcc-job-env"
Suspecting that this was an issue with the site/archer2.rc file, I replaced it with an example file:
cp /home/um/archer2/archer2.rc_ukesm site/archer2.rc
and tried again.

The errors are now:
Lmod has detected the following error: The following module(s) are unknown:
"GC3-PrgEnv/2.0/2020.11.26"

Could you please advise what I need to change to get this to work.
Many thanks,
Andrew

Hi Andrew,

You need to update the Science Configuration Module name in your suite to: GC3-PrgEnv/2.0/2021.12.15
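
As a rough sketch only, the old name could be found and replaced across a local copy of the suite with something like the following. The function name update_module_name is hypothetical, the script assumes bash with GNU grep and GNU sed (for sed -i), and the suite location in the usage line is an assumption; only the two module names come from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the old Science Configuration Module name
# with the new one in every file under a suite directory.
# Assumes bash, GNU grep and GNU sed (sed -i edits files in place).

OLD='GC3-PrgEnv/2.0/2020.11.26'
NEW='GC3-PrgEnv/2.0/2021.12.15'

update_module_name() {
    local suite_dir="$1"
    # -r recurse, -l print file names only, -F treat OLD as a fixed string.
    grep -rlF -- "$OLD" "$suite_dir" | while IFS= read -r f; do
        echo "updating $f"
        # '|' is the sed delimiter because the names contain '/'.
        # The dots in OLD are escaped so sed matches them literally.
        sed -i "s|${OLD//./\\.}|${NEW}|g" "$f"
    done
}
```

Usage, assuming the suite is checked out under ~/roses (adjust to wherever your working copy lives): update_module_name ~/roses/u-ch469. It may be worth running the grep on its own first to see which files would be touched.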

Regards,
Ros.

Great, thanks, that worked fine.
It's now failing because it cannot access the files in my login-4c /work/n02/n02 space.
Do I need to copy everything over to the new filesystem? And if so, what is the easiest way to do this?
Thanks
Andrew

Andrew

You should have received this message from ARCHER:

Dear ARCHER2 Users,

A reminder that the ARCHER2 4-Cabinet system will be removed from service today (Monday 10th January) at 12:00 GMT. After this time you will no longer be able to use the ARCHER2 4-Cabinet system.

Grenville

Hi Grenville,
Thanks for this, I managed to get the files moved over to the new system.

I’ve got this to compile and run.

But I have now got this error in the postproc_nemo task:
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=955971.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

Any idea what could have caused this?
Thanks
Andrew

Andrew

Try adding

[[[directives]]]
            --mem=25G

in the [POSTPROC_RESOURCE] section in archer2.rc
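
To check which task logs recorded an out-of-memory kill, before and after raising the limit, a search along these lines may help. This is a sketch under assumptions: the function name find_oom_logs is hypothetical, it relies on GNU grep's --include option, and it assumes the Slurm message ends up in files named job.err under the suite's job log directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list job.err files that mention an oom-kill event.
# Assumes GNU grep (for --include) and that Slurm's stderr is written
# to files named job.err under the given log directory.

find_oom_logs() {
    local log_root="$1"
    # -r recurse, -l print matching file names only, -F fixed-string match,
    # --include limits the search to files named job.err.
    grep -rlF --include='job.err' 'oom-kill event' "$log_root"
}
```

Usage, assuming the usual cylc-run layout: find_oom_logs ~/cylc-run/u-ch469/log/job. grep exits with a non-zero status when no file matches, so an empty result means no oom-kill message was found.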

Grenville

That worked. And my suite now successfully runs on the full system.
Thanks
Andrew