Using correct project account code

Hi,

I’ve just started using Archer2 and my PI (Maria Val Martin) has a project account code n02-MRT019867. My test suite (u-cj129) has “Account group for HPC tasks” set to n02. Is there anywhere else where Maria’s specific account code needs to be entered?

Many thanks for your help,

James

Change n02 to n02-MRT019867 in “Account group for HPC tasks”.

Great, thank you.

James

James

Please see http://cms.ncas.ac.uk/wiki/Archer2/SshAgentSetup. You will need to set up the archerum key to submit to ARCHER2 from PUMA.

Grenville

Thanks for this, Grenville. I have completed Steps 1-4 and will do Step 5 in a couple of days once the Archerum key has been installed on Archer2.

James

Hi Grenville,

I ran the ssh command for Step 5 and the output is shown below. (I have had to break it into 2 messages due to the limit on the number of links a new user can include in a single message.)

Warning: the ECDSA host key for ‘login.archer2.ac.uk’ differs from the key for the IP address ‘193.62.216.43’
Offending key for IP in /home/jmw240/.ssh/known_hosts:1
Matching host key in /home/jmw240/.ssh/known_hosts:10

(continued from previous)

Are you sure you want to continue connecting (yes/no)? yes
Connection to login.archer2.ac.uk closed by remote host.
Connection to login.archer2.ac.uk closed.

While I was not prompted for my passphrase, the output looks quite different to that on the SshAgent setup documentation webpage. Does this suggest the Archerum key has not been installed yet or could it be a different issue?

Thanks for your help

James

Hi James,

The address for the ARCHER2 4-cabinet system changed recently and I realise we’ve not updated those instructions. So you need to use ssh login-4c.archer2.ac.uk until the full system comes online.

If it complains about offending keys when you try that, edit your known_hosts file and delete the line indicated in the warning.
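As a rough sketch of that clean-up: the warning above points at line 1 of known_hosts, so the line number below is taken from that message. The snippet works on a throwaway demo file with made-up key strings (both are assumptions for illustration only); for real use, point KNOWN_HOSTS at ~/.ssh/known_hosts instead.

```shell
# Demo copy so nothing in ~/.ssh is touched; the key strings are placeholders.
KNOWN_HOSTS="$(mktemp)"
printf '%s\n' \
  '193.62.216.43 ecdsa-sha2-nistp256 AAAA-stale-placeholder' \
  'login.archer2.ac.uk ecdsa-sha2-nistp256 AAAA-current-placeholder' > "$KNOWN_HOSTS"

# "Offending key for IP in .../known_hosts:1" means line 1 is the stale entry.
OFFENDING_LINE=1

# Delete that line in place, keeping the original as "$KNOWN_HOSTS.bak".
sed -i.bak "${OFFENDING_LINE}d" "$KNOWN_HOSTS"

cat "$KNOWN_HOSTS"
```

Running `ssh-keygen -R 193.62.216.43` should achieve much the same thing by removing every entry stored for that IP address.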

I have sent your id_rsa_archerum key to ARCHER2 but they haven’t actioned it yet. I’ll let you know when they’ve installed your key.

Regards,
Ros.

Hi Ros,

Thanks for this. Just to confirm in Step 3 I have changed “login” to “login-4c” in my ~/.ssh/config file.

When I then run ssh login-4c.archer2.ac.uk, I get:

PTY allocation request failed on channel 0
*Comand rejected by policy. Not in authorised list *
Connection to login-4c.archer2.ac.uk closed.

This appears to be the same as the output on the webpage. As there is no longer any reference to “offending keys”, I assume I don’t need to change the known_hosts file?

I will wait to hear from you re installation of the Archerum key. Once installation is complete, should I run the ssh command again?

Best,

James

Hi James,

That’s all absolutely fine. I’ve literally just heard from ARCHER2 that your key has been installed. So you are all set up and good to go.

Cheers,
Ros.

Hi Ros,

Thank you, this is very helpful. All parts of my test run (u-cj196) worked except for Rose_arch_wallclock which failed with:

Unloading /usr/local/share/epcc-module/epcc-module-loader

Warning: Unloading the epcc-setup-env module will stop many
modules being available on the system. If you do this by
accident, you can recover the situation with the command:

module load /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env

Unloading /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env
Unloading bolt/0.7
Loading bolt/0.7
Loading /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env
Loading cray-hdf5/1.12.0.2
Loading cray-netcdf/4.7.4.2
Task not supported at NCAS
Received signal ERR
cylc (scheduler - 2021-11-08T09:42:08Z): CRITICAL Task job script received signal ERR at 2021-11-08T09:42:08Z
cylc (scheduler - 2021-11-08T09:42:08Z): CRITICAL failed at 2021-11-08T09:42:08Z

Would you be able to advise how to resolve this?

On another note, my job sat as submitted on fcm_make2_um, recon and then atmos_main for a much longer time (>12 hours) than when I have run on Monsoon, particularly for fcm_make2_um. Is there anything I can do to reduce this?

Cheers,

James

Hi James,

Just switch off the “Archive UM Wallclock times” task in panel “suite conf → tasks”

ARCHER2 is currently only a 4-cabinet system and lots of people are using it, so unfortunately a 12-hour wait in the queues is not a surprise. Some people are waiting far longer than this for jobs to start running. This will obviously improve when the full system comes online.

If you need to do the compile again, I would put that in the short queue (max wallclock 20 mins) and then switch to the standard queue for the atmos_main tasks.

Cheers,
Ros.

Great, thanks Ros and no problem, I was just concerned I had set my job in a bad way.

What’s the best way to run the compile in the short queue and then atmos_main in the standard queue? I know how to switch queues but I’m not sure how to vary it for just one component of the run.

Thanks,

James

Hi Ros,

I selected the short queue and set the wallclock time to PT10M (I just want to see if u-cj212 runs) but fcm_make2_um is still going to “submit-failed” with the following message:

(login-4c.archer2.ac.uk) 2021-11-09T16:16:52Z [STDERR] sbatch: error: QOSMaxWallDurationPerJobLimit
(login-4c.archer2.ac.uk) 2021-11-09T16:16:52Z [STDERR] sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user’s size and/or time limits)
[((‘event-mail’, ‘submission failed’), 1) ret_code] 0

Do I need to change another wallclock variable?

Cheers,

James

Please change the permissions on your ARCHER2 home and work spaces. We need read access to the log files.
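A minimal sketch of one way to do this, assuming group read access is enough (that is, that the people reading the logs share your project group). It runs on a throwaway demo directory with a made-up layout; on ARCHER2 the targets would be your own home and work directories, which are assumed here to be /home/n02/n02/<username> and /work/n02/n02/<username>.

```shell
# Demo tree standing in for a home or work space (layout is illustrative only).
DEMO="$(mktemp -d)"
mkdir -p "$DEMO/cylc-run/u-cj212/log"
echo "job output" > "$DEMO/cylc-run/u-cj212/log/job.out"

# Start from owner-only access to mimic a locked-down space.
chmod -R go-rwx "$DEMO"

# Give the group read access everywhere; capital X adds execute (search)
# permission only on directories and on files that are already executable.
chmod -R g+rX "$DEMO"

ls -lR "$DEMO"
```

The capital X matters: directories need the execute bit to be traversed, while ordinary log files stay non-executable.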

Grenville

Hi Grenville,

I’ve changed the permissions.

Thanks,

James

Hi James,

In site/archer2.rc, in the [[UMBUILD_RESOURCE]] section, change the setting to execution time limit = PT20M. The short queue rejects anything requesting more than 20 minutes.
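If it helps, here is a hedged sketch of making that edit from the command line rather than in an editor. The demo file below is a made-up stand-in for site/archer2.rc: the [[[job]]] sub-section, the second section name and both starting values are assumptions, so check what your suite actually contains before running anything like this on it.

```shell
# Hypothetical stand-in for site/archer2.rc (contents are illustrative only).
RC="$(mktemp)"
cat > "$RC" <<'EOF'
[[UMBUILD_RESOURCE]]
    [[[job]]]
        execution time limit = PT1H
[[SOME_OTHER_RESOURCE]]
    [[[job]]]
        execution time limit = PT3H
EOF

# Show where the setting lives before touching it.
grep -n 'execution time limit' "$RC"

# Between the UMBUILD_RESOURCE header and the next [[...]] header,
# set the limit to PT20M; other sections are left untouched.
sed -i.bak '/\[\[UMBUILD_RESOURCE\]\]/,/^ *\[\[[^[]/ s/\(execution time limit *= *\).*/\1PT20M/' "$RC"

cat "$RC"
```

The original file is kept alongside with a .bak suffix, so the change is easy to undo.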

Regards,
Ros.
