Submit-failed

Hi Ros,

I find that the following OMP-related changes (shown in bold font) in the suite u-cq224 help to avoid the segmentation fault I reported earlier in this thread.

  1. In the rose-suite.conf

!!MAIN_OMPTHR_ATM=2
MAIN_OMPTHR_ATM=10

  2. In the archer2.rc

[[ATMOS_RESOURCE]]
inherit = UM_PARALLEL, SUBMIT_RETRIES
[[[directives]]]
--nodes={{NODE_ATM}}
--ntasks={{TASKS_ATM}}
--tasks-per-node={{NUMA*(TPNUMA_ATM|int)}}
--cpus-per-task={{MAIN_OMPTHR_ATM}}
[[[environment]]]
OMP_NUM_THREADS={{MAIN_OMPTHR_ATM}}
ROSE_LAUNCHER_PREOPTS = {{ATM_SLURM_FLAGS}}
OMP_STACKSIZE = 20Gb
[[[job]]]
execution time limit = {{MAIN_CLOCK}}

I haven't fully understood why this helps to solve the issue, and I am not sure it is the best optimisation possible. Please could you help me optimise these OMP-related settings further? I also suspect that the job is taking a long time to complete with this new OMP configuration.

Cheers,
Timmy
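
As a rough sanity check on the directives above, the sketch below multiplies tasks per node by threads per task and compares the result with the node's core count. It assumes a standard ARCHER2 compute node with 128 cores. The NUMA and TPNUMA_ATM values are hypothetical, since the thread does not state them, and the function names are illustrative only.

```shell
# Cores requested on one node =
#   NUMA regions x tasks per NUMA region x OpenMP threads per task.
cores_per_node() {
    numa=$1; tpnuma=$2; threads=$3
    echo $(( numa * tpnuma * threads ))
}

# Report whether a layout fits on a node with the given number of cores.
# Arguments: NUMA, TPNUMA_ATM, MAIN_OMPTHR_ATM, cores per node.
fits_on_node() {
    requested=$(cores_per_node "$1" "$2" "$3")
    if [ "$requested" -le "$4" ]; then
        echo "fits: $requested of $4 cores"
    else
        echo "oversubscribed: $requested of $4 cores"
    fi
}

# Hypothetical values: NUMA=8, TPNUMA_ATM=1, MAIN_OMPTHR_ATM=10,
# on an assumed 128-core node.
fits_on_node 8 1 10 128
```

If the suite's real values give a similar result, a sizeable share of each node would sit idle, which may be worth checking when tuning the thread count.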

Hi Ros,

Since this morning, I have been getting the following error when submitting any of the suites that were previously running in my area. Is this an issue with my account, or is there something wrong with pumatest today?

Cheers,
Timmy

[INFO] install: suite.rc

[INFO] REGISTERED u-cq224 -> /home/eartfr/cylc-run/u-cq224

[INFO] create: share

[INFO] create: share/cycle

[INFO] create: work

[FAIL] ssh -oBatchMode=yes -n tfrancis@login.archer2.ac.uk env\ ROSE_VERSION=2019.01.3\ CYLC_VERSION=7.8.7\ bash\ -l\ -c\ '"$0"\ "$@"'\ rose\ suite-run\ -vv\ -n\ u-cq224\ --new\ --run=run\ --remote=uuid=a7aae907-76a5-44c4-848a-716b11028ca3,now-str=20220902T111035Z,root-dir='$DATADIR' # return-code=255, stderr=

[FAIL] Permission denied (publickey).

Your ssh agent has probably stopped.

What do you get on pumatest for
ssh-add -l

Hi Grenville,

I get the following message:

-bash-4.1$ ssh-add -l
Could not open a connection to your authentication agent.

Cheers,
Timmy

Timmy

On pumatest, delete ~/.ssh/environment.pumatest.nerc.ac.uk, log out, then log in again and add the archerum key to the ssh agent:

ssh-add ~/.ssh/id_rsa_archerum

That should do it.

Grenville
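
To confirm the agent state before and after the steps above, the exit status of `ssh-add -l` can be interpreted as in this minimal sketch. The statuses follow the ssh-add manual (0 on success, 1 if the command fails, 2 if the agent cannot be contacted). The helper name `describe_agent_status` is hypothetical and not part of any tool.

```shell
# Map the exit status of `ssh-add -l` to a short diagnosis.
# Per the ssh-add manual: 0 = success, 1 = the command failed
# (for -l, typically no identities loaded), 2 = agent not reachable.
describe_agent_status() {
    case "$1" in
        0) echo "agent running, key(s) loaded" ;;
        1) echo "agent running, no keys loaded" ;;
        2) echo "no agent reachable" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Usage on pumatest (not executed here):
#   ssh-add -l >/dev/null 2>&1; describe_agent_status $?
```

The message "Could not open a connection to your authentication agent." corresponds to status 2. After the key has been added, the same check should report status 0.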

Hi Grenville,

Yes, the ssh agent is working fine now.

Cheers
Timmy