Good afternoon,
I am having a problem with my MOSRS password caching. The issue arose while trying to run the HadGEM3-GC5 suite u-cy010, and I can replicate it by running the following command on PUMA:
fcm info https://code.metoffice.gov.uk/svn/glosea/main/trunk/rose-stem/bin/scitool-run@5989
At this point I am prompted for my password instead of the cached password being used.
I tried to redo the steps in configuring the password caching (https://ncas-cms.github.io/um-training/getting-setup.html) but this did not fix the problem. Caching seemed to be working before the ARCHER2 maintenance downtime, as I had managed to perform some short model runs with HadGEM3-GC5 with no authentication problems. I have checked with a colleague who can successfully run the above command on their puma account without being prompted for a password.
Any help fixing this would be greatly appreciated.
Kind regards,
Tarkan
This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses.
Hi Tarkan,
- Can you please confirm that when you do enter your password when prompted you can access the glosea repository as below?
ros@pumanew$ fcm info https://code.metoffice.gov.uk/svn/glosea/main/trunk/rose-stem/bin/scitool-run@5989
Path: scitool-run
Name: scitool-run
URL: https://code.metoffice.gov.uk/svn/glosea/main/trunk/rose-stem/bin/scitool-run
Relative URL: ^/main/trunk/rose-stem/bin/scitool-run
Repository Root: https://code.metoffice.gov.uk/svn/glosea
Repository UUID: e14a8879-62c6-4f65-aad6-6a06efb4d672
Revision: 5989
Node Kind: file
Last Changed Author: joemancell
Last Changed Rev: 5989
Last Changed Date: 2023-06-20 10:35:03 +0100 (Tue, 20 Jun 2023)
- Do you have the same problem when you access the UM repository (e.g. fcm info fcm:um.x-tr)?
Regards,
Ros.
Hi Ros,
I cannot access the glosea repository. After entering my password I am prompted again and again for my username and password, and after the third attempt it just says that authentication failed:
-bash-4.2$ fcm info https://code.metoffice.gov.uk/svn/glosea/main/trunk/rose-stem/bin/scitool-run@5989
Authentication realm: https://code.metoffice.gov.uk:443 Met Office Code
Password for ‘tarkanbilge’: …
Authentication realm: https://code.metoffice.gov.uk:443 Met Office Code
Username: tarkanbilge
Password for ‘tarkanbilge’: …
Authentication realm: https://code.metoffice.gov.uk:443 Met Office Code
Username: tarkanbilge
Password for ‘tarkanbilge’: …
svn: E170013: Unable to connect to a repository at URL ‘https://code.metoffice.gov.uk/svn/glosea/main/trunk/rose-stem/bin/scitool-run’
svn: E215004: No more credentials or we tried too many times.
Authentication failed
[FAIL] svn info https://code.metoffice.gov.uk/svn/glosea/main/trunk/rose-stem/bin/scitool-run@5989 # rc=1
The UM repository also asks me to enter my password manually, but this then works:
-bash-4.2$ fcm info fcm:um.x-tr
Authentication realm: https://code.metoffice.gov.uk:443 Met Office Code
Password for ‘tarkanbilge’: …
Path: trunk
URL: https://code.metoffice.gov.uk/svn/um/main/trunk
Relative URL: ^/main/trunk
Repository Root: https://code.metoffice.gov.uk/svn/um
Repository UUID: 0462af51-e8cd-401e-9b71-8f0b910fc1c1
Revision: 119031
Node Kind: directory
Last Changed Author: jenniferhickson
Last Changed Rev: 119008
Last Changed Date: 2023-07-12 11:19:35 +0100 (Wed, 12 Jul 2023)
Many thanks,
Tarkan
Hi Tarkan,
Looks like part of the issue is repository permissions. Can you confirm that you can’t see this page on MOSRS?
https://code.metoffice.gov.uk/trac/glosea
I’ll then need to ask the Met Office to fix your account permissions.
With regard to the non-caching of the password on PUMA: does your password contain any special characters like ^ or #? There are some special characters that don’t play nicely with gpg-agent. I know that ! and $ are fine.
The previous GC5 suite did not use code from the glosea repository so this problem wouldn’t have manifested then.
Regards,
Ros.
Hi Ros,
I can see this page:
My password doesn’t have any special characters like that, except a ‘_’ which I feel like should be fine.
Should I try to change my password just in case?
Kind regards,
Tarkan
Hi Tarkan,
‘_’ should be fine. I presume you can see the source code via the MOSRS Glosea page too? “Browse Source” in the top right menu bar? If you can see that then I’m really confused why you can’t see it at all from PUMA, even if you do have to put your password in.
The only thing I can currently suggest re the caching on PUMA is to:
- Remove the file ~tarlge/.subversion/auth/svn.simple/2be6a67d04b1c8c6d879daafa52fd762
- Kill any existing gpg-agent processes that are running. Use ps -flu tarlge | grep gpg to find them, and kill -9 with each process ID to kill them.
- Redo the MOSRS setup section in the training: "1. Getting Set Up (Self-Study Instructions)" in the NCAS Unified Model Introduction.
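Before removing anything, it can help to confirm which cached credential file actually belongs to MOSRS. The following is only a rough sketch: the function name find_mosrs_cache is hypothetical, and it assumes the default Subversion cache location and that the realm text matches the "Met Office Code" realm shown in the password prompt.

```shell
#!/bin/sh
# Hypothetical helper: list Subversion credential cache files that belong to
# the "Met Office Code" realm shown in the password prompt.
# $1 is the cache directory (assumed default: ~/.subversion/auth/svn.simple).
find_mosrs_cache() {
    cache_dir="${1:-$HOME/.subversion/auth/svn.simple}"
    # Each cache file stores its realm string as plain text, so grep -l
    # prints only the files that mention the realm.
    grep -l "Met Office Code" "$cache_dir"/* 2>/dev/null
}

# Example use (commented out so that nothing runs by accident):
# find_mosrs_cache                  # show which file(s) hold MOSRS credentials
# pgrep -u "$(id -un)" gpg-agent    # list your running gpg-agent process IDs
```

The function only reads files; deleting the reported file and killing gpg-agent remain manual steps as described above.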
I’ll also check with the Met Office to see if they can see anything funny at their end.
Regards,
Ros.
Hi Ros,
I tried these steps but with no luck. And yes, I can see the source code via the MOSRS Glosea page too.
It’s still not working through PUMA either. Would appreciate you checking in with the MO in case there is something on their side.
Many thanks,
Tarkan
Hi Tarkan,
For now I suggest removing the following 2 lines from the rose-suite.conf to avoid the glosea issue. The script it’s trying to pull is only used in the development tests, which you won’t be running.
[file:bin/scitool-run]
source=https://code.metoffice.gov.uk/svn/glosea/main/trunk/rose-stem/bin/scitool-run@5989
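If you prefer to script the removal rather than edit the file by hand, something like the sketch below may do. The function name remove_scitool_run is hypothetical, and it assumes the section header is immediately followed by its source= line, as shown above. A backup copy is kept with a .bak suffix.

```shell
#!/bin/sh
# Hypothetical helper: delete the [file:bin/scitool-run] header and the
# source= line directly after it. $1 = path to rose-suite.conf.
remove_scitool_run() {
    conf="$1"
    cp "$conf" "$conf.bak"    # keep the original so the change can be undone
    awk '
        /^\[file:bin\/scitool-run\]$/ { in_section = 1; next }   # drop header
        in_section && /^source=/      { in_section = 0; next }   # drop source
        { in_section = 0; print }                                # keep the rest
    ' "$conf.bak" > "$conf"
}

# Example use from the suite directory on PUMA (commented out):
# remove_scitool_run rose-suite.conf
# diff rose-suite.conf.bak rose-suite.conf    # check that only 2 lines went
```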
Try running the suite again. It may work without you needing to cache your MOSRS password at all.
Just to check: did you manage to cache the MOSRS password on PUMA in the end?
Regards,
Ros.
Hi Ros,
I’ll give this a try now and let you know how I get on.
I think the MOSRS password is being cached: I repeated all the steps for setting it up, and I get this when I log in:
But it didn’t make any difference to running the ‘fcm info’ command, and I still get prompted for my password there.
Many thanks,
Tarkan
Hi Ros,
Removing the two lines as you said produces a different error, but perhaps this is unrelated:
The Puma to Archer2 connection seems to be set up okay, but I’ll try to double check it again now.
Many thanks,
Tarkan
Hi Tarkan,
The ARCHER2 host key changed following the OS upgrade, which is the cause of the Host Key Verification error.
Run ssh login.archer2.ac.uk from the PUMA command line and remove the offending line listed from your ~/.ssh/known_hosts file. You may need to do this multiple times to identify all offending lines.
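As an alternative to editing known_hosts by hand, ssh-keygen -R can remove all keys stored for a host in one go. The sketch below assumes the stale entries are recorded under the host name login.archer2.ac.uk; entries stored under an IP address would need a separate call with that address. The function name forget_host_key is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: drop every known_hosts entry for a host in one pass.
# $1 = host name, $2 = known_hosts file (assumed default: ~/.ssh/known_hosts).
forget_host_key() {
    host="$1"
    known_hosts="${2:-$HOME/.ssh/known_hosts}"
    # ssh-keygen -R removes all keys recorded for the host and saves the
    # previous contents as <known_hosts>.old.
    ssh-keygen -R "$host" -f "$known_hosts"
}

# Example use on PUMA (commented out):
# forget_host_key login.archer2.ac.uk
# ssh login.archer2.ac.uk    # reconnect and accept the new host key
```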
Regards
Ros.
Hi Ros,
Thanks for this, it started to run now that I updated the known_hosts - although recon/fcm_make2_um/coupled tasks failed after that.
I have a silly question too: once I have run the model and it crashes, is there a script to reset it to its initial state so I can run again? I couldn’t find it in the documentation.
Best wishes,
Tarkan
Hi Tarkan,
fcm_make2_um has failed because it found a lock file left over from a previous aborted compile. See the job.err file:
[FAIL] /work/n02/n02/tarlge/cylc-run/u-cy010/share/fcm_make_um/fcm-make2.lock: lock exists at the destination
On ARCHER2, remove this directory and then retrigger the failed fcm_make2_um task (right-click on it and select retrigger).
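The lock removal could be wrapped in a small guard like the sketch below, so that only a path ending in .lock is ever deleted. The function name clear_fcm_lock is hypothetical, and it assumes no fcm make job for the suite is still running when you call it.

```shell
#!/bin/sh
# Hypothetical helper: remove a stale fcm-make lock left by an aborted build.
# $1 = full path to the lock, as printed in the [FAIL] line of job.err.
clear_fcm_lock() {
    lock="$1"
    case "$lock" in
        *.lock) ;;                                   # only touch *.lock paths
        *) echo "refusing to remove: $lock" >&2; return 1 ;;
    esac
    if [ -e "$lock" ]; then
        rm -rf -- "$lock"
        echo "removed $lock"
    else
        echo "no lock at $lock"
    fi
}

# Example use on ARCHER2 (commented out), with the path from the error above:
# clear_fcm_lock /work/n02/n02/tarlge/cylc-run/u-cy010/share/fcm_make_um/fcm-make2.lock
```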
Since the UM build has failed, the recon & coupled tasks can’t run yet. Not sure why they would have run (maybe you manually triggered them…??). The suite knows the task dependencies and will run tasks when they are ready.
If a task (e.g. model) crashes, the task will turn red. Once you have taken remedial action to fix the problem, it can be retriggered through the cylc GUI. If the fix required a change to the suite files on PUMA, you will need to run rose suite-run --reload first to reload the suite configuration.
Regards,
Ros.
Hi Ros,
Thanks for taking a look. Retriggering it after removing the lock did get further but got stuck on a later error:
[FAIL] ! CONGEST_CONV_MOD.mod: depends on failed target: congest_conv_mod.o
[FAIL] ! congest_conv_mod.o : update task failed
Looks like congest_conv_mod.o doesn’t exist, and there is instead a congest_conv_mod-6a.o (and congest_conv_mod-6a.F90).
Yes, sorry, I did play around with manually triggering the tasks, was just trying to understand order of tasks and dependencies, hopefully this doesn’t ruin the whole suite.
Regards,
Tarkan
Hi Tarkan,
I’d suggest trying to retrigger it again (someone else had a false compile failure this morning in another suite, which was fixed on the next try). If that still fails, I’d suggest stopping the suite and doing a completely clean run with rose suite-run --new
Cheers,
Ros.
Hi Ros,
Thanks, retriggering allowed it to start running. The ‘coupled’ task started to run, but in the process, my /work/n02/n02/tarlge/ space on ARCHER2 reached capacity of 100G, and I guess this caused the run to crash. I tried to set up pptransfer to automatically transfer from ARCHER2 to JASMIN, but I guess that this works in the ‘postproc’ step which can’t be reached if my quota is exceeded in the ‘coupled’ task. Is this a common problem / is there anything I can do to fix it?
Cheers,
Tarkan
Hi Tarkan,
I have increased your quota on ARCHER2; please try running again. Yes, the transfer occurs after the coupled/postprocessing step.
Regards,
Ros.
Hi Ros,
Just to let you know that the model is running successfully now, and I managed to complete a 4-month run, thank you so much for all your help so far!
Best wishes,
Tarkan
2 posts were split to a new topic: Postproc failure