Trying out JULES Fluxnet u-al752

Hi, I’ve been following this tutorial for getting started with JULES and running u-al752: “Tutorial for setting up Rose/Cylc in order to run JULES on CEDA JASMIN” (Land Surface Processes Group),
but I can’t get the suite to run (step 15 of the tutorial).

I get this message when I try to run the suite:

[FAIL] file:bin/parallelise.py=source=fcm:jules.x_br/pkg/karinawilliams/r6715_python_packages/share/parallelise.py@19283: bad or missing value

From looking on the helpdesk, I think I’m getting the same error as in the thread “Running JULES FLUXNET suite u-al752”. There, they suggested rose suite-run --new, which seemed to work for Richa, but it isn’t helping for me.

Can someone please help? Also happy to provide more information!

Thanks, Ayesha

(Also not sure if this will help):

Previously I tried to run the suite and got this error:
[FAIL] ssh -oBatchMode=yes -oConnectTimeout=10 -n postproc env\ ROSE_VERSION=2019.01.3\ CYLC_VERSION=7.9.6\ bash\ -l\ -c\ '"$0"\ "$@"'\ rose\ suite-run\ -vv\ -n\ u-al752\ --run=run\ --remote=uuid=9c3372d2-540e-41db-9b06-77a22aa4a4d9,now-str=20220624T084519Z # return-code=255, stderr=

[FAIL] ssh: Could not resolve hostname postproc: Name or service not known

(I’m running this through sci1 and then cylc1)
I then found this post on the helpdesk https://cms-helpdesk.ncas.ac.uk/t/could-not-resolve-hostname-jasmin/427/2, so I changed my ~/.ssh/config file to

Host *
ServerAliveInterval 30

Host jlogin1
Hostname login1.jasmin . ac . uk [there are no spaces here normally, I can’t post something here with more than 2 links]
User ash221
IdentityFile ~/.ssh/id_rsa_jasmin
ForwardAgent yes
ControlMaster auto
ControlPath /tmp/ssh-socket-%r@%h-%p
ControlPersist yes

Host xfer?
Hostname %h.jasmin .ac.uk [there are no spaces here normally, I can’t post something here with more than 2 links]
User ash221
User ash221
ForwardAgent yes

Host sci? cylc1
HostName %h.jasmin .ac.uk [there are no spaces here normally, I can’t post something here with more than 2 links]
User ash221

Host sci* cylc*
User ash221
IdentityFile ~/.ssh/id_rsa_jasmin
ForwardAgent yes
ProxyCommand ssh -Y jlogin1 -W %h:%p
ControlMaster auto
ControlPath /tmp/ssh-socket-%r@%h-%p
ControlPersist yes

After that change, the first error message (about resolving hostnames) is gone, but now I get the new error message (about a bad or missing value).
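One way to check that a ~/.ssh/config like this resolves the way you expect is to ask OpenSSH to print the settings it would actually use for a host, without connecting. This is only a minimal sketch, assuming an OpenSSH client recent enough to support ssh -G; the function name filter_ssh_settings is illustrative and not part of any tool.

```shell
# Keep only the settings that matter for hopping through the login node.
# Reads `ssh -G` output on stdin (one lowercase "key value" pair per line).
filter_ssh_settings() {
    grep -Ei '^(hostname|user|identityfile|proxycommand|forwardagent) '
}

# Print the effective settings for cylc1 without opening a connection.
if command -v ssh >/dev/null 2>&1; then
    ssh -G cylc1 2>/dev/null | filter_ssh_settings \
        || echo "ssh -G produced no output (possibly an older OpenSSH)"
fi
```

If the hostname line does not show the full JASMIN host name, or the proxycommand line is missing, the Host patterns in the config are probably not matching the way they were meant to.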

Thanks, Ayesha

Hi Ayesha:
The ‘file:…’ ‘bad or missing value’ error that you’re getting suggests that your access to the Met Office Science Repository Service (MOSRS) is not configured properly.

You should have been prompted for your MOSRS password when you logged in to cylc1. If you weren’t prompted for your MOSRS password when you logged in, then you should make sure your MOSRS configuration is set up correctly by following the appropriate steps in the tutorial.

Sometimes the cached MOSRS password seems to time out, and the easiest thing to do is to log out of cylc1 and then log back in, at which point you will be prompted for your MOSRS password again. An alternative is to type mosrs-cache-password at the cylc1 command-line prompt.

Once you have your MOSRS password properly cached, then you can test this by typing this command at the cylc1 command-line prompt:
fcm export fcm:jules.x_br/pkg/karinawilliams/r6715_python_packages/share/parallelise.py@19283
This command will copy the parallelise.py file (REVISION 19283) from the MOSRS, and you can view or edit that file in your working directory, if you’d like.
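If you want to repeat this test without leaving copies of the file lying around, the export can be wrapped in a small helper that works in a temporary directory. This is a rough sketch rather than an official procedure: the function name check_mosrs_access is made up for illustration, and it assumes fcm is on your PATH on cylc1.

```shell
# Try to export one file from MOSRS into a throwaway directory and
# report whether the repository could be reached with the cached password.
# (check_mosrs_access is a hypothetical helper name, not an FCM command.)
check_mosrs_access() {
    url="$1"
    tmpdir="$(mktemp -d)" || return 1
    if (cd "$tmpdir" && fcm export "$url") >/dev/null 2>&1; then
        echo "MOSRS access OK"
        status=0
    else
        echo "MOSRS access FAILED"
        status=1
    fi
    rm -rf "$tmpdir"
    return "$status"
}

# Usage on cylc1, with the URL from the error message:
# check_mosrs_access 'fcm:jules.x_br/pkg/karinawilliams/r6715_python_packages/share/parallelise.py@19283'
```

The temporary directory is there because an export into a directory that already holds a file of the same name would fail for reasons unrelated to the password.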

Furthermore, since you’re getting an error about the postproc host, you can see with these commands: cd ~/roses/u-al752; grep -r postproc * that this involves the file ~/roses/u-al752/site/suite.rc.MONSOON. This suggests that in your ~/roses/u-al752/rose-suite.conf configuration file, your LOCATION is still set for the MONSOON supercomputer, whereas it should be set for the CEDA_JASMIN supercomputer. If you haven’t followed the tutorial and made that change yet, there are also probably other steps that you haven’t reached yet in the tutorial.
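The two checks above can be put together roughly like this. It is only a sketch: the helper names show_location and find_host_refs are made up, and it assumes LOCATION appears by name on its own line in rose-suite.conf.

```shell
# Show every line that mentions LOCATION in a suite configuration file.
show_location() {
    grep -n 'LOCATION' "$1"
}

# List the files under a suite directory that mention a given host name.
find_host_refs() {
    grep -rl -- "$1" "$2"
}

suite_dir="$HOME/roses/u-al752"
if [ -f "$suite_dir/rose-suite.conf" ]; then
    show_location "$suite_dir/rose-suite.conf" || echo "LOCATION not found"
    find_host_refs postproc "$suite_dir" || echo "no files mention postproc"
fi
```

If the LOCATION line still names MONSOON, that would explain why the suite tried to ssh to postproc.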
Patrick

Thank you Patrick!

When I type ‘mosrs-cache-password’ I get this:

Met Office Science Repository Service password:
Subversion password cached
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/apps/jasmin/metomi/rose-2019.01.3/lib/python/rosie/ws_client_cli.py", line 25, in <module>
    from rosie.ws_client import (
  File "/apps/jasmin/metomi/rose-2019.01.3/lib/python/rosie/ws_client.py", line 36, in <module>
    from rosie.ws_client_auth import RosieWSClientAuthManager
  File "/apps/jasmin/metomi/rose-2019.01.3/lib/python/rosie/ws_client_auth.py", line 38, in <module>
    import gtk
  File "/usr/lib64/python2.7/site-packages/gtk-2.0/gtk/__init__.py", line 64, in <module>
    _init()
  File "/usr/lib64/python2.7/site-packages/gtk-2.0/gtk/__init__.py", line 52, in _init
    _gtk.init_check()
RuntimeError: could not open display
Error: Unable to access Rosie with given password
Run “mosrs-cache-password” to try caching your password again

I know it’s the right password (I can log in elsewhere with it, and without caching the password I normally can access rose).

Thanks for the tip about copying the parallelise file to my working directory. I then changed the rose-suite.conf file so I was using that copy instead.

When I tried running the suite again, the same issue appeared as before but now for the fluxnet_evaluation file. So in the end I copied:
fluxnet_evaluation.py
jules.py
make_time_coord.py
parallelise.py
and now the suite appears to be running! I’m not sure if this is the best workaround - but hopefully it’ll work now!

Another quick question: xmessage doesn’t work for me roughly 70% of the time, and when it doesn’t I can’t access the GUI. I’ve tried logging out and back in again, and it still only works some of the time. I’m using a Mac, and quitting, restarting, and playing around with XQuartz hasn’t changed anything.

So, for example, now I’m running u-al752 but I have no way of checking the progress. I type ‘rose suite-scan’ and I can see it’s running, but ‘rose sgc’ does nothing and when I type ‘rose bush start’ it says:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/apps/jasmin/metomi/rose-2019.01.3/lib/python/rose/bush.py", line 22, in <module>
    import cherrypy
ImportError: No module named cherrypy

Could you please help with this as well?

Thanks,

Ayesha

Hi Ayesha
If you’re getting those errors when you type mosrs-cache-password on cylc1, maybe you should instead log out and log back in, and it should then automatically ask for your MOSRS password.

If you still get those same errors after logging out, logging back in, and entering your MOSRS password, then maybe it’s an X Windows issue. Are you logging in to cylc1 from login1 or from login2? Are you using ssh -AX everywhere or ssh -AY everywhere? You might try ssh -AY to the lower-security login2, and then ssh -AY to cylc1. This is not a permanent solution, since it’s better to use ssh -AX on login1.

I have never used rose bush start. I get the same error as you do for this. Maybe you need to do a rose suite-run --restart?

Does xclock work for you from cylc1?
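Before trying a graphical program, it can help to confirm that the shell on cylc1 has an X display at all, since the “could not open display” error usually means it does not. A minimal sketch, with check_display as a made-up helper name:

```shell
# Report whether this shell has an X display to send windows to.
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY is set to $DISPLAY"
    else
        echo "DISPLAY is not set"
        return 1
    fi
}

# If a display is available and xclock is installed, suggest the visual test.
if check_display && command -v xclock >/dev/null 2>&1; then
    echo "Now try running: xclock"
fi
```

If DISPLAY is empty, X11 forwarding was probably lost at one of the ssh hops, so the -X or -Y flag needs checking on each of them.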

The solution of using fcm export for each of the files from the command line is a makeshift solution. I am glad it helps in the short term, but it is not the thing to do in the long-term.

Patrick

Hi Patrick,

I log into cylc1 from login1 and then sci1. I use -AX for all of them. When I log in I type in my password each time and that seems to be fine, but I can use mosrs-cache-password.

Thank you for the tip about login2. rose sgc works now, but my fcm_make task failed. When I look at job.err I see this:

[FAIL] config-file=/work/scratch-pw/ash221/cylc-run/u-al752/work/1/fcm_make/fcm-make.cfg:2
[FAIL] config-file= - https://code.metoffice.gov.uk/svn/jules/main/trunk/etc/fcm-make/make.cfg@21512
[FAIL] https://code.metoffice.gov.uk/svn/jules/main/trunk/etc/fcm-make/make.cfg@21512: cannot load config file
[FAIL] https://code.metoffice.gov.uk/svn/jules/main/trunk/etc/fcm-make/make.cfg@21512: not found
[FAIL] svn: E170013: Unable to connect to a repository at URL ‘https://code.metoffice.gov.uk/svn/jules/main/trunk/etc/fcm-make/make.cfg’
[FAIL] svn: E215004: No more credentials or we tried too many times.
[FAIL] Authentication failed

[FAIL] fcm make -f /work/scratch-pw/ash221/cylc-run/u-al752/work/1/fcm_make/fcm-make.cfg -C /home/users/ash221/cylc-run/u-al752/share/fcm_make -j 4 # return-code=1
2022-06-27T11:38:11+01:00 CRITICAL - failed/EXIT

Could you please help me?

Thanks,

Ayesha

Hi Ayesha
If you have to type your ssh passphrase each time you ssh, then you don’t have your ssh set up properly. You should be able to ssh without typing in your passphrase each time.
This needs to be fixed.
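A quick way to see whether an ssh-agent is available and holding your key is to look at the exit status of ssh-add -l. This is just a sketch: describe_agent_status is a made-up helper name, and the meaning of the three status values is taken from the OpenSSH ssh-add manual page.

```shell
# Translate the exit status of `ssh-add -l` into a readable message.
describe_agent_status() {
    case "$1" in
        0) echo "agent is running and holds at least one key" ;;
        1) echo "agent is running but holds no keys" ;;
        2) echo "no ssh-agent could be contacted" ;;
        *) echo "unexpected status $1 from ssh-add" ;;
    esac
}

# Capture the status without aborting if ssh-add fails or is missing.
status=0
ssh-add -l >/dev/null 2>&1 || status=$?
describe_agent_status "$status"
```

If the agent holds no keys on your own machine, adding the key once (for example ssh-add ~/.ssh/id_rsa_jasmin, assuming that is where your JASMIN key lives) and forwarding the agent with -A should stop the repeated passphrase prompts.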

For your fcm_make error, it looks like your MOSRS password caching is not set up properly. You should set this up so that you don’t need to type the mosrs-cache-password command at the command prompt after you log in.
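To tell this kind of authentication failure apart from a genuine build error, you can search job.err for the messages seen in your log. A small sketch, with find_auth_failures as a made-up helper name and the path to job.err left for you to fill in:

```shell
# Print the lines of a job.err file that point to a repository
# authentication problem rather than a compile or configuration error.
find_auth_failures() {
    grep -nE 'E170013|E215004|Authentication failed' "$1"
}

# Usage (replace the argument with the path to your own job.err):
# find_auth_failures path/to/job.err
```

If this prints anything, the build never got as far as compiling, and the thing to fix is the MOSRS password caching rather than the suite itself.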
Patrick

Hi Ayesha
And you said that you’re logging in to sci1. This will not work for MOSRS or for Rose/cylc. You will need to run the suite from cylc1.
Patrick