rose-jules-run: command not found

Hello,

I am working with JULES on JASMIN (suite u-an231). I started a run last week and it was working fine, but for some reason it stopped working at spin-up 18. I didn't change anything while it was running, so I don't understand what could have gone wrong. I tried restarting the failed jobs using cylc reset --state=waiting, but that didn't work. I also tried stopping the suite and restarting the run from where it failed (with first run=false), but that didn't work either.

The error message I get:
/bin/sh: rose-jules-run: command not found
[FAIL] rose-jules-run <<'__STDIN__'
[FAIL]
[FAIL] '__STDIN__' # return-code=127
2022-04-05T08:01:35Z CRITICAL - failed/EXIT

Could you help me figure out what is going wrong?

Many thanks,

Thanks, Elise:
I can try to help.
What is your JASMIN username?

Have you already run chmod -R g+rX on your home directory so that I can see your files?
If not, can you do that? If there is anything private or confidential, you might consider moving it to a directory that wouldn’t then be readable by everyone in the group.
Patrick

Hi Patrick,
I've just done that. My JASMIN username is elisedhn.
Thanks,
Elise

Thanks, Elise:
I am looking now.
I am looking at ~elisedhn/cylc-run/u-an231/log/job/1/, and I see a bunch of tasks for different locations in there, all with the suffix _spin_01. Which location was it? I thought you had already done 18 spin-up cycles?
Patrick

Hi Elise:
Did you run the suite from the cylc1 Virtual Machine on JASMIN? Or did you run it on sci1?
Patrick

Hi Elise:
I think you need to run the fcm_make app. In your last run, you had BUILD turned off, and I don't currently see anything in your ~elisedhn/cylc-run/u-an231/share/fcm_make/build/bin/ directory.

If you look at the file ~elisedhn/cylc-run/u-an231/log/job/1/amacayacu_standard_spin_01/01/job, it says that it is using the command rose task-run --path=share/fcm_make/build/bin, so that path is where I looked for scripts or binaries, but I didn’t see any.
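A return code of 127 from /bin/sh means the shell could not find the command on its search path, which fits an empty build/bin directory. The following is a minimal Python sketch for checking this, assuming that share/fcm_make/build/bin under the suite's run directory is where rose-jules-run should appear after the BUILD step; the helper name find_on_path and the exact paths are illustrative, not part of Rose or cylc.

```python
import os
import shutil
from typing import Optional


def find_on_path(command: str, bin_dir: str) -> Optional[str]:
    """Return the full path of `command` if it is an executable file in bin_dir, else None."""
    # shutil.which searches only the directories given in `path`, like a restricted PATH.
    return shutil.which(command, path=bin_dir)


# Illustrative location, following the layout described above.
suite_dir = os.path.expanduser("~/cylc-run/u-an231")
bin_dir = os.path.join(suite_dir, "share", "fcm_make", "build", "bin")

if not os.path.isdir(bin_dir) or not os.listdir(bin_dir):
    print(f"{bin_dir} is missing or empty: the fcm_make (BUILD) step probably has not run")
elif find_on_path("rose-jules-run", bin_dir) is None:
    print("rose-jules-run is not in build/bin")
else:
    print("rose-jules-run found:", find_on_path("rose-jules-run", bin_dir))
```

If the directory turns out to be empty, re-running the suite with BUILD enabled is the fix suggested in this thread.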

I am trying to run your suite now as ~pmcguire/roses/u-an231_elisedhn2 from the cylc1 VM. It is waiting to BUILD in the SLURM short-serial queue. I guess it might take a while right now to get through the queue.
Patrick

Hi, none of the locations worked (there are 7 locations and 5 types of runs for each). Yes, at first I started running with 20 spin-up cycles, and it failed on the 18th. I tried restarting each failed job individually, but that didn't work. So I stopped the suite and tried restarting the whole suite with 3 spin-up cycles (to reach 20 in total), starting from the output of the 17th spin-up.

I’m running it on cylc1.

Hi Elise:
Did you try running it with BUILD turned on? How did that work?
Patrick

Hi Elise
When I ran your suite as ~pmcguire/roses/u-an231_elisedhn2 with BUILD enabled, the fcm_make app worked fine, and spin-up cycle 1 completed for all but one of the locations. I just retriggered the task that failed during the original submission, congo_dynamical_spin_01. Maybe the suite will then progress further. You can see the log files in ~pmcguire/cylc-run/u-an231_elisedhn2.
Patrick

Hi,
I tried with BUILD=true. It still doesn't work, but now I get a different error message:


I don't understand why the spin-up should need a C20C file (i.e. for the 20th).
Thanks,
Elise

Hi Elise:
Congrats on fixing the first problem!

As for the new problem: your suite.rc file defines SPIN_INITFILE with the c20c string in the filename:
SPIN_INITFILE = {{ site }}/{{ site }}_{{ perturb|lower }}_c20c.dump.{{ C20C_RUNTIME_START }}0101.0.nc
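To make the consequence concrete, here is a small Python sketch that mirrors that Jinja2 expression with an f-string. The site, perturbation, and start-year values are hypothetical; the point is only that "c20c" is a literal part of the filename, so every spin-up looks for a c20c dump file regardless of which spin cycle it is on.

```python
def spin_initfile(site: str, perturb: str, c20c_runtime_start: int) -> str:
    # Mirrors the suite.rc Jinja2 expression:
    # {{ site }}/{{ site }}_{{ perturb|lower }}_c20c.dump.{{ C20C_RUNTIME_START }}0101.0.nc
    # Jinja2's |lower filter behaves like Python's str.lower().
    return f"{site}/{site}_{perturb.lower()}_c20c.dump.{c20c_runtime_start}0101.0.nc"


# Hypothetical values for illustration only.
print(spin_initfile("congo", "DYNAMICAL", 1900))
# prints: congo/congo_dynamical_c20c.dump.19000101.0.nc
```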

Patrick

Hi Elise
Is it working better now?
Patrick

Hi Patrick,
Yes, the second problem was to do with how the restart was set up, and I worked around it by renaming some files.
Thanks for your help,
Elise

Hi Elise:
Great! I am glad you got it working properly. Congrats!
Patrick