I’m having an issue running a Rose suite (Cylc 8 compatible) that compiles and runs JULES on the cylc2 node. It seems to get stuck right at the beginning of the process. The suite ID is u-dp235. The suite compiles JULES using FCM, and this part runs fine. It then runs a series of commands: a bit of Python to process the input data for JULES, followed by JULES itself. But it fails every time, and it doesn’t even seem to get as far as executing any of the Python code or JULES. The log file just shows this:
Are you running suite u-dp235? I can’t see it in your cylc-run directory. There was one yesterday named u-de001?
Regardless, the easiest hack might be to take a copy of the job script that hangs, remove the Slurm directives, and then run it by hand on the JASMIN virtual nodes. That way you can find which step is hanging, or which variable is unset, with a much quicker turnaround than running the whole suite. If it’s a multi-rank job, try running it with a single process/rank, as multiple ranks can cause issues.
I’d also check that you can open the data files, as JASMIN has had issues with group workspaces (GWSs) going down this week.
I hope that helps.
I’ve found the issue! The job was hanging because I hadn’t specified the platform to run JULES on. As a result, I think it was defaulting to “localhost”, which seems to have stopped it getting anywhere. I’ve now set the platform to “lotus”, along with the relevant partition, qos and account directives. It seems to be running now!
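For anyone who hits the same problem, here is a minimal sketch of the kind of Cylc 8 `flow.cylc` runtime section involved. If a task has no `platform` setting, Cylc 8 runs it on `localhost`, the host where the scheduler is running, rather than submitting it to the batch system. The task name `jules` and the angle-bracket directive values are placeholders, not the actual settings from u-dp235. Substitute the partition, QoS and account that apply to your own project.

```ini
[runtime]
    # Hypothetical task name; use the name of the task that runs JULES
    [[jules]]
        # Submit to LOTUS via Slurm instead of defaulting to localhost
        platform = lotus
        [[[directives]]]
            # Placeholder values: replace with your own partition, QoS and account
            --partition = <partition>
            --qos = <qos>
            --account = <account>
```

Setting the platform on a shared family (inherited by several tasks) instead of on each task avoids repeating the same directives.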