I’m trying to set up a global JULES simulation on JASMIN, and I got my suite (u-cj307) to build and run there. However, when I checked on its progress after about two hours of runtime, the suite had not created any output. Many thanks for your help!
It looks like you have a limit of 24 tasks at once on your job, but you’re trying to run 40 tasks. Maybe this is the issue? Also, you got rid of the SLURM run constraint '--constraint = "ivybridge128G"'. Maybe you are now trying to get it running on a different type of processor than it was compiled on? These are just some ideas.
In order to get it working, you might try running for a very short time (1 day?) with a very short wall clock limit (15 minutes?) and very few processors (maybe 1 or 2). That way it gets through the submission cycle much more quickly, and it should still produce a complete set of output.
It also looks like your app/jules/rose-app.conf file is using mpirun.lotus instead of mpirun. Maybe that makes a difference?
Many thanks for your help! Those settings were remnants of the suite I used as a template; I’ve corrected them now. I should have looked through the files more carefully, sorry about that!
I had already reduced the running time but not to that degree. I assumed if the suite got to that point, it’d run successfully. I’ve reduced the runtime now and submitted to the test queue with 1 node, but the result is the same. The suite is running but there’s no output. Does that mean it is stuck in a loop or something like that? Because there should at least be the initial dump file, shouldn’t it?
You can try running it from the suite in the background instead of on SLURM, for a very short job on cylc1. Then there is no queueing and also maybe it will work better. It could be that the libraries and modules that you are loading are not getting loaded properly. You can look at one of the NGC suites for examples of the module loading (in 2-3 separate files in the suite) and of the running in the background.
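One way to tell whether a task is stuck or failing silently is to look at its own job logs. The following is only a minimal sketch, assuming the Cylc 7 log layout `<run-dir>/log/job/<cycle>/<task>/NN/`, where `NN` links to the latest submission. The helper name `show_job_logs`, the task name `jules` and the cycle point `1` are assumptions for illustration and may differ in your suite.

```shell
# Print the tail of job.out and job.err for the latest submission of a task.
# Assumes the Cylc 7 layout <run-dir>/log/job/<cycle>/<task>/NN/.
# show_job_logs is a hypothetical helper, not part of cylc or rose.
show_job_logs() {
    run_dir=$1   # e.g. "$HOME/cylc-run/u-cj307"
    cycle=$2     # e.g. 1
    task=$3      # e.g. jules (assumed task name)
    job_dir="$run_dir/log/job/$cycle/$task/NN"
    for f in job.out job.err; do
        if [ -s "$job_dir/$f" ]; then
            echo "== $f =="
            tail -n 20 "$job_dir/$f"
        else
            echo "== $f is missing or empty =="
        fi
    done
}
```

It could be called as `show_job_logs "$HOME/cylc-run/u-cj307" 1 jules`. An empty job.out together with a module or library error in job.err would point to the environment rather than to JULES itself.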
Thanks again for your help! I tried running the JULES task in the background, but the result is the same.
I’ve now checked out a copy of your suite u-as052, which is mentioned as the example suite on the Reading website. I probably should have looked for that first. The suite, u-cj369, runs successfully, so I’m trying to adapt it to the setup I need to run. That involves switching to the trunk version of JULES, but that causes fcm_make to fail because it can’t find make.cfg:
[FAIL] config-file= - https://code.metoffice.gov.uk/svn/jules/main/trunk/etc/fcm-make/make.cfg
[FAIL] https://code.metoffice.gov.uk/svn/jules/main/trunk/etc/fcm-make/make.cfg: cannot load config file
[FAIL] https://code.metoffice.gov.uk/svn/jules/main/trunk/etc/fcm-make/make.cfg: not found
[FAIL] svn: E215004: Authentication failed and interactive prompting is disabled; see the --force-interactive option
[FAIL] svn: E215004: Unable to connect to a repository at URL 'https://code.metoffice.gov.uk/svn/jules/main/trunk/etc/fcm-make/make.cfg'
[FAIL] svn: E215004: No more credentials or we tried too many times.
[FAIL] Authentication failed
[FAIL] fcm make -f /work/scratch-pw/mtodt/cylc-run/u-cj369/work/1/fcm_make/fcm-make.cfg -C /home/users/mtodt/cylc-run/u-cj369/share/fcm_make -j 4 # return-code=1
That URL exists, though. I’ve switched to a JULES version I had checked out on JASMIN (v6.0), which works for fcm_make. I’ll keep this ticket updated on the progress and issues that might/will pop up.
The failure results from a MOSRS connection issue. The URL itself is fine, and the same check worked for me:
glister@cylc1$ fcm ls https://code.metoffice.gov.uk/svn/jules/main/trunk/etc/fcm-make/make.cfg
Please check your MOSRS connection.
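The fcm_make log above contains both "not found" and the svn error E215004, which makes it easy to misread an authentication problem as a missing file. Here is a rough sketch of how the two cases could be told apart from a saved log. The helper name `classify_fcm_failure` is hypothetical, and the patterns are taken only from the messages quoted in this thread.

```shell
# Classify an fcm_make failure from the text of its log file.
# classify_fcm_failure is a hypothetical helper; E215004 is the svn error
# code that appears in the log quoted in this thread.
classify_fcm_failure() {
    log_file=$1
    # Check authentication first: an auth failure also triggers "not found".
    if grep -q 'svn: E215004' "$log_file"; then
        echo "authentication"
    elif grep -q 'not found' "$log_file"; then
        echo "missing-file"
    else
        echo "unknown"
    fi
}
```

If this reports `authentication`, the next step would be to re-cache the MOSRS password (on systems that provide it, this is usually done with the `mosrs-cache-password` helper, which is an assumption about your setup) and then repeat the `fcm ls` check on the same URL.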