I tried to re-run an old JULES Rose suite using my original suite.rc file, but the fcm_make task complains that it can’t find the contrib/gnu/gcc/7.3.0 module. A quick check on JASMIN with “module avail” showed that this module no longer exists, so I switched to gcc/8.2.0, which does exist. However, I now get a load of Fortran errors:
My hunch is that this has something to do with the various intel/compiler modules that I’m loading in under the [[JASMIN]] section of my suite.rc file (I’ve pasted this below). For some reason, these no longer work. I see there is now an intel.20.0.0 – maybe I need to use this? Or perhaps there is something else I need to change in my suite.rc file. Any ideas?
Many thanks for your help with this
Jon
[runtime]
    [[root]]
        script = rose task-run --verbose
        [[[events]]]
            mail events = submission failed, submission timeout, failed, timeout, succeeded
Hi Jonathan:
I see that you are having trouble loading the module contrib/gnu/gcc/7.3.0, which is needed to compile JULES. I have the same problem, since the module is no longer there, so this is an issue that affects all JULES users on JASMIN. I checked, and at least according to JULES trunk v7.0 (/home/users/pmcguire/jules/jules-vn7.0/rose-stem/include/jasmin/runtime.rc), this module is the one that is needed. I also checked /apps/jasmin/modulefiles/contrib, and I don’t see anything there for gnu/gcc or for gnu, if that’s where they’re supposed to be.
I have brought this to the attention of a couple of colleagues in NCAS CMS. I have also emailed JASMIN support about this. We will keep you posted.
Patrick
Thanks Patrick - FYI my particular JULES rose-suite is configured to use JULES v6.0. I assume the need for gcc 7.3.0 is the same for this one (at least it used to compile when I had access to 7.3.0).
Also, just to note, I also tried using gcc 7.2.0 which does show as available in the current list of available modules on JASMIN. I got the same error as when using gcc 8.2.0
I’m pretty sure that the compiler install has an issue. For example, if you try ‘module load eb/OpenMPI/intel/4.1.0’, you should then be able to use mpif90, but you can’t, because it fails with errors such as being unable to find ‘libiomp5.so’. I’ve had a quick look at where this stuff is, and for example:
‘ls -lht /apps/sw/eb/software/OpenMPI/4.1.0-iccifort-2018.3.222-GCC-7.3.0-2.30/lib64’
points to a dead link. So I think that perhaps JASMIN have been updating compilers or something and maybe the libraries being pointed to were deleted?
This is a bit of a guess - the gcc seems also to have gone, as you note.
Hopefully we can hear something soon, as this may be important for others (or I may be being daft… we’ll see).
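One way to test the dead-link theory is to list every symbolic link under an install directory whose target no longer exists. This is only a minimal sketch: the helper name `list_dead_links` is made up for illustration (it is not a JASMIN tool), and it assumes nothing beyond a standard `find` and `test`.

```shell
# List symbolic links under directory $1 whose targets are missing.
# list_dead_links is a hypothetical helper name, not a JASMIN tool.
list_dead_links() {
    # -type l selects symlinks; 'test -e' follows the link, so it fails
    # for links that point at something that has been deleted.
    find "$1" -type l ! -exec test -e {} \; -print
}
```

Running `list_dead_links /apps/sw/eb/software/OpenMPI/4.1.0-iccifort-2018.3.222-GCC-7.3.0-2.30` should then print any entries (such as lib64) that point at deleted targets.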
Hi Jonathan & David
This is the response I just got from the CEDA JASMIN support email helpdesk:
"The module and the GNU software under contrib/gnu/gcc/7.3.0 were not migrated to the new partition area as we thought that they were redundant. All GNU compilers are now available via the JASPY environment.
Could [the user] try and build JULES using the GNU compiler provided by JASPY?
Is it a parallel version of JULES?
For example GNU Fortran (conda-forge gcc 12.1.0-16) 12.1.0 is available via the default JASPY environment. Earlier GNU versions are available from previous JASPY environments."
Hi Jonathan:
(responding to a CEDA JASMIN email support ticket):
You do need to make sure that you do this in your JULES suite prior to running the jules.exe:
When I did what the CEDA JASMIN support person did, i.e. ran ldd on jules.exe without first doing the module loads and exports above (either in the suite, which is preferable, or before running the suite), I got the same error message about missing libraries that they did. When I do the module loads and exports before running ldd, everything seems fine.
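The ldd check described here can be wrapped so that it prints only the libraries the dynamic linker cannot resolve. This is a sketch under stated assumptions: the helper names `filter_missing_libs` and `missing_libs` are hypothetical, and it relies on ldd marking unresolved libraries with the text "not found".

```shell
# Keep only the library names that ldd reports as "not found".
# filter_missing_libs and missing_libs are hypothetical helper names.
filter_missing_libs() {
    awk '/not found/ { print $1 }'
}

# Run ldd on the executable given as $1 and print unresolved libraries.
missing_libs() {
    ldd "$1" | filter_missing_libs
}
```

After doing the same module loads and exports that the suite uses, running `missing_libs` on your jules.exe should print nothing; any name it does print (for example libiomp5.so) is a library that the current environment cannot find.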
Hi Jonathan:
What is the suite number that you’re working on? If this is still troubling you, I can look directly at the suite and maybe figure out what is going on.
Patrick
I’ve just committed some changes to the suite to include a small example input dataset so that it should be ready to run. The suite number is u-ck523. You’ll see on running it that it performs one fcm_make job to compile JULES. It then runs the same JULES model multiple times (with different climate inputs), so for the JULES section of the suite it submits >100 jobs. All of these fail because apparently they exceed the runtime. It never used to do this for this small example.
Hi Jon:
I am trying to run your suite u-ck523 now. The fcm_make task/app succeeded. But the first time that it tried to run the jules tasks/apps on the short-serial queue/partition, the submission failed. The job-activity.log (~pmcguire/cylc-run/u-ck523/log/job/1/jules_0/01/job-activity.log) said:
This was easy to trace down, since the job script (~/cylc-run/u-ck523/log/job/1/jules_0/01/job) had: #SBATCH --constraint=ivybridge128G|skylake348G|broadwell256G
It would be better to use --constraint="intel" for the jules app in your suite.rc file.
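After changing the constraint and reloading the suite, it may be worth confirming that the regenerated job script actually carries the new directive. A minimal sketch, where `show_constraint` is a hypothetical helper name and the job script is the file that cylc writes under log/job/:

```shell
# Print the Slurm --constraint directive(s) found in a job script ($1).
# show_constraint is a hypothetical helper name.
show_constraint() {
    grep -- '^#SBATCH --constraint' "$1"
}
```

For example, `show_constraint ~/cylc-run/u-ck523/log/job/1/jules_0/01/job` prints the constraint line that was submitted for that attempt; retriggered tasks get a new attempt directory (02, 03, and so on).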
The 2nd time that I ran the jules app (after making this change and reloading the suite and retriggering the jules family of apps), I got this error in ~pmcguire/cylc-run/u-ck523/log/job/1/jules_130/02/job.err: /var/spool/slurmd/job21656806/slurm_script: line 93: /home/users/pmcguire/cylc-run/u-ck523/bin/prep_jules_clim_cmip_run.py: Permission denied
I fixed the permissions of that, and then I also got this permissions error in ~pmcguire/cylc-run/u-ck523/log/job/1/jules_132/02/job.err:
I don’t have access to the rahu GWS, so I can’t do much there. Should I request permission to access the rahu GWS? This processing with the rahu GWS is done before executing the jules.exe executable.
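For the first of these permission errors, the script was simply not executable. A quick way to spot other scripts with the same problem is to list the regular files in the suite's bin directory that lack the owner execute bit. This is a sketch only; `non_executable_files` is a hypothetical helper name, and `-maxdepth` assumes GNU or BSD find.

```shell
# List regular files directly inside directory $1 that are missing the
# owner execute bit (octal 100). non_executable_files is a hypothetical name.
non_executable_files() {
    find "$1" -maxdepth 1 -type f ! -perm -100
}
```

Each file it reports can then be fixed with `chmod u+x` in the suite source, so that the fix survives the next time the suite is installed into cylc-run.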
Patrick
Hi Jonathan:
Those extra 6 modules are coming from my .bash_profile doing a module add parallel-netcdf/intel. I think this is from a previous set-up. I am now trying to run a branch of the u-al752 suite without this setting in my .bash_profile. I am expecting that it will work like it was working before with this setting.
Patrick
Hi Jonathan:
I hacked my version of your u-ck523 suite (see: ~pmcguire/roses/u-ck523; I also made a copy of your u-ck523 with these mods, and it is checked in as u-cr771). The new version has several changes, including:
– skipping an extra module load jaspy in the script of the [[jules]] section, since there seems to already be a module load jaspy inherited from the [[JASMIN]] section.
– commenting out the script that uses the rahu GWS, since I don’t have access to that.
Maybe this version will be able to run the jules.exe (and maybe actually give a JULES error message).
Patrick
Hi Jonathan:
The hacked version of u-ck523 mentioned in the previous entry here does indeed run jules.exe (see: ~pmcguire/roses/u-ck523; I also made a copy of your u-ck523 with these mods, and it is checked in as u-cr771). It also gives proper JULES error messages when it fails, presumably since I am skipping prep_jules_clim_cmip_run.py $I_GLACIER $I_CMIP_RUN ${REGION_NAME}. I am skipping that because I don’t have access to the rahu GWS. When this Python script is not skipped, it will probably run further than it did for you previously.
I am currently doing a test run with everything the same (except not skipping the extra module load jaspy) in ~pmcguire/roses/u-cr771b. It looks like these jules app runs are getting stuck as they did for you, rather than failing immediately as they did for me when I also skipped the module load jaspy (in my case, the failure was due to skipping the data from the rahu GWS).
You can look at the log files in ~pmcguire/roses/u-cr771b, ~pmcguire/roses/u-cr771, and ~pmcguire/roses/u-ck523.
But I recommend getting rid of the extra module load jaspy. It was overriding the other module loads that you made after your first module load jaspy.
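To find where jaspy is loaded more than once, you can list every line of the suite.rc that loads it. A minimal sketch, with `find_jaspy_loads` as a hypothetical helper name; note that it only finds literal occurrences in the file, so you still need to check by eye which sections inherit from which.

```shell
# Print every line of a suite.rc ($1) that loads jaspy, with line numbers.
# find_jaspy_loads is a hypothetical helper name.
find_jaspy_loads() {
    grep -n 'module load jaspy' "$1"
}
```

If this reports one load in the [[JASMIN]] section and another in the [[jules]] section, the second one is the candidate to remove.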
Patrick
Hi again Jonathan:
I also note that my log file, wherein jules successfully partially runs (~/cylc-run/u-cr771/log/job/1/jules_0/01/job.out) has this mpirun path:
Thanks so much for the help with this. I commented out the additional “module load jaspy” commands in my suite.rc and it’s working!