JULES stopped running on JASMIN

Hi Patrick,

I’m having some issues running the TAMSAT-ALERT soil moisture JULES suite on JASMIN. To check the job, I would normally log onto the cylc server, where the MOSRS agent asks for my MOSRS password, but this is no longer happening. As a result, I cannot access the suite. I also cannot see /apps/contrib/metomi/, which I understand is where the MOSRS agent is called from.

Are you able to access MOSRS ok?

Any thoughts you might have would be very welcome.

Thanks very much,
Ross

Hi Ross

The MOSRS password works for me when I log in to cylc1.

If you need to re-enter it, you can try running mosrs-cache-password on cylc1.

I use /apps/jasmin/metomi/bin and not /apps/contrib/metomi/

Patrick

Hi Patrick,

I did some further checks this morning and found the issue. In the end it was very simple: my home directory had become full, so Rose could not write any log files. I’ve cleared some space and restarted the job, and it’s now working.

Thanks also for the heads-up about /apps/jasmin/metomi/bin; I see this was also referred to in the JASMIN maintenance email this morning.

Cheers,
Ross
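
For readers who hit the same symptom (a suite that silently stops because log files cannot be written), a minimal sketch for finding what is filling a home directory follows. The function name `largest_entries` is hypothetical, and the sketch assumes `find` supports `-mindepth`/`-maxdepth` (true of GNU and BSD find). It does not check any quota; it only ranks top-level entries by size.

```shell
# List the N largest top-level entries (hidden ones included) under a directory.
# Sizes are in kilobytes, smallest first, so the biggest consumer is the last line.
largest_entries() {
    dir="${1:-$HOME}"     # directory to inspect (defaults to the home directory)
    count="${2:-10}"      # how many entries to show
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + 2>/dev/null \
        | sort -n \
        | tail -n "$count"
}

# Example: largest_entries "$HOME" 5
```

Suite logs are often a large contributor (for Cylc-based suites they commonly accumulate under ~/cylc-run), so that is a reasonable place to look first.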

Hi Ross:

I am glad it is working.

Patrick

Hi Ross:
Some time ago, the JASMIN folks suggested that I switch my metomi path, for testing. It looks like the old metomi path doesn’t work anymore. I have now updated the metomi part of the documentation to reflect this, at https://code.metoffice.gov.uk/trac/jules/wiki/RoseJULESonJASMIN .
Patrick

Hi Patrick,

I was hoping you might be able to provide some help with another JULES-related issue on JASMIN. The TAMSAT-ALERT JULES run has stopped working again and I’m pretty sure it’s to do with the recent JASMIN updates as this is when it stopped working.

Initially, the job was failing because contrib/gnu/gcc/7.3.0 had been moved. Now that this has been restored, I’m getting the following error:

/gws/nopw/j04/odanceo/epinnington/u-bx723_jules/build/bin/jules.exe: error while loading shared libraries: libnetcdff.so.6: cannot open shared object file: No such file or directory

The command that is being executed is:

/apps/jasmin/jaspy/miniconda_envs/jaspy3.7/m3-4.6.14/envs/jaspy3.7-m3-4.6.14-r20200606/bin/mpirun /gws/nopw/j04/odanceo/epinnington/u-bx723_jules/build/bin/jules.exe

The directory containing the libnetcdff.so.6 library is added to the library path as follows (this is from my suite.rc file, /home/users/rmaidment/roses/u-cq002/suite.rc):

export HDF5_LIBDIR=/home/users/siwilson/netcdf_par/3.1.1/intel.19.0.0/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HDF5_LIBDIR

And I’ve verified that the HDF5_LIBDIR path is present in LD_LIBRARY_PATH. As such, I’m not sure why libnetcdff.so.6 can no longer be found by the JULES executable. I suspect this is almost certainly related to the changes JASMIN introduced last week.

Any thoughts you might have on why the JULES job is failing would be very welcome.

Thanks very much,
Ross
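
A minimal sketch for narrowing down this kind of loader error: having a directory listed in LD_LIBRARY_PATH is not the same as the library file being readable there, so it can help to check each directory in turn. The function name `find_lib` is hypothetical. The check must be run in the same environment as the failing job (same modules, same node type), otherwise the result may not reflect what the job sees.

```shell
# Print every directory in a colon-separated search path that actually
# contains the named library. Returns 0 if at least one was found, 1 otherwise.
find_lib() {
    lib="$1"
    search_path="${2:-${LD_LIBRARY_PATH:-}}"   # defaults to the current LD_LIBRARY_PATH
    status=1
    old_ifs="$IFS"
    IFS=:                                      # split the path on colons
    for dir in $search_path; do
        # -e follows symlinks, so a dangling link counts as "not there"
        if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir"
            status=0
        fi
    done
    IFS="$old_ifs"
    return "$status"
}

# Example: find_lib libnetcdff.so.6 || echo "not found in LD_LIBRARY_PATH"
# On Linux, running ldd on the executable and looking for "not found"
# shows the same thing from the loader's point of view.
```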

Hi Ross:
I tried executing your JULES executable on cylc1 after running the commands below, and I don’t get the same problem with the netCDF libraries. Also, I strongly urge you to modify your suite so that it doesn’t use pre-built executables, but instead rebuilds the code from within the suite. See u-al752 for an example. You can also add a condition to skip the rebuild if the code has been rebuilt recently.
Patrick

module load intel/19.0.0
module load contrib/gnu/gcc/7.3.0
module load eb/OpenMPI/intel/3.1.1
export NETCDF_FORTRAN_ROOT=/gws/nopw/j04/jules/admin/netcdf/local_nc_par/3.1.1/intel.19.0.0/
export NETCDF_ROOT=/gws/nopw/j04/jules/admin/netcdf/local_nc_par/3.1.1/intel.19.0.0/
export HDF5_LIBDIR=/gws/nopw/j04/jules/admin/netcdf/local_nc_par/3.1.1/intel.19.0.0/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HDF5_LIBDIR
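
The "skip the rebuild if it has been rebuilt recently" condition mentioned above could be sketched as follows. This is an illustration only and is not taken from u-al752; the function name `needs_rebuild` and the 7-day default are assumptions, and `find -mmin` is assumed to be available (it is in GNU and BSD find).

```shell
# Return 0 (rebuild needed) when the executable is missing or was last
# modified more than max_age_days ago; return 1 (skip rebuild) otherwise.
needs_rebuild() {
    exe="$1"
    max_age_days="${2:-7}"
    # No executable yet: a build is always needed.
    [ -f "$exe" ] || return 0
    # find prints the path only if it is older than the threshold (in minutes).
    if [ -n "$(find "$exe" -mmin +"$((max_age_days * 1440))" 2>/dev/null)" ]; then
        return 0
    fi
    return 1
}

# Example:
# if needs_rebuild /path/to/build/bin/jules.exe 7; then
#     echo "rebuilding"   # run the suite's build step here
# fi
```

An age-based check is a blunt instrument: it does not notice changes to the source, compiler modules, or libraries, so after a system update like the one discussed here a forced rebuild is the safer choice.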
