Thanks for the help and explanations! Much appreciated. Let me know what you find out from JASMIN support.
Hi again, Simon:
I just read your reply again. I think you’re referring to the exec program that is used by the rose/cylc suite to run JULES (`exec rose mpi-launch -v jules.exe`). At first reading, I parsed “Intel built exec” as “Intel built executable”. I understand a bit better now.
I did mean the jules.exe by “exec”. I think I’ve tracked the issue back to its source.
Both OpenMPI versions are built with easybuild. The respective dirs are:
I did a grep in the build logs for “-xHost”, which causes the Intel compiler to use the instruction set of the build processor. It appears in the Intel version, but not the gcc version. So the JASMIN-provided MPI is bespoke for Intel architectures, due to being built with the “-xHost” switch on an Intel system. None of the OpenMPI commands (mpirun, mpif90, mpicc…) work on AMD machines. They need to rebuild the Intel-compiled software stack without “-xHost”.
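A minimal sketch of that grep check, using made-up log snippets in /tmp to stand in for the real easybuild logs (the actual log paths and compile lines on JASMIN will differ):

```shell
# Recreate the check described above on two hypothetical build-log snippets.
mkdir -p /tmp/xhost-demo && cd /tmp/xhost-demo

# Hypothetical compile lines as they might appear in each build log:
echo 'icc -O2 -xHost -c ompi_mpi.c'      > intel-build.log
echo 'gcc -O2 -march=x86-64 -c ompi_mpi.c' > gcc-build.log

# grep -l lists only the logs containing the host-specific flag
# (the `--` stops option parsing so the pattern isn't read as a flag):
grep -l -- '-xHost' *.log   # prints: intel-build.log
```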
Thanks for the clarification, and for figuring out the build problem. I presume you have passed it on to the JASMIN team. Please do let me know when they fix it, OK?
Any news from the JASMIN team about this?
This arrived whilst I was on leave:
Thank you for the update.
I very much appreciate your time and effort to investigate the issue and
identify the root cause of the problem with the Intel MPI on the AMD node.
We did not realise that the MPI application was built in the Intel
compiler environment and is limited to the Intel processor nodes.
Both versions of MPI, `eb/OpenMPI/intel/3.1.1` and `eb/OpenMPI/intel/4.1.0`, need to be recompiled without the `-xHost` flag. I will escalate the issue and update you when I can.
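Since both modules were built with easybuild (as noted above), one way to drop the flag is EasyBuild’s own knob for architecture-specific optimization: for Intel toolchains, EasyBuild’s `optarch` setting is what injects `-xHost`. A hedged sketch (the easyconfig filename here is made up; the real one on JASMIN will differ):

```
# Rebuild with generic (non-host-specific) optimization flags;
# --optarch=GENERIC tells EasyBuild not to emit -xHost for Intel toolchains.
eb OpenMPI-4.1.0-iccifort-2020.0.0.eb --optarch=GENERIC --rebuild
```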
Some background info on what might have hindered identifying the issue:
Previously, LOTUS compute nodes were all of Intel node type, with different Intel processor models. This necessitated defining host groups, which were used to specify homogeneous sets of nodes for MPI parallel jobs.
When the new AMD node type was introduced to LOTUS, and some of the old Intel node types were gradually removed and retired, the uptake for AMD was still very low. Users continued to use Intel and simply updated the host groups.
The JASMIN infrastructure team checked for likely compatibility issues before buying the AMD nodes and was informed that there would be none: code compiled on an Intel node will run fine on AMD hosts unless it was explicitly compiled with MMX instruction sets.
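That caveat is the crux of the `-xHost` problem: a host-specific binary assumes every CPU feature of the build host is present at run time. A toy sketch of that feature comparison, with hypothetical flag lists standing in for the two hosts’ /proc/cpuinfo contents:

```shell
# Decide whether a run host can execute a build-host-specific binary by
# comparing required CPU features against the run host's advertised flags.
# Both flag lists below are hypothetical stand-ins for /proc/cpuinfo.
build_host_flags="sse4_2 avx avx2 avx512f"   # e.g. an Intel Xeon build node
run_host_flags="sse4_2 avx avx2"             # e.g. an AMD run node

for f in $build_host_flags; do
  case " $run_host_flags " in
    *" $f "*) ;;                             # feature present on run host
    *) echo "missing on run host: $f" ;;     # binary may die with SIGILL here
  esac
done   # prints: missing on run host: avx512f
```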
Hopefully it won’t be too much longer before it is fixed.
This is the message that Simon Wilson forwarded to me from the JASMIN Helpdesk on July 11:
A new version of the OpenMPI library was compiled with Intel compiler version 20.0.0 and without the `-xHost` flag. The corresponding (hidden) module file is `eb/OpenMPI/intel/4.1.5`:
```
module add eb/OpenMPI/intel/4.1.5
```
Could you please test JULES against this Intel OpenMPI?
Many thanks. I will try it out with AMD as soon as I can.
I tried it in ~pmcguire/roses/u-al752AMD,
but I get an “error parsing data file mpif90: not found” when I am in the fcm_make phase.
I tried again with a new copy of the suite. The new copy is
But I get the same error message. The complete error message is below. I don’t have permission to
read the file
I will also inform the JASMIN helpdesk about this.
```
mpif90 -oo/timestep_mod.o -c -DSCMA -DBL_DIAG_HACK -DINTEL_FORTRAN -I./include -I/gws/nopw/j04/jules/admin/netcdf/netcdf.openmpi//include -heap-arrays -fp-model precise -traceback /home/users/pmcguire/cylc-run/u-al752AMD3/share/fcm_make/preprocess/src/jules/src/control/standalone/var/timestep_mod.F90 # rc=243
[FAIL] Cannot open configuration file /home/users/cdelcano/openmpi/share/openmpi/mpif90-wrapper-data.txt
[FAIL] Error parsing data file mpif90: Not found
```
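For context on the `[FAIL]` lines: Open MPI’s mpif90 is a thin wrapper that reads its compiler and link settings from a `mpif90-wrapper-data.txt` file under the installation prefix, so the error indicates this build’s prefix points into another user’s (unreadable) home directory. An abridged sketch of what such a wrapper data file typically contains (field names as used by Open MPI; the values here are made up):

```
project=Open MPI
language=Fortran
compiler=ifort
compiler_flags=-O2
linker_flags=-Wl,-rpath -Wl,${libdir}
libs=-lmpi_usempif08 -lmpi_mpifh -lmpi
includedir=${includedir}
libdir=${libdir}
```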
From messages exchanged with the JASMIN Helpdesk:
These are the modules necessary for a background build of JULES with the mpif90 compiler on the cylc1 VM. This allows compiling on Intel nodes (i.e., cylc1) and running on AMD nodes (i.e., in the short-serial-4hr queue):

```
module load intel/20.0.0
module load contrib/gnu/gcc/7.3.0
module load eb/OpenMPI/intel/4.1.5
```
The OpenMPI module above has now been properly built without node-type-specific instructions.
A JULES suite that successfully runs with these modules on the AMD nodes (in the short-serial-4hr queue) is in
~pmcguire/roses/u-al752AMD3. These changes have also been checked in to the parent-suite
u-al752, but that parent suite also needs to be ironed out a bit more, due to other updates.
The u-al752 JULES/FLUXNET suite has been updated and checked in, with the code to run JULES on the AMD nodes of the short-serial-4hr partition. It currently runs against the JULES 7.3 trunk. The plotting is done in Python and sometimes needs 8 hours, so it is in the short-serial queue, without an AMD constraint.
The docs for the u-al752 suite have been updated and are here: