Compilation frozen when extracting the source code

Hi,

Another JULES user on JASMIN (dming) and I have found that fcm_make stays in the “running” state indefinitely.
We are trying to run the u-cx502 suite, but the fcm_make app (running in the background) does not finish, even though it has already been running for about 30 minutes. I tested it yesterday and it worked fine.

The job.out file only says:
Suite : u-cx502
Task Job : 20240101T0000+0100/fcm_make/01 (try 1)
User@Host: dming@cylc1.jasmin.ac.uk
Currently Loaded Modulefiles:

  1) intel/cce/19.0.0   4) contrib/gnu/binutils/2.31
  2) intel/fce/19.0.0   5) contrib/gnu/gcc/7.3.0
  3) intel/19.0.0       6) eb/OpenMPI/intel/3.1.1
    LD_LIBRARY_PATH=/apps/sw/eb/software/OpenMPI/3.1.1-iccifort-2018.3.222-GCC-7.3.0-.30/lib:/apps/contrib/gnu/gcc/7.3.0/lib64:/apps/contrib/gnu/gcc/deps:/apps/sw/intel/2019//itac/2019.0.018/intel64/slib:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/compiler/lib/intel64_lin:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/ipp/lib/intel64:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/compiler/lib/intel64_lin:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/tbb/lib/intel64/gcc4.7:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/tbb/lib/intel64/gcc4.7:/apps/sw/intel/2019/debugger_2019/libipt/intel64/lib:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/daal/lib/intel64_lin:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/daal/…/tbb/lib/intel64_lin/gcc4.4
    LD_LIBRARY_PATH=/apps/sw/eb/software/OpenMPI/3.1.1-iccifort-2018.3.222-GCC-7.3.0-2.30/lib:/apps/contrib/gnu/gcc/7.3.0/lib64:/apps/contrib/gnu/gcc/deps:/apps/sw/intel/2019//itac/2019.0.018/intel64/slib:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/compiler/lib/intel64_lin:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/ipp/lib/intel64:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/compiler/lib/intel64_lin:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/tbb/lib/intel64/gcc4.7:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/tbb/lib/intel64/gcc4.7:/apps/sw/intel/2019/debugger_2019/libipt/intel64/lib:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/daal/lib/intel64_lin:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/daal/…/tbb/lib/intel64_lin/gcc4.4:/gws/nopw/j04/jules/admin/netcdf/netcdf_par/3.1.1/intel.19.0.0/lib

2023-07-26T14:34:23+01:00 INFO - started
[INFO] Configuration: /home/users/dming/cylc-run/u-cx502/app/fcm_make/
[INFO] file: rose-app.conf
[INFO] export PATH=/apps/jasmin/metomi/rose-2019.01.8/bin:/home/users/dming/cylc-run/u-cx502/bin:/home/users/dming/cylc-run/u-cx502/bin:/apps/jasmin/metomi/bin:/apps/jasmin/metomi/bin:/home/users/dming/cylc-run/u-cx502:/apps/jasmin/metomi/cylc-7.8.12/bin:/home/users/dming/cylc-run/u-cx502/bin:/apps/jasmin/metomi/bin:/apps/sw/eb/software/OpenMPI/3.1.1-iccifort-2018.3.222-GCC-7.3.0-2.30/bin:/apps/contrib/gnu/gcc/7.3.0/bin:/apps/contrib/gnu/binutils/2.31/bin:/apps/sw/intel/2019/intelpython3/bin:/apps/sw/intel/2019/advisor_2019.0.0.570901/bin64:/apps/sw/intel/2019/vtune_amplifier_2019.0.2.570779/bin64:/apps/sw/intel/2019/inspector_2019.0.0.569751/bin64:/apps/sw/intel/2019//itac/2019.0.018/intel64/bin:/apps/sw/intel/2019/clck/2019.0/bin/intel64:/apps/sw/intel/2019/compilers_and_libraries_2019.0.117/linux/bin/intel64:/apps/sw/intel/2019/debugger_2019/gdb/intel64/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/opt/puppetlabs/bin:/apps/sw/intel/2019//parallel_studio_xe_2019.0.045/bin:/usr/local/sbin:/usr/sbin:/apps/jasmin/metomi/bin:/home/users/dming/bin

[INFO] source: /home/users/dming/cylc-run/u-cx502/app/fcm_make/file/fcm-make.cfg
[INFO] install: fcm-make.cfg
[INFO] source: /home/users/dming/cylc-run/u-cx502/app/fcm_make/file/fcm-make.cfg
[init] make # 2023-07-26T13:34:41Z
[info] FCM 2021.05.0 (/apps/jasmin/metomi/fcm-2021.05.0)
[init] make config-parse # 2023-07-26T13:34:41Z
[info] config-file=/work/scratch-pw2/dming/cylc-run/u-x502/work/20240101T0000+0100/fcm_make/fcm-make.cfg
[info] config-file= - /gws/nopw/j04/uknetzero/r24142_add_red_sci_vn1.1/etc/fcm-make/make.cfg
[info] config-file= - - /gws/nopw/j04/uknetzero/r24142_add_red_sci_vn1.1/etc/fcm-make/platform/jasmin-lotus-intel.cfg
[info] config-file= - - - /gws/nopw/j04/uknetzero/r24142_add_red_sci_vn1.1/etc/fcm-make/platform/envars.cfg
[info] config-file= - - - /gws/nopw/j04/uknetzero/r24142_add_red_sci_vn1.1/etc/fcm-make/platform/load_settings.cfg
[info] config-file= - - - - /gws/nopw/j04/uknetzero/r24142_add_red_sci_vn1.1/etc/fcm-make/remote/local.cfg
[info] config-file= - - - - /gws/nopw/j04/uknetzero/r24142_add_red_sci_vn1.1/etc/fcm-make/compiler/intel.cfg
[info] config-file= - - - - /gws/nopw/j04/uknetzero/r24142_add_red_sci_vn1.1/etc/fcm-make/build/normal.cfg
[info] config-file= - - - - /gws/nopw/j04/uknetzero/r24142_add_red_sci_vn1.1/etc/fcm-make/omp/noomp.cfg
[info] config-file= - - - - /gws/nopw/j04/uknetzero/r24142_add_red_sci_vn1.1/etc/fcm-make/ncdf/netcdf.cfg
[info] config-file= - - - - /gws/nopw/j04/uknetzero/r24142_add_red_sci_vn1.1/etc/fcm-make/mpi/mpi.cfg
[done] make config-parse # 0.5s
[init] make dest-init # 2023-07-26T13:34:42Z
[info] dest=dming@cylc1.jasmin.ac.uk:/home/users/dming/cylc-run/u-cx502/share/fcm_make
[info] mode=new
[done] make dest-init # 0.1s
[init] make extract # 2023-07-26T13:34:42Z

If your suite generally works, then it’s probably an issue with variable performance at JASMIN. If it’s stuck on the extract stage, you can run something like fcm ls fcm:jules.x-tr to test the connection between JASMIN and MOSRS.
JASMIN sent an email around this week saying that some of their connections weren’t working, specifically Globus and so on, but it might affect this too.
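
As a rough sketch of the connection test above: wrapping the command in a time limit makes a hung connection show up as a timeout rather than another stuck terminal. The function name check_with_timeout and the 60-second limit are only illustrative, and it assumes GNU timeout is available on the login node.

```shell
#!/usr/bin/env bash
# check_with_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND under a time limit, discards its output,
# and reports whether it finished, failed, or hung.
check_with_timeout() {
    local limit="$1"
    shift
    local status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    case "$status" in
        0)   echo "ok" ;;
        124) echo "timed out" ;;   # GNU timeout exits with 124 when the limit is hit
        *)   echo "failed (exit $status)" ;;
    esac
    return "$status"
}

# Example use on JASMIN (keyword taken from this thread):
# check_with_timeout 60 fcm ls fcm:jules.x-tr
```

If this prints "ok" while the extract step still hangs, the MOSRS connection itself is probably not the cause.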

You can also just run the make job as a normal bash command on the front end (delete the Slurm directives from the job script if you’re running on LOTUS). This might help you debug quickly.
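
A minimal sketch of running the job by hand, assuming the default Cylc 7 layout where the job script sits under ~/cylc-run/SUITE/log/job/CYCLE/TASK/NN/job. The two helper names are made up for illustration, and the path may differ on your setup, so check that the file exists first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the path to a task's job script,
# assuming the default Cylc 7 directory layout.
job_script_path() {
    local suite="$1" cycle="$2" task="$3" submit="$4"
    printf '%s/cylc-run/%s/log/job/%s/%s/%s/job\n' \
        "$HOME" "$suite" "$cycle" "$task" "$submit"
}

# Hypothetical helper: print a job script with its "#SBATCH" lines removed.
strip_slurm_directives() {
    sed '/^#SBATCH/d' "$1"
}

# Example use (suite, cycle and task names taken from the job.out above):
# job=$(job_script_path u-cx502 20240101T0000+0100 fcm_make 01)
# strip_slurm_directives "$job" > "$HOME/fcm_make_job.sh"
# bash "$HOME/fcm_make_job.sh"
```

Running it in the foreground like this means any error or prompt from the extract step appears directly in the terminal instead of being hidden in job.out or job.err.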

If things have worked previously, then permissions or JASMIN connections are the likely culprits.

Hi Dave,

Thank you for your advice.

Yes, the connection between JASMIN and MOSRS works fine. Running fcm ls fcm:jules.x-tr gave:

ModuleLeaders.txt
admin/
bin/
etc/
rose-meta/
rose-stem/
src/
utils/

The fcm_make is done in the background; we are not using any Slurm commands for the compilation.

Regards,
Carolina
