Help with SIGBUS or Bus error while running multiple JULES models on JASMIN

Assumpta · 16 April 2026 15:32

I am currently running JULES on JASMIN using the batch system.

When I submit only one model job to the batch system, the model runs without any error. But when I submit more than two model jobs, one or more of them fail with the error below:

"

Program received signal SIGBUS: Access to an undefined portion of a memory object.

Backtrace for this error:

Could not print backtrace: /proc/self/exe, errno: 116
/var/spool/slurmd/job15941950/slurm_script: line 67: 1969851 Bus error (core dumped) $JULES_ROOT/build/bin/jules.exe $NAMELIST

"

I need assistance in resolving this issue as I need to run multiple models simultaneously. Thank you.

david.livings · 17 April 2026 15:00

Hi Assumpta

Please let me know which commands you are using to run JULES.

David

Assumpta · 20 April 2026 08:57

Hi David

Thank you for your response.

I am using this command to run JULES: $JULES_ROOT/build/bin/jules.exe $NAMELIST

Please see below the content of my batch script:

#!/bin/bash
#SBATCH --partition=standard
#SBATCH --qos=long
#SBATCH --account=tesnbsclim
#SBATCH -o %j.out
#SBATCH -e %j.err

#SBATCH --ntasks=1

#SBATCH --time=4-23:59:59

#SBATCH --mem=120G

#SBATCH --mail-user=assumpta.onyeagoziri@uct.ac.za
#SBATCH --mail-type=ALL

echo "Date & time: "date

module load jaspy

export RSUITE=$HOME/cylc-src/u-dy129

export tmp1=grep -ir "JULES_SOURCE" $RSUITE/app/fcm_make/rose-app.conf
export JULES_ROOT=echo ${tmp1##*=}
unset tmp1
export tmp1=grep -ir "output_dir" $RSUITE/app/jules/rose-app.conf
export OUTPUT_DIR=echo ${tmp1##*=} | sed "s/'//g"
unset tmp1
export tmp1=grep -ir "run_id" $RSUITE/app/jules/rose-app.conf
export RUN_ID=echo ${tmp1##*=} |sed "s/'//g"
unset tmp1
echo ‘Rose suite is:’ $RSUITE;echo ’ which uses this version of JULES:’ $JULES_ROOT;echo ’ and will save output to these files: ls’ $OUTPUT_DIR’/‘$RUN_ID’’
geany &
nedit~/out_${RSUITE##/}.txt &
export NAMELIST=$HOME/cylc-src/nlists_${RSUITE##*/}; mkdir -p $NAMELIST; cd $NAMELIST; rose app-run -i -C $RSUITE/app/jules;
cd ~
cd $JULES_ROOT;fcm make -j 2 -f etc/fcm-make/make.cfg --new; echo -e “\a”
$JULES_ROOT/build/bin/jules.exe $NAMELIST

echo "All done at: "date

Thanks

Assumpta

david.livings · 20 April 2026 11:26

Hi Assumpta

The script seems to have become garbled in the copying and pasting. Can you upload the file or point me towards a copy on JASMIN?
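For reference, the garbling is consistent with the forum formatting having swallowed the backticks (and at least one asterisk) from the pasted script. Below is a minimal sketch of what the variable-extraction lines were probably meant to do, assuming they used command substitution. The helper name conf_value is hypothetical and does not appear in the original script; the paths and keys are the ones from the post.

```shell
#!/bin/bash
# Sketch only: assumes the original lines used backticks for command
# substitution; $(...) is the equivalent modern form.

# conf_value is a hypothetical helper, not part of the original script.
# It prints the text after the last '=' on the first line of FILE that
# matches KEY (case-insensitive), with single quotes removed.
conf_value() {
    local key=$1 file=$2 line
    line=$(grep -i -m 1 "$key" "$file") || return 1
    printf '%s\n' "${line##*=}" | tr -d "'"
}

RSUITE=$HOME/cylc-src/u-dy129

# Only attempt the lookups where the suite actually exists.
if [ -d "$RSUITE" ]; then
    JULES_ROOT=$(conf_value "JULES_SOURCE" "$RSUITE/app/fcm_make/rose-app.conf")
    OUTPUT_DIR=$(conf_value "output_dir" "$RSUITE/app/jules/rose-app.conf")
    RUN_ID=$(conf_value "run_id" "$RSUITE/app/jules/rose-app.conf")
    export RSUITE JULES_ROOT OUTPUT_DIR RUN_ID
    echo "Rose suite is: $RSUITE"
    echo "JULES source:  $JULES_ROOT"
    echo "Output files:  $OUTPUT_DIR/$RUN_ID"
fi
```

This covers only the lines that set JULES_ROOT, OUTPUT_DIR and RUN_ID; the real file on JASMIN remains the authoritative version.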

David

Assumpta · 29 April 2026 08:25

Thanks David. Please see the file here on jasmin:

/home/users/assumpta/tesnbs.sh

david.livings · 30 April 2026 13:03

Hi Assumpta

I’m investigating and have ruled out a couple of possibilities.

Am I right in thinking that the log files under /home/users/assumpta/batch_sripts/tesnbsruns come from jobs that have failed in this way and that they were run from the batch scripts in the same directory? Do you have log files from a successful run somewhere for comparison?

David

david.livings · 30 April 2026 16:14

Hi Assumpta

I haven’t been able to see where things are going wrong from the log files. I think you will need to run JULES with the debug options turned on so that you can see where the error is when it crashes. To turn the debug options on you will need to edit /home/users/assumpta/MODELS/vn7.7_cmfz/etc/fcm-make/platform/jasmin-gcc-nompi_new.cfg and change the line

$JULES_BUILD = normal

to

$JULES_BUILD = debug
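If you prefer to make the change from the command line, something like the following sketch should work. It assumes GNU sed (as on Linux) and that the line has exactly the form shown above; the function name set_jules_build is made up for illustration. It keeps a .bak copy of the file so the change is easy to undo.

```shell
# set_jules_build is a hypothetical helper: it rewrites the value on the
# "$JULES_BUILD = ..." line of an FCM make platform config file.
set_jules_build() {
    cfg=$1     # path to the platform .cfg file
    mode=$2    # e.g. debug or normal
    cp -p "$cfg" "$cfg.bak"    # keep a backup of the original
    # [$] matches a literal dollar sign; only the value after '=' changes.
    sed -i 's/^\([$]JULES_BUILD[[:space:]]*=[[:space:]]*\).*/\1'"$mode"'/' "$cfg"
    grep '^[$]JULES_BUILD' "$cfg"    # show the resulting line
}

# Example call (path taken from the post above):
# set_jules_build /home/users/assumpta/MODELS/vn7.7_cmfz/etc/fcm-make/platform/jasmin-gcc-nompi_new.cfg debug
```

JULES has to be rebuilt for the change to take effect. The batch script earlier in this thread appears to do that already with its fcm make step.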

Good luck
David

Assumpta · 6 May 2026 07:49

Hi David
Thank you for your message.
Yes, the log files in my /home/users/assumpta/batch_scripts/tesnbsruns come from both successful and failed jobs.
For an example of a successful run, see 20795926.err and 20795926.out in the same directory.

Thank you.

david.livings · 7 May 2026 17:27

Hi Assumpta

Despite the successful runs, it appears that you are still getting failures. Are you going to try turning on the debug options as I suggested?

David

Assumpta · 13 May 2026 13:04

Hi David,

Thanks for your response.

I have tried your suggestion, replacing 'normal' with 'debug' in the jasmin-gcc-nompi_new.cfg file, and now I see a different error: 'SIGFPE: Floating-point exception - erroneous arithmetic operation', instead of the SIGBUS error I was battling with previously.
Please find the error file here: /home/users/assumpta/batch_sripts/tesnbsruns/22851420.err

It is quite surprising that the model runs fine without throwing any of these errors when I only run one or two models at a time. But when I run more than two models, then the errors start occurring. Thanks for your assistance in this matter.

Assumpta.

david.livings · 13 May 2026 16:59

Hi Assumpta

The different error would be due to the debug options turning on more trapping of floating-point errors.

From the backtrace at the end of 22851420.err, it appears that the error is occurring in the calculation of litter_flux at lines 94-95 of the preprocessed source file /home/users/assumpta/MODELS/vn7.7_cmfz/preprocess/src/jules/src/control/shared/calc_litter_flux_mod.F90.

Since the error appears to occur at random, I suspect that it’s due to one of the variables in the calculation not being initialised, or to one of these variables being ultimately calculated from a variable that wasn’t initialised. Most of the time whatever was in the uninitialised variable will lead to a valid calculation, although the result might be nonsense. Occasionally it will lead to an invalid calculation (like 1.0/0.0).

It might just be a coincidence that you haven’t yet seen an error when running only one or two models. After all, the more models you run, the more likely it is that one of them will suffer a random error.
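To put rough numbers on that: if each run fails independently with some probability p, the chance that at least one of n runs fails is 1 - (1 - p)^n. The sketch below tabulates this for an assumed p of 0.1. That figure is purely illustrative and not measured from your jobs, and the function name p_any_failure is made up.

```shell
# p_any_failure is a hypothetical helper: given a per-run failure
# probability p and a number of runs n, print 1 - (1 - p)^n.
p_any_failure() {
    LC_ALL=C awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", 1 - (1 - p)^n }'
}

# p = 0.1 is an assumption chosen only to illustrate the trend.
for n in 1 2 3 5 10; do
    printf '%2d jobs: %s\n' "$n" "$(p_any_failure 0.1 "$n")"
done
```

With these assumed numbers, a single job fails 10% of the time, but the chance of seeing at least one failure among three jobs is about 27%.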

Suggestions for what to do next:

- Gather more information on where the bug is. I notice that you have two other jobs still running. If these fail, it will be useful to know where they fail. You could also try rerunning the job that led to 22851420.err exactly as before to see whether it fails in the same place.

- Look through the code and see whether you can spot a variable that is being used uninitialised. The problem might be in the aforementioned calculation in calc_litter_flux_mod.F90 or it might be in one of the other files listed in the backtrace.
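As an aid to gathering that information, here is a small sketch that lists which signal, if any, each job's .err file reports. It assumes the logs contain the "Program received signal ..." line seen in the errors quoted in this thread; the function name summarise_signals is made up.

```shell
# summarise_signals is a hypothetical helper: for every *.err file in a
# directory, print the file name and the first "Program received signal"
# line it contains, or a note that none was found.
summarise_signals() {
    dir=$1
    for f in "$dir"/*.err; do
        [ -e "$f" ] || continue    # skip if the glob matched nothing
        sig=$(grep -m 1 'Program received signal' "$f")
        printf '%s: %s\n' "$(basename "$f")" "${sig:-no signal reported}"
    done
}

# Example call (directory taken from the posts above):
# summarise_signals /home/users/assumpta/batch_sripts/tesnbsruns
```

Comparing the backtraces of the failed jobs afterwards should show whether they all crash in the same routine.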

Good luck
David
