JASMIN $SBATCH settings mismatch between roses/rose-suite.conf and job file (in rundir)

I get the following error for my test run:

ERROR: sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified

I think I have found out what is causing this error but can’t work out why it is happening.

I am running a copy of suite u-ds632 (belonging to eempvg) on JASMIN (my copy is u-dx182).

When I compare the two suites, we have the same rose-suite.conf settings within the roses directory. These are:

JASMIN_MPI_NUM_TASKS=30
JASMIN_RUN_QUEUE='par-multi'
JASMIN_WALLTIME_RUN='15:00:00'

The settings are available to compare in /home/users/eempvg/roses/u-ds632/rose-suite.conf and /home/users/earagr/roses/u-dx182/rose-suite.conf.

However, in the test run for my suite (u-dx182), the job file within the run directory inherits different settings for these variables (head -100 /home/users/earagr/cylc-run/u-dx182/run1/log/job/20010101T0000Z/spinup_01/NN/job). In this file they are set to:

#SBATCH --qos=high
#SBATCH --partition=standard
#SBATCH --time=15:00:00
#SBATCH --ntasks=30
#SBATCH --mem=40000
#SBATCH --nodes=1
This does not happen for the same test run in the u-ds632 run directory (head -100 /home/users/eempvg/cylc-run/u-ds632/run1/log/job/20010101T0000Z/spinup_01/03/job):
JASMIN_MPI_NUM_TASKS=30
JASMIN_RUN_QUEUE='par-multi'
JASMIN_WALLTIME_RUN='15:00:00'
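For reference, one quick way to compare just the Slurm directives in the two generated job scripts is to diff their #SBATCH lines. The paths below are the ones quoted above; the commands are guarded because the files exist only on the JASMIN cylc servers:

```shell
# Compare only the #SBATCH directive lines of the two generated job scripts.
# Paths are the ones quoted in this thread; they exist only on JASMIN.
a=/home/users/eempvg/cylc-run/u-ds632/run1/log/job/20010101T0000Z/spinup_01/03/job
b=/home/users/earagr/cylc-run/u-dx182/run1/log/job/20010101T0000Z/spinup_01/NN/job
if [ -r "$a" ] && [ -r "$b" ]; then
    grep '^#SBATCH' "$a" > /tmp/sbatch_a.$$
    grep '^#SBATCH' "$b" > /tmp/sbatch_b.$$
    diff /tmp/sbatch_a.$$ /tmp/sbatch_b.$$
    rm -f /tmp/sbatch_a.$$ /tmp/sbatch_b.$$
else
    echo "job scripts not readable on this host; run on a JASMIN cylc server"
fi
```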

My run seems to be inheriting these settings from /home/users/earagr/roses/u-dx182/location/JASMIN/suite.rc:

[[jules]]
    inherit = None, JASMIN_LOTUS
    [[[job]]]
        batch system = slurm
    [[[directives]]]
        --account = jules
        --qos = high
        --partition = standard
        #--partition = {{ JASMIN_RUN_QUEUE }}
        --time = {{ JASMIN_WALLTIME_RUN }}
        --ntasks = {{ JASMIN_MPI_NUM_TASKS }}
        --mem = 40000
        --nodes = 1
    [[[remote]]]
        retrieve job logs max size = 10M
    [[[environment]]]
        ROSE_LAUNCHER = mpirun
        MPI_NUM_TASKS = {{ JASMIN_MPI_NUM_TASKS }}
        ANCIL_BASE_PWD = {{ JASMIN_ANCIL_PATH }}/{{ MIPID|lower }}/jules_ancils/
        DRIVE_BASE_PWD = {{ JASMIN_DRIVE_PATH }}/{{ MIPID|lower }}
        OUTPUT_BASE = {{ JASMIN_OUTPUT_BASE }}/$ROSE_SUITE_NAME
{% if L_SPINUP_GENERIC %}
{% if INITIALSE_FROM_NON_DUMP_FILE_SPINUP_GENERIC %}
        INITIAL_NON_DUMP_FILE = {{ JASMIN_INITIAL_NON_DUMP_FILE }}
{% else %}
        INITIAL_DUMP_FOR_SPINUP_GENERIC = {{ JASMIN_INITIAL_DUMP_FOR_SPINUP_GENERIC }}
{% endif %}
{% endif %}
{% if L_SPINUP_2NDSPIN and not L_SPINUP_GENERIC %}
    {% if START_WITH_2NDSPIN %}
        INITIAL_DUMP_FOR_SPINUP_2NDSPIN = {{ JASMIN_INITIAL_DUMP_FOR_SPINUP_2NDSPIN }}
    {% endif %}
{% endif %}

I am not sure why this is happening for me (u-dx182) and not for the original suite (u-ds632), or how I might fix it. Any help or suggestions you can provide would be really helpful.

Thanks,
Ailish

Hi Ailish

Sorry for the delay in replying.

The reason for the differences between /home/users/eempvg/cylc-run/u-ds632/run1/log/job/20010101T0000Z/spinup_01/03/job and /home/users/earagr/cylc-run/u-dx182/run1/log/job/20010101T0000Z/spinup_01/NN/job is that the former was run on 13/10/2025 and the suite u-ds632 has been updated since then. Your suite u-dx182 appears to be based on the latest version of u-ds632. The older version of u-ds632 would no longer work because the partition par-multi no longer exists.

The reason for your error message is that your job script specifies --account=jules when you are not a member of the jules Slurm account. You need to either change jules to a Slurm account of which you are a member (see https://help.jasmin.ac.uk/docs/batch-computing/slurm-queues/#new-slurm-job-accounting-hierarchy for how to get a list) or join the jules group workspace via the JASMIN accounts portal.
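For example, sacctmgr will list the account/partition/QoS combinations your username is associated with; sbatch rejects any combination not in that list. (The command is guarded here because it only exists on hosts with Slurm installed, such as the JASMIN sci and cylc servers.)

```shell
# List the Slurm associations (account, partition, QoS) for the current user.
# Anything not in this list will produce the "Invalid account or
# account/partition combination" error seen above.
if command -v sacctmgr >/dev/null 2>&1; then
    sacctmgr show associations user="$USER" format=Account,Partition,QOS
else
    echo "sacctmgr not found; run this on a JASMIN sci or cylc server"
fi
```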

Good luck
David

Hi David,

Thanks for your help with this. I requested access to the JULES group and the job now runs the spin-up, which is great. However, I am now getting a different error that seems to be related to some oddities in the ownership of my scratch directory folders.

I get the following error message in my job.err file (tail -100 /home/users/earagr/cylc-run/u-dx182/run3/log/job/20010101T0000Z/spinup_01/NN/job.err):
mkdir: cannot create directory '/work/scratch-pw5/earagr//u-dx182/run3': Permission denied

When I looked into this, eempvg (who is the owner of the suite I copied) is the owner of the u-dx182 directory on my scratch space.

#check who owns dirs

[earagr@cylc2 earagr]$ find /work/scratch-pw5/earagr -type d -user earagr
/work/scratch-pw5/earagr

[earagr@cylc2 earagr]$ find /work/scratch-pw5/earagr -type d -user eempvg

/work/scratch-pw5/earagr/u-dx182
/work/scratch-pw5/earagr/u-dx182/20CRv3-ERA5

#Test creating files
#can create files in earagr but not in u-dx182

[earagr@cylc2 earagr]$ pwd
/work/scratch-pw5/earagr

[earagr@cylc2 earagr]$ mkdir test

[earagr@cylc2 earagr]$ ls

isimip3a_fire_20CRv3-ERA5_obsclim.dump.20010101.0.nc isimip3a_fire_20CRv3-W5E5_obsclim.dump.20010101.0.nc model_scenario_info_isimip3a.dat u-dx182
isimip3a_fire_20CRv3_obsclim.dump.20010101.0.nc isimip3a_fire_GSWP3-W5E5_obsclim.dump.20010101.0.nc test

[earagr@cylc2 earagr]$ cd u-dx182/

[earagr@cylc2 u-dx182]$ mkdir test
mkdir: cannot create directory ‘test’: Permission denied

I am not sure how to fix this issue. When I search for eempvg in my roses directory using:

grep -ir eempvg .

there are no matches, so nothing in the suite itself appears to set eempvg as the owner of the directories under cylc-run.
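For what it's worth, the mechanics of the problem can be reproduced locally with a throwaway directory: mkdir fails inside a directory you cannot write to, and only the directory's owner (on JASMIN, eempvg) can restore write permission, e.g. with something like chmod on the real /work/scratch-pw5/earagr/u-dx182 path:

```shell
# Simulate the scratch-space problem with a temporary directory: a directory
# without write permission rejects mkdir, and only its owner can fix that.
tmp=$(mktemp -d)
mkdir "$tmp/u-dx182"
chmod a-w "$tmp/u-dx182"          # simulate a directory you cannot write to
if mkdir "$tmp/u-dx182/test" 2>/dev/null; then
    echo "mkdir succeeded (running with elevated privileges)"
else
    echo "mkdir: Permission denied (as in the error above)"
fi
chmod u+rwX "$tmp/u-dx182"        # what the directory's owner must run
mkdir -p "$tmp/u-dx182/test" && result=ok
rm -rf "$tmp"
```

On JASMIN the equivalent chmod has to be run by eempvg (or a sysadmin), since earagr does not own the directory.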

Any help or advice you can give would be great :)

Thanks again for your help.

Ailish

Hi Ailish

Something strange is going on with file ownership and permissions under your scratch directory (/work/scratch-pw5/earagr). I was going to type more on this subject, but I see that you started a run7 earlier today. Did that work?

David

Hi David,

Thanks for your help :) I got Maria (eempvg) to update the permissions on the u-dx182 folder, which has fixed the issue for now.

I am now facing a new issue: the dump file I am trying to start my run from is not being copied to the correct location during the run, which leads to the following error:

{MPI Task 0} [FATAL ERROR] file_ncdf_open: Error opening file /work/scratch-pw5/earagr//u-dx182/run10/20CRv3-ERA5/isimip3a_fire_20CRv3-ERA5_obsclim.dump.20010101.0.nc (NetCDF error - No such file or directory)

Within my suite.rc file I give a different directory for the dump file so I am not sure why the model looks in the rundir.

Suite.rc file

./suite.rc: OUTPUT_FOLDER = ${OUTPUT_BASE}/${MODEL}
./suite.rc: INITFILE = ${OUTPUT_FOLDER}/${OUTPUT_NAME_PREVIOUS_SPIN}.dump.${SPIN_END}.0.nc
./suite.rc: OUTPUT_FOLDER = ${OUTPUT_BASE}/${MODEL}
./suite.rc: INITFILE = ${OUTPUT_FOLDER}/{{ MIPID }}_{{ CONFIGNAME }}_${INIT_NAME}${SPINDUMP}.dump.${DUMPTIME}.0.nc
./suite.rc: OUTPUT_FOLDER = ${OUTPUT_BASE}/${MODEL}
./suite.rc: INITFILE = ${OUTPUT_FOLDER}/{{ MIPID }}_{{ CONFIGNAME }}_${INIT_NAME}.dump.${DUMPTIME}.0.nc
./location/JASMIN/suite.rc: OUTPUT_BASE = {{ JASMIN_OUTPUT_BASE }}/$ROSE_SUITE_NAME

where:
JASMIN_OUTPUT_BASE='/work/scratch-pw5/$USER/'
MODEL = {{ model_info[model]["model"] }}
"20CRv3-ERA5":{"model": "20CRv3-ERA5"}

Therefore: OUTPUT_FOLDER=/work/scratch-pw5/$USER/u-dx182/20CRv3-ERA5

I am struggling to find any code (in my roses dir) that copies the file over to the run directory.

Is this behaviour expected?

For now, I have just manually copied the file to the rundir but it would be good to have a more long-term solution for the issue.

Thanks for all your help and patience,

Ailish

Hi Ailish

I’m fairly sure (although I haven’t been able to find explicit confirmation in the documentation) that the environment variable ROSE_SUITE_NAME includes the run directory. Thus in your case it would be u-dx182/run10. This explains why the task 20CRv3-ERA5_obsclim is looking where it is for its initial dump file. It’s also consistent with where the preceding six spinup tasks have put their output.
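One way to check this directly is to look at where the spinup tasks actually wrote their output: if ROSE_SUITE_NAME expands to "u-dx182/run10", their dumps should sit under the directory below (path assembled from the values quoted earlier in this thread; guarded since it exists only on JASMIN):

```shell
# If ROSE_SUITE_NAME expands to "u-dx182/run10", the spinup output (and the
# place the obsclim task looks for its initial dump) is under this directory.
# Path assembled from OUTPUT_BASE and MODEL as quoted in this thread.
out=/work/scratch-pw5/earagr/u-dx182/run10/20CRv3-ERA5
if [ -d "$out" ]; then
    ls "$out"
else
    echo "directory not present on this host: $out"
fi
```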

The suite appears to be behaving as configured, although it might not be configured as you want it to be. It’s easier to see what it’s doing from the processed version of suite.rc in /home/users/earagr/cylc-run/u-dx182/run10/log/config/flow-processed.cylc. It’s running six spinup tasks (spinup_01, spinup_02, spinup_03, 20CRv3-ERA5_sp2_01, 20CRv3-ERA5_sp2_02, 20CRv3-ERA5_sp2_03) with the output of each forming the input to the next. It’s then ignoring all this hard work and trying to run 20CRv3-ERA5_obsclim from the dump file that it can’t find.

The author of the original suite will be better able to give you advice on how to configure it than I can. I think I can see how to run 20CRv3-ERA5_obsclim from the output of the final spinup task or alternatively how to turn off the spinup tasks, but without digging deeper into suite.rc I can’t see how to run 20CRv3-ERA5_obsclim from a dump file from a previous run of the same task, which I believe is what you want.

I’m sorry that I can’t provide more useful advice, but as I say, the author of the suite will be better able to do that than I can.

David