Ozone redistribution in SSP245 suite

Hi CMS team,

I have managed to get my SSP245 suite (u-dq189) working, and I am now trying to get the ozone redistribution code running. The suite runs fine for the first year, but it fails in the first month in which it attempts the redistribution.

The suite fails in the “OZONE” stage, within the “redistribute_ozone” task. The job.err shows:

/work/y07/shared/umshared/iris/python: line 8: 242601 Killed singularity run -B $BIND_POINTS --env=LD_LIBRARY_PATH=$LOCAL_LD_LIBRARY_PATH $SING_SIF "$@"
[FAIL] python_env python ${CYLC_SUITE_DEF_PATH}/src/contrib/redistribute_ozone.py -t $TROPOPAUSE_INPUT -r $OROGRAPHY_INPUT -d $DENSITY_INPUT -z $OZONE_INPUT -o $OZONE_OUTPUT -y $YEAR <<'STDIN'
[FAIL]
[FAIL] 'STDIN' # return-code=137
2025-06-14T02:08:27Z CRITICAL - failed/EXIT
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=9833851.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

Within the /home/n02/n02/penmaher/cylc-run/u-dq189/share/data/ozone_redistribution/ directory, the expected files are all present:

lrwxrwxrwx 1 penmaher n02 81 Jun 12 14:06 qrparm.orog -> /work/y07/shared/umshared/ancil/atmos/n96e/orca1/orography/globe30/v6/qrparm.orog
lrwxrwxrwx 1 penmaher n02 63 Jun 12 14:06 model_data -> /work/n02/n02/penmaher/cylc-run/u-dq189/share/data/History_Data
lrwxrwxrwx 1 penmaher n02 114 Jun 12 14:06 mmro3_monthly_CMIP6_2014_N96_dq189-ancil_2anc -> /work/y07/shared/umshared/cmip6/ancils/n96e/ssp245/Ozone/v1/historic_interpolated_3d_ozone_n96e_2015_2099_ants.anc
lrwxrwxrwx 1 penmaher n02 114 Jun 13 09:13 mmro3_monthly_CMIP6_2015_N96_dq189-ancil_2anc -> /work/y07/shared/umshared/cmip6/ancils/n96e/ssp245/Ozone/v1/historic_interpolated_3d_ozone_n96e_2015_2099_ants.anc
lrwxrwxrwx 1 penmaher n02 99 Jun 14 02:58 dq189a.po2015.pp -> /work/n02/n02/penmaher/cylc-run/u-dq189/share/data/ozone_redistribution/model_data/dq189a.po2015.pp

So the inputs to the python call look okay.
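To confirm the links actually resolve, something like

    ls -lL /home/n02/n02/penmaher/cylc-run/u-dq189/share/data/ozone_redistribution/

lists the dereferenced targets rather than the links themselves, and all of them are readable.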

I have tried testing the python_env from within the site dir:

[penmaher@puma2 site]$ archer2_python_env python
bash: archer2_python_env: command not found...

I am not sure if this is how to test the env properly. I would be interested to hear what you think might be going on.
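For reference, I assume the equivalent test on ARCHER2 itself (rather than on puma2) would be to call the wrapper from the job.err directly, something like:

    # on an ARCHER2 node; the wrapper path is copied from the job.err above
    /work/y07/shared/umshared/iris/python -c "import iris; print(iris.__version__)"

but I have not been able to get further than this.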

Thank you!

Penny

Penny

It’s odd that the job ran out of memory. The default memory allocation in the serial queue has been OK for us. As a quick test, please try increasing the memory requested - in [[HPC_SERIAL]] add

    --mem=4G

thus:

    [[HPC_SERIAL]]
        inherit = None, HPC
        [[[directives]]]
            --ntasks=1
            --partition=serial
            --qos=serial
            --mem=4G

then reload the suite and retrigger the failed task.
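For example, from puma2 (the cycle point below is a placeholder - substitute whichever cycle the task failed in):

    rose suite-run --reload --name=u-dq189
    cylc trigger u-dq189 redistribute_ozone.<cycle-point>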

Grenville

Hi Grenville,

Thanks for helping. I increased the memory as you suggested in site/archer2.rc, reloaded the suite with rose suite-run --reload, and retriggered the failed task. I still get the same error.

Penny

Penny

Let’s exhaust this approach before trying something else: you can ask for up to 125 GB. I guess the script needs to read in /work/y07/shared/umshared/cmip6/ancils/n96e/ssp245/Ozone/v1/historic_interpolated_3d_ozone_n96e_2015_2099_ants.anc, which is 20 GB, so maybe try setting --mem=100G.
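Once the task has run (or been killed again), sacct will show how much memory the step actually used, which helps in picking a sensible value - e.g. for the job ID in your log:

    sacct -j 9833851 --format=JobID,State,MaxRSS,Elapsed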

Grenville

Yes, that fixed it. Thanks. I plan to break the files down into smaller chunks, but for now I am simply testing that it works.
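For the chunking, I have in mind something along these lines - only a rough sketch, extracting a single year from the multi-year ancillary with iris and writing it to NetCDF (writing a proper .anc back out would presumably need ANTS or mule rather than iris.save):

    # Sketch: pull one year out of the 2015-2099 ozone ancillary so the
    # redistribution step only has to hold one year's worth of data.
    import iris

    year = 2015  # placeholder
    src = ("/work/y07/shared/umshared/cmip6/ancils/n96e/ssp245/Ozone/v1/"
           "historic_interpolated_3d_ozone_n96e_2015_2099_ants.anc")
    one_year = iris.Constraint(time=lambda cell: cell.point.year == year)
    cubes = iris.load(src, one_year)
    iris.save(cubes, f"ozone_n96e_{year}.nc")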

For my own reference, how did you diagnose this as a memory issue? The error message pointed me towards a python problem.

Much appreciated.
Penny

The error message you posted says: