Hello, I’m trying to run a rather large domain (2000 x 2000 points), and when the run reaches the forecast stage I get this error:
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=10650430.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: nid001161: task 0: Out Of Memory
srun: launch/slurm: _step_signal: Terminating StepId=10650430.0
slurmstepd: error: *** STEP 10650430.0 ON nid001161 CANCELLED AT 2025-08-21T09:34:37 ***
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=10650430.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
[FAIL] um-atmos <<'STDIN'
[FAIL]
[FAIL] 'STDIN' # return-code=1
2025-08-21T08:34:40Z CRITICAL - failed/EXIT
Is there a way to increase the memory available to the run? I’ve looked at some similar tickets, but I’m not sure whether, or where, I can change the memory settings.
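To give a sense of scale, here is a minimal sketch of how much memory a single 3-D field needs on a 2000 x 2000 grid, and how splitting the domain across more MPI tasks shrinks each task's share. The 70 vertical levels, 8-byte values and the even, halo-free decomposition are hypothetical assumptions, not settings taken from this run. The model holds many such fields plus halos and buffers, so the real footprint per task is considerably larger than this.

```python
# Rough, illustrative estimate only.
# ASSUMPTIONS (hypothetical, not from the actual run): 70 vertical levels,
# double-precision (8-byte) values, and a horizontal grid split evenly into
# nprocx x nprocy subdomains with halos ignored.

BYTES_PER_VALUE = 8  # double precision


def field_bytes(nx: int, ny: int, nlevels: int) -> int:
    """Bytes needed to hold one 3-D field of shape (nx, ny, nlevels)."""
    return nx * ny * nlevels * BYTES_PER_VALUE


def bytes_per_task(nx: int, ny: int, nlevels: int, nprocx: int, nprocy: int) -> int:
    """Bytes of one field held by each task when the horizontal grid is
    divided into nprocx x nprocy equal subdomains (halos ignored)."""
    return field_bytes(nx, ny, nlevels) // (nprocx * nprocy)


if __name__ == "__main__":
    nx, ny, nlevels = 2000, 2000, 70  # 70 levels is a hypothetical choice
    one_field = field_bytes(nx, ny, nlevels)
    print(f"one 3-D field: {one_field / 2**30:.2f} GiB")
    # More tasks in the decomposition means a smaller share per task.
    for px, py in [(16, 16), (32, 32)]:
        share = bytes_per_task(nx, ny, nlevels, px, py)
        print(f"{px} x {py} tasks: {share / 2**20:.2f} MiB per task")
```

The point of the sketch is only that per-task memory falls roughly in proportion to the number of tasks the domain is split over, which is why larger domains usually need either more nodes or more memory per task.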
Best,
Michelle Maclennan