Segmentation fault: Address not mapped to object

I’m struggling to run a nested ensemble suite (u-cj824) which has previously worked without issue. The only changes I’ve made are to one of the atmospheric stability options and the addition of two more STASH requests.

Now almost all of my ensemble members are getting a segmentation fault on the regional forecast steps. One of them seems to be working OK.

Most of the failing ensemble members report errors like this:
_pmiu_daemon(SIGCHLD): [NID 07066] [c8-2c2s6n2] [Wed Sep 7 20:38:53 2022] PE RANK 108 exit signal Segmentation fault
[NID 07066] 2022-09-07 20:38:53 Apid 178344511: initiated application termination
[FAIL] um-atmos # return-code=137
2022-09-07T20:38:59Z CRITICAL - failed/EXIT

And one of them has this:
[74] exceptions: An exception was raised:11 (Segmentation fault)
[74] exceptions: the exception reports the extra information: Address not mapped to object.
[74] exceptions: whilst in a serial region
[74] exceptions: Task had pid=3220 on host nid06532
[74] exceptions: Program is "/home/d02/ajohnson/cylc-run/u-cj824/share/fcm_make/build-atmos/bin/um-atmos.exe"
[74] exceptions: calling registered handler @ 0x20022280
Warning in umPrintMgr: umPrintExceptionHandler : Handler Invoked
[74] exceptions: Done callbacks
[74] exceptions: *** GLIBC ***
[74] exceptions: Data address (si_addr): 0x7ffffffff000; rip: 0x2046f6c8
_pmiu_daemon(SIGCHLD): [NID 06532] [c6-2c0s1n0] [Wed Sep 7 19:07:05 2022] PE RANK 74 exit signal Segmentation fault
[NID 06532] 2022-09-07 19:07:06 Apid 178339545: initiated application termination
[FAIL] um-atmos # return-code=137
2022-09-07T19:07:09Z CRITICAL - failed/EXIT

Do you know what this could be?

Amethyst

What stash did you add?

Grenville

Hi Grenville, I actually added three: surface total moisture flux (03223), meridional momentum flux (30316) and BL total moisture flux (03222).

Edited to add: when I re-trigger the job, some of the members succeed and others don’t. I’m unsure what the root cause is.

Hi Amethyst

The core file says

#8  stlevels$stlevels_mod_ () at /home/d02/ajohnson/cylc-run/u-cj824/share/fcm_make/preprocess-atmos/src/um/src/control/stash/stlevels.F90:146
#9  0x000000002045b1c9 in stwork$stwork_mod_ () at /home/d02/ajohnson/cylc-run/u-cj824/share/fcm_make/preprocess-atmos/src/um/src/control/stash/stwork.F90:661

so it’s failing in STASH. A quick look at the STASHmaster file says 30316 should be on pressure levels, so that seems a likely cause. Create a pressure-level domain profile and try that.

It also says that 03222 should be on rho levels, which might be worth fixing too.
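
If it helps, the same check can be sketched in a few lines of Python before resubmitting. This is only an illustration: the required level types are typed in by hand from the two items mentioned above rather than read from the STASHmaster file, and the helper name, the level-type strings and the example requests are all made up for the sketch. The domain profiles actually used in the suite are not shown in this thread, so the "theta" and "surface" entries are placeholders.

```python
# Hypothetical lookup: level types copied by hand from this thread,
# not parsed from a real STASHmaster file.
REQUIRED_LEVELS = {
    "30316": "pressure",  # meridional momentum flux
    "03222": "rho",       # BL total moisture flux
}


def find_level_mismatches(requests, required=REQUIRED_LEVELS):
    """Return (item, requested, expected) for each request on the wrong level type.

    `requests` maps a STASH item code to the level type of the domain
    profile it was requested on. Items with no known requirement are skipped.
    """
    mismatches = []
    for item, requested in requests.items():
        expected = required.get(item)
        if expected is not None and requested != expected:
            mismatches.append((item, requested, expected))
    return mismatches


# Placeholder requests for illustration only.
requests = {"30316": "theta", "03222": "theta", "03223": "surface"}

for item, requested, expected in find_level_mismatches(requests):
    print(f"STASH {item}: requested on {requested} levels, expected {expected} levels")
```

Anything the loop prints is a request worth moving to a matching domain profile. Item 03223 is skipped here only because its required level type was not looked up, not because it has been confirmed as correct.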

Grenville

Ah, it seems to be working now. Thanks for catching that!