I’m struggling to run a nested ensemble suite (u-cj824) that has previously run without issue. The only changes I’ve made are to one of the atmospheric stability options and the addition of two more STASH requests.
Now almost all of my ensemble members are getting a segmentation fault on the regional forecast steps. One of them seems to be working OK.
Most of the failing ensemble members report errors like this:
_pmiu_daemon(SIGCHLD): [NID 07066] [c8-2c2s6n2] [Wed Sep 7 20:38:53 2022] PE RANK 108 exit signal Segmentation fault
[NID 07066] 2022-09-07 20:38:53 Apid 178344511: initiated application termination
[FAIL] um-atmos # return-code=137
2022-09-07T20:38:59Z CRITICAL - failed/EXIT
And one of the failing members has this instead:
[74] exceptions: An exception was raised:11 (Segmentation fault)
[74] exceptions: the exception reports the extra information: Address not mapped to object.
[74] exceptions: whilst in a serial region
[74] exceptions: Task had pid=3220 on host nid06532
[74] exceptions: Program is "/home/d02/ajohnson/cylc-run/u-cj824/share/fcm_make/build-atmos/bin/um-atmos.exe"
[74] exceptions: calling registered handler @ 0x20022280
Warning in umPrintMgr: umPrintExceptionHandler : Handler Invoked
[74] exceptions: Done callbacks
[74] exceptions: *** GLIBC ***
[74] exceptions: Data address (si_addr): 0x7ffffffff000; rip: 0x2046f6c8
_pmiu_daemon(SIGCHLD): [NID 06532] [c6-2c0s1n0] [Wed Sep 7 19:07:05 2022] PE RANK 74 exit signal Segmentation fault
[NID 06532] 2022-09-07 19:07:06 Apid 178339545: initiated application termination
[FAIL] um-atmos # return-code=137
2022-09-07T19:07:09Z CRITICAL - failed/EXIT
Do you know what could be causing this?