I am trying to update a branch that I have been working on from UM vn13.4 to UM vn13.8. I’m running in suite u-do774 and I’ve got it compiling, but I am now getting the following segmentation fault during the atmos_main job:
[258] exceptions: An exception was raised:11 (Segmentation fault)
[258] exceptions: the exception reports the extra information: Address not mapped to object.
[258] exceptions: whilst in a serial region
[258] exceptions: Task had pid=35283 on host nid01246
[258] exceptions: Program is “/home/d03/isangha/cylc-run/u-do774/share/fcm_make_um/build-atmos/bin/um-atmos.exe”
[258] exceptions: calling registered handler @ 0x00423460
_pmiu_daemon(SIGCHLD): [NID 01246] [c6-0c1s7n2] [Thu Apr 3 15:19:05 2025] PE RANK 258 exit signal Segmentation fault
[NID 01246] 2025-04-03 15:19:05 Apid 228806418: initiated application termination
[FAIL] um-atmos # return-code=137
2025-04-03T15:19:10Z CRITICAL - failed/EXIT
Any advice on how to go about debugging this would be much appreciated! Thanks 
Hi Isabella,
There are multiple changes (to suite and branches) being made at the same time, so it would be better to check that the base configuration works. If you can run the suite without your branches that will be a good start.
After that, if it is possible to add your branches but not use the changes, that would be ideal. However, I see that there are no new switches (namelist items) being added, so it might be that your changes are ‘active’ by default. Even if you cannot turn off the new code, deactivating the new diagnostics may help determine whether the diagnostic processing has any issues.
I see that you have PrintStatus set to Diag, which will be helpful, but the log indicates that the error occurs on PE 258. To get output from that PE (and all others), you will need to set prnt_writers=1 under app/um/rose-app.conf → [namelist:prnt_contrl].
Hopefully these changes will give some indication of the cause.
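As a quick check that the setting has actually been picked up, something like the following could be run from the suite directory. This is only a sketch: the helper name `prnt_writers_setting` is made up for illustration, and it assumes the usual Rose layout of `[section]` headers followed by `key=value` lines.

```shell
# Print the active prnt_writers line from the [namelist:prnt_contrl] section
# of a rose-app.conf file (hypothetical helper, not part of Rose or the UM).
prnt_writers_setting() {
    awk '
        /^\[/ { in_sec = ($0 == "[namelist:prnt_contrl]") }  # track the current section
        in_sec && /^prnt_writers=/ { print }                  # only report the key inside it
    ' "$1"
}

# Example (path taken from the advice above):
#   prnt_writers_setting app/um/rose-app.conf
```

If nothing is printed, the key is missing from that section or has been commented out, so it is worth checking the file by hand.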
–
Mohit
It stopped in ukca_main1-ukca_main1.F90 while trying to deallocate (if that’s any help):
#20 0x0000000002db1082 in __DEALLOCATE ()
#21 0x0000000000eca720 in ukca_main1$ukca_main1_mod_ () at /home/d03/isangha/cylc-run/u-do774/share/fcm_make_um/preprocess-atmos/src/ukca/src/control/core/top_level/ukca_main1-ukca_main1.F90:2807
#22 0x0000000000e96919 in ukca_step_3d_domain$ukca_step_mod_ () at /home/d03/isangha/cylc-run/u-do774/share/fcm_make_um/preprocess-atmos/src/ukca/src/control/core/top_level/ukca_step_mod.F90:378
#23 0x0000000001abdffd in atmos_ukca$atmos_ukca_mod_ () at /home/d03/isangha/cylc-run/u-do774/share/fcm_make_um/preprocess-atmos/src/um/src/control/ukca_interface/atmos_ukca_mod.F90:1004
#24 0x0000000000e3a925 in atm_step_4a$atm_step_4a_mod_ () at /home/d03/isangha/cylc-run/u-do774/share/fcm_make_um/preprocess-atmos/src/um/src/control/top_level/atm_step_4A.F90:4702
#25 0x0000000000494a6b in u_model_4a$u_model_4a_mod_ () at /home/d03/isangha/cylc-run/u-do774/share/fcm_make_um/preprocess-atmos/src/um/src/control/top_level/u_model_4A.F90:331
#26 0x0000000000411b15 in um_shell$um_shell_mod_ () at /home/d03/isangha/cylc-run/u-do774/share/fcm_make_um/preprocess-atmos/src/um/src/control/top_level/um_shell.F90:704
#27 0x00000000004085c8 in um_main_ () at /home/d03/isangha/cylc-run/u-do774/share/fcm_make_um/preprocess-atmos/src/um/src/control/top_level/um_main.F90:20
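For long backtraces like the one above, it can help to reduce each frame to just the source file and line number. A minimal sketch, assuming the backtrace has been saved to a text file and that frames end in `<path>.F90:<line>` as they do here (the function name `frames` is illustrative only):

```shell
# Reduce a debugger backtrace to "file.F90:line" entries, one per frame.
# Frames with no source location (e.g. __DEALLOCATE) produce no output.
frames() {
    grep -oE '[^/ ]+\.F90:[0-9]+'
}

# Example, with the backtrace saved in a file called backtrace.txt:
#   frames < backtrace.txt
```

The first entry printed is the innermost frame with source information, which is usually the place to start reading.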
Thanks Grenville,
The suite seems to have been resubmitted this morning so will wait for the latest output and see if any more information is available.
M
Thanks Grenville and Mohit,
I am trying to figure out why the error is occurring at the DEALLOCATE in ukca_main1. I have changed app/um/rose-app.conf → [namelist:prnt_contrl] → prnt_writers to 1. Will the output from each PE be under the share folder?
Best,
Izzy
The model log files can be found in work/date-time/atmos_main/pe_output/
The backtrace from the latest run seems to be pointing to a DEALLOCATE statement in asad_diags_output_ctl.F90, so in the first instance try to deactivate the STASH requests for any new diagnostics and see if the rest of the changes are working.
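To pick out the log for one particular PE from that directory, a small helper along these lines might be useful. It is a sketch under one assumption that should be checked against the actual directory listing: that the per-PE file names end in `pe<rank>`, possibly zero-padded. The function name `pe_logs` is made up for illustration.

```shell
# List the log file(s) in a pe_output directory that belong to one PE rank.
# Assumption: file names end in "pe<rank>", with optional leading zeros.
pe_logs() {
    dir=$1
    rank=$2
    find "$dir" -type f | grep -E "pe0*${rank}\$" | sort
}

# Example for the failing rank reported in the job log:
#   pe_logs work/<date-time>/atmos_main/pe_output 258 | xargs tail -n 50
```

Looking at the last lines written by the failing PE often shows which routine it had reached before the segmentation fault.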
I tried running again without any of the new STASH and it still fails with a segmentation fault on multiple PEs. The error includes this line:
‘the exception reports the extra information: Address not mapped to object.’
but I’m not sure how to interpret this.
Thanks for your help!
Yes, the nature of the failure has changed, but there does not seem to be enough information available to pin down the exact location.
It might help to check the base configuration itself by excluding your branches - to see if the upgrade to vn13.8 has any issues.
Remember to use ‘rose suite-run --new’ to remove old build and data, core files, etc.
Mohit