Severe (174): SIGSEGV, segmentation fault occurred

Dear Helpdesk,

I have come across a segmentation fault when running the u-al752 suite, which works perfectly until I add the line wl(l) = MIN(wcarb(l),wlite(l)), so I am guessing that it might be an array-bounds issue. I have attached the error log. I have also seen a similar qsat_mod error on CMS, but the solution there is not clear, and I am not sure whether we are having the same problem.

Do you know what this could be?

Best,
Wenyao

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line     Source
jules.exe          000000000097F423  Unknown            Unknown  Unknown
libpthread-2.17.s  00007F814FE64630  Unknown            Unknown  Unknown
jules.exe          000000000089164C  qsat_mod_mp_qsat_  109      qsat_mod.F90
jules.exe          00000000008997DE  screen_tq_mod_mp_  401      screen_tq_jls.F90
jules.exe          000000000084716F  jules_griddiag_sf  477      jules_griddiag_sf_implicit_jls.F90
jules.exe          0000000000750453  surf_couple_impli  495      surf_couple_implicit_mod.F90
jules.exe          000000000041525E  control_           723      control.F90
jules.exe          000000000040CF7F  MAIN__             195      jules.F90
jules.exe          000000000040CB92  Unknown            Unknown  Unknown
libc-2.17.so       00007F814F8A5555  __libc_start_main  Unknown  Unknown
jules.exe          000000000040CAA9  Unknown            Unknown  Unknown

Hi Wenyao:
Yes, sometimes there are failures in the qsat routine, for various reasons. Usually, it’s because there is something wrong with the setup or the code or the input files. It might not be caused by the array boundary issue that you suspect.

You might try looking at the lines indicated for the various files in the traceback (e.g., line 109 of qsat_mod.F90). These line numbers refer to the preprocessed code files that are extracted by fcm_make in your cylc-run directory, not to the files in your branch.
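To make the list of places to look explicit, here is a minimal Python sketch that pulls the resolved frames out of a pasted forrtl traceback. It assumes the five whitespace-separated fields shown in the log above (Image, PC, Routine, Line, Source); the function name parse_frames is hypothetical and not part of JULES or fcm_make.

```python
def parse_frames(traceback_text):
    """Return (routine, line, source) for every frame with a known line number."""
    frames = []
    for raw in traceback_text.splitlines():
        parts = raw.split()
        # A resolved frame has exactly five fields and a numeric line number;
        # the header row and the "Unknown" frames fail this check and are skipped.
        if len(parts) == 5 and parts[3].isdigit():
            _image, _pc, routine, line, source = parts
            frames.append((routine, int(line), source))
    return frames
```

The first tuple returned is the innermost frame, which is usually the most useful place to start reading.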

You might also try using JULES_BUILD=DEBUG instead of JULES_BUILD=NORMAL in your suite.
This will slow things down, but the error messages might make a bit more sense.
Patrick

Dear Patrick,

Thanks for the advice. I found that line 109 of qsat_mod is indeed empty.

When I switch to debug mode, this pops up:

forrtl: error (73): floating divide by zero
Image              PC                Routine            Line     Source
jules.exe          0000000001B578F4  Unknown            Unknown  Unknown
libpthread-2.17.s  00007FEA70B84630  Unknown            Unknown  Unknown
jules.exe          00000000018E9CC4  leaf_mod_mp_leaf_  427      leaf_jls_mod.F90
jules.exe          000000000152C761  sf_stom_mod_mp_sf  1392     sf_stom_jls_mod.F90
jules.exe          000000000140E1C4  physiol_mod_mp_ph  1047     physiol_jls_mod.F90
jules.exe          0000000001237353  jules_land_sf_exp  1060     jules_land_sf_explicit_jls.F90
jules.exe          0000000000DA5EB2  surf_couple_expli  501      surf_couple_explicit_mod.F90
jules.exe          00000000004252FF  control_           582      control.F90
jules.exe          000000000040D106  MAIN__             195      jules.F90
jules.exe          000000000040CB92  Unknown            Unknown  Unknown
libc-2.17.so       00007FEA705C5555  __libc_start_main  Unknown  Unknown
jules.exe          000000000040CAA9  Unknown            Unknown  Unknown

But line 427 in leaf is commented out… I wonder whether the error message it gives is the right one.

Best,
Wenyao

Hi Wenyao
The line numbers in the traceback don’t refer to the code in the directory of your jules branch. Instead, they refer to the corresponding files in the preprocessed subdirectory of the fcm_make subdirectory of the ~/cylc-run directory for your u-al752 suite.

The code in your jules branch gets preprocessed by fcm_make so the line numbers change.
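As a rough illustration of how to look at the right copy of a file, the Python sketch below searches a directory tree for a source file by name and prints the lines around a given line number. The exact layout below ~/cylc-run/u-al752 is not spelled out here, so the sketch simply searches by file name; find_copies and show_context are hypothetical helper names.

```python
from pathlib import Path


def find_copies(root, filename):
    """Return every file called `filename` below `root`, sorted by path."""
    return sorted(Path(root).expanduser().rglob(filename))


def show_context(path, line_no, radius=3):
    """Return the lines around 1-based `line_no`, each prefixed with its number."""
    lines = Path(path).read_text(errors="replace").splitlines()
    first = max(1, line_no - radius)
    last = min(len(lines), line_no + radius)
    return [f"{n:6d}  {lines[n - 1]}" for n in range(first, last + 1)]
```

For example, calling find_copies("~/cylc-run/u-al752", "leaf_jls_mod.F90") should list both any extracted copy and the preprocessed one; passing the preprocessed path to show_context with line 427 shows what the traceback is actually pointing at.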

Furthermore, turning the debug switch on exposes division-by-zero bugs. These are separate bugs that are (safely?) ignored when running with the debug switch turned off. You still need to find the cause of the original segmentation fault. It might be caused by the division by zero, but not necessarily.

Patrick

Dear Patrick,

I discovered that t_growth_gb is probably miscalculated, because its values are huge, on the order of 1e8. Do you think that might be causing this?

Best,
Wenyao

Hi Wenyao:
If t_growth_gb is miscalculated, I would guess that yes, it is possible that it is causing the segmentation fault. But it is also possible that it is not the cause of the problem. You might try setting t_growth_gb to a constant value, to see if the segmentation fault persists.
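Before hard-coding a constant, it can help to see which points are affected. The Python sketch below flags entries that are non-finite or outside a plausible range, assuming the t_growth_gb values have been dumped to a plain list of floats. The function name and the bounds are hypothetical; choose bounds that suit the variable's units.

```python
import math


def suspicious_indices(values, lower, upper):
    """Return the indices of values that are NaN, infinite, or outside [lower, upper]."""
    return [
        i
        for i, v in enumerate(values)
        if not math.isfinite(v) or v < lower or v > upper
    ]
```

If every index comes back flagged, the calculation itself is likely wrong. If only a few do, the inputs at those points are a more likely culprit.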
Patrick
