SOLVED: Segmentation fault in triffid

EDIT 2: This is now solved. The issue was with the snow-related fixes in the jules_temp_fixes namelist. In case someone is having a similar issue, here is the namelist that fixed it for me.

[namelist:jules_temp_fixes]
ctile_orog_fix=2
l_accurate_rho=.true.
l_dtcanfix=.true.
l_fix_alb_ice_thick=.true.
l_fix_albsnow_ts=.true.
!!l_fix_drydep_so2_water=.false.
!!l_fix_improve_drydep=.false.
l_fix_lake_ice_temperatures=.false.
l_fix_moruses_roof_rad_coupling=.false.
l_fix_neg_snow=.false.
l_fix_osa_chloro=.false.
l_fix_snow_frac=.true.
!!l_fix_ukca_h2dd_x=.false.
l_fix_ustar_dust=.true.
l_fix_wind_snow=.true.


Hi there,

I have been getting some strange segmentation faults in the PFT competition in triffid. I have tried following the advice in other seg fault threads - i.e. looking at the exact timestep at which the failure occurs - and have summarised my findings below. Any help or advice is very welcome!

  1. Background
    I am trying to get a JULES vn7.8 suite running on JASMIN. To do this, I took a suite that I know runs on JASMIN post Cylc2 etc. [vn5.4] and used the auto-update in Rose to convert the namelists. I then cross-referenced this against the TRENDY setup (u-dp358) and, where there were differences in setup, adopted theirs. My suite is u-dp360.

  2. The fault
    When I run my suite, I get a segmentation fault (174), with the error occurring in the lotka_noeq_subset routine. The specific line that faults is “com(l,n,n) = 1.0”, in the subroutine where triffid allocates PFT changes based on differences in PFT height.

  3. Debugging attempts to date

These faults occur at fairly random timesteps. I have tried using different versions of the CRUJRA forcing data - that didn’t help. When I ran triffid on a 10-day timestep, the fault occurred in 1995 [run starts in 1990]. When I switched to a 1-day timestep, it occurred in 2005 [progress!]. This makes me think it is some instability in the PFT competition. I have tried running it with the old crop PFT competition (lotka_noeq) and get the same faults. I have also tried switching INFERNO mortality on and off, along with various other switches, but no luck.

I have since run the model writing a dump at every timestep until it faults. What I have found is odd: at the timestep where the seg fault occurs, all variables in the dump file go to infinity, yet in the dump directly before the fault the outputs are completely normal.
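
For anyone wanting to run the same check, something along these lines should work (this assumes the dumps are NetCDF and uses the netCDF4 Python library; the filename is just a placeholder):

import numpy as np
from netCDF4 import Dataset

# Report any variable in a dump file that contains non-finite values (NaN/Inf).
with Dataset("jules_dump.nc") as nc:
    for name, var in nc.variables.items():
        if var.dtype.kind not in "fiu":   # skip non-numeric variables
            continue
        data = np.ma.filled(var[:].astype(float), np.nan)
        n_bad = np.count_nonzero(~np.isfinite(data))
        if n_bad:
            print(f"{name}: {n_bad} non-finite values")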

For example, here is the PFT fraction output - the fractions sum to unity until the fault and are infinite after it.
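
The sum-to-unity check can be done with something like the sketch below (the variable name frac and the position of the surface-type axis are assumptions - adjust to match the file):

import numpy as np
from netCDF4 import Dataset

# Check that surface-type fractions sum to ~1 at every land point.
with Dataset("jules_output.nc") as nc:
    frac = np.ma.filled(nc.variables["frac"][:], np.nan)

type_axis = 0                 # assumed position of the surface-type axis
total = frac.sum(axis=type_axis)
bad = ~np.isclose(total, 1.0, atol=1e-6)
print(f"{np.count_nonzero(bad)} points where fractions do not sum to 1")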

It is notable that this occurs at the point in the run where triffid is called, indicating it is a genuine instability there.

EDIT: in case it is relevant, I do find that the rho_snow (snowpack density) variable at pixel 3472 on the land-only grid becomes NA during the run [i.e. it is not missing at the start but is by the time of the fault].
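
If it helps anyone reproduce this, a rough way to find the first dump where a given point goes bad is something like the following (the file pattern, variable name and axis ordering are all assumptions for my setup):

import glob
import numpy as np
from netCDF4 import Dataset

# Find the first dump in which rho_snow at land point 3472 is no longer finite.
for path in sorted(glob.glob("*.dump.*.nc")):
    with Dataset(path) as nc:
        rho = np.ma.filled(nc.variables["rho_snow"][:], np.nan)
    point = rho[..., 3472]    # assumes the land dimension is last
    if not np.all(np.isfinite(point)):
        print(f"rho_snow first non-finite in {path}")
        break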