Sudden "NaNs in error term in BiCGstab" after model running smoothly

Hi CMS Helpdesk,

I’m running a 3-month simulation of u-dd727, a UKESM1.1 piClim-control (AMIP) suite with TRIFFID enabled, and I’ve run into the infamous “NaNs in error term in BiCGstab” failure after 1 iteration in the atmos_main task. The error appears to occur on every processor, and I can’t identify where the NaNs are coming from, even after enabling extra diagnostic messages in the task output for atmosphere and recon.

The model ran without any errors on shorter 10-day runs. Before running the suite again, I made a couple of changes to the model output streams and adjusted the model dump frequency and the run length. I have also tried rebuilding the UM from scratch, but this had no effect.

Can you advise how to identify the source of the error, and possibly explain why it appeared all of a sudden?



After some digging, the error is connected to Fast physics terms from atmos_physics2:

This was from a new run with u-de067, a copy of the above suite but configured from its original source: u-cw506. I’ve had a look through the recon but can’t spot anything unusual.

Can you advise?



I’m pleased to say I’ve fixed the issue, which was caused by a corrupted initial dump file. I’ve no earthly idea how it became corrupted, but after copying over a fresh version of the dump, the model ran smoothly with full functionality.
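
For anyone who hits the same thing, one quick way to rule a corrupted copy in or out is to compare the working dump against the source copy byte for byte. The snippet below is a minimal sketch of that check, not part of the suite or the UM: the function names and the two paths in the usage comment are hypothetical, and it assumes the working dump is meant to be an exact copy of the reference file.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large dump files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dumps_match(reference, working_copy):
    """Return True if both files have the same size and SHA-256 digest."""
    reference, working_copy = Path(reference), Path(working_copy)
    # A size mismatch is enough to show the copy differs; skip hashing.
    if reference.stat().st_size != working_copy.stat().st_size:
        return False
    return sha256_of(reference) == sha256_of(working_copy)


# Hypothetical usage; both paths are placeholders, not real locations:
# if not dumps_match("/path/to/source/start.dump", "/path/to/suite/start.dump"):
#     print("Working dump differs from the source copy; re-copy it.")
```

A mismatch only shows that the two files differ, not which one is at fault, so it is worth checking against a copy known to have worked.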

Thank you for your time,

