I have had a second failure (I did not report the first one) from an atmospheric model simulation in which I am varying parameters. I run in three-month chunks with a time limit of 1 hour 55 minutes and do not normally have problems. Both failures occurred after many timesteps, so it looks like the model is running considerably slower than normal and running out of time. The current case (dfols46p_4/dq00a/work/20110601T0000Z) failed after 47 days (out of 90), so it is running at less than half its normal speed. I am running on nvme.
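For reference, here is a rough sketch of the slowdown arithmetic from the numbers quoted above. It assumes the job used the whole 1 hour 55 minute limit on model timesteps before being killed (startup and I/O overheads are ignored), so the figures are only indicative. The function and variable names are just for illustration.

```python
# Rough estimate of the slowdown, using only the numbers in this post.
# Assumption: the full wallclock limit was spent timestepping.

WALLCLOCK_LIMIT_MIN = 115  # 1 hour 55 minutes
CHUNK_DAYS = 90            # one three-month cycle
DAYS_DONE = 47             # model days completed when the job hit the limit


def slowdown_estimate(days_done, chunk_days, limit_min):
    """Return (minutes per model day, projected minutes for the full chunk,
    fraction of the chunk completed within the limit)."""
    min_per_day = limit_min / days_done
    projected_min = min_per_day * chunk_days
    fraction_done = days_done / chunk_days
    return min_per_day, projected_min, fraction_done


min_per_day, projected_min, fraction_done = slowdown_estimate(
    DAYS_DONE, CHUNK_DAYS, WALLCLOCK_LIMIT_MIN
)

print(f"Observed rate: {min_per_day:.2f} wallclock minutes per model day")
print(f"Projected time for {CHUNK_DAYS} days: {projected_min:.0f} minutes")
print(f"Fraction of chunk completed within the limit: {fraction_done:.2f}")
```

The completed fraction (47/90) is the speed relative to the slowest rate that would just fit in the time limit. A normal run finishes with time to spare, so the slowdown relative to normal is larger than this figure suggests.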
Output from pe0 is in /work/n02/shared/tetts/failures/dfols46p_4/dq00a/work/20110601T0000Z/atmos_main/pe_output/dq00a.fort6.pe0 (I copied it there).
I’ve cleaned out old runs from nvme and will resubmit. I will update this topic once that has happened.
Simon
P.S. Is there a way of restarting the simulation partway through a cycle, i.e. picking up from where it got to rather than going back by up to 3 months (47 days in my case)?