Slow running

I have had a second failure (I did not report the first one) from an atmospheric model simulation in which I am varying parameters. I run in three-month chunks with a time limit of 1 hour 55 mins and do not normally have problems. Both failures occurred after many timesteps, so it looks like the model is running considerably slower than normal and so runs out of time. The current case (dfols46p_4/dq00a/work/20110601T0000Z) failed after 47 days (out of 90), so it was running at less than 1/2 its normal speed. I am running on nvme.
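
As a rough check on that estimate, here is a minimal sketch of the arithmetic using only the figures above. The variable names are illustrative, and it assumes the model ran at a steady rate up to the timeout, which may not be true if the slowdown was intermittent.

```python
# Figures taken from the post above; variable names are only illustrative.
time_limit_min = 1 * 60 + 55   # wall-clock limit per chunk: 1 hour 55 mins
days_done = 47                 # model days completed before the timeout
days_in_chunk = 90             # model days in a three-month chunk

# Throughput actually achieved, in model days per wall-clock minute
# (assumes a steady rate for the whole job).
rate = days_done / time_limit_min

# Wall-clock time the whole chunk would need at that rate.
projected_min = days_in_chunk / rate

# A healthy run fits inside the limit, so this ratio is a lower bound
# on the slowdown relative to normal.
min_slowdown = projected_min / time_limit_min

print(f"projected: {projected_min:.0f} min, slowdown >= {min_slowdown:.2f}x")
```

So at the observed rate the chunk would have needed roughly 220 minutes, against a limit of 115.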

Output from pe0 is in /work/n02/shared/tetts/failures/dfols46p_4/dq00a/work/20110601T0000Z/atmos_main/pe_output/dq00a.fort6.pe0 (I copied it there).

I’ve cleaned out old runs from nvme and will resubmit. Will update this topic once that happens.

Simon

P.S. Is there a way of restarting the simulation part way through a cycle, i.e. picking up from where it got to (47 days in my case) rather than going back up to 3 months?

Hi Simon,

I have also had some job timeouts this week. Do report these failures to the ARCHER2 helpdesk; hopefully that will help them figure out what is going on.

There is an open issue at the moment about the file systems being slow:

My most recent failure does look like it is I/O related, as it finished the model but then timed out writing the data. The other timeouts I have had, however, look more like what we saw before with the slow nodes, where the model ran slowly the whole time.

Unfortunately I don’t think it is possible to start midway through a cycle.

Annette

Done! And reported the earlier failure too.

Simon

And I had two jobs fail due to node failures. Reported to the ARCHER2 helpdesk.

Simon