Innermost LAM suddenly failing due to walltime limit

Hi CMS,

It's been a little while since I last ran my suite u-cu484 (not since before the upgrade), but its 1 km domain has started to time out on all the forecasts.

For reference, the forecasts used to complete within the 60 minutes of wall time requested. The 12 km domain (204x160, running on 8x6 CPUs) and 3 km domain (608x424, running on 20x16 CPUs) both still complete, in ~18 and ~37 mins on average respectively.

My 1 km domain is 800x600 and was previously running fine with 32x28 CPUs. I’ve since bumped it up to 36x32 CPUs and increased the walltime to 80M but it’s still failing around a third of the way through the forecast.
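As a rough sanity check on the decompositions quoted above, here is a minimal sketch that works out how many MPI ranks each domain uses and how large the biggest tile per rank is. It assumes the "AxB CPUs" figures are the east-west by north-south process grid and that the grid sizes are columns by rows; the function names are just illustrative.

```python
import math

def tile_size(nx, ny, px, py):
    """Largest (columns, rows) tile any single rank holds for an nx-by-ny grid."""
    return math.ceil(nx / px), math.ceil(ny / py)

# Assumed layout: (grid columns, grid rows, processes E-W, processes N-S)
CONFIGS = {
    "12 km": (204, 160, 8, 6),
    "3 km": (608, 424, 20, 16),
    "1 km (old)": (800, 600, 32, 28),
    "1 km (new)": (800, 600, 36, 32),
}

def summarise(configs):
    """Return rank count and largest tile for each domain."""
    summary = {}
    for name, (nx, ny, px, py) in configs.items():
        tx, ty = tile_size(nx, ny, px, py)
        summary[name] = {"ranks": px * py, "tile": (tx, ty), "points": tx * ty}
    return summary

if __name__ == "__main__":
    for name, info in summarise(CONFIGS).items():
        print(f"{name:12s} ranks={info['ranks']:5d} "
              f"tile={info['tile'][0]}x{info['tile'][1]} "
              f"points/rank={info['points']}")
```

On these assumptions the 1 km runs use 896 and then 1152 ranks, and each rank already holds fewer grid points than in the 12 km and 3 km runs, which hints that the slowdown is not simply a matter of too few processes. It says nothing about time step, level count or I/O, so treat it as a first look only.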

I have been seeing this warning in the .err file though, which isn’t present in the 12/3km output:

WARNING: Requested total thread count and/or thread affinity may result in
oversubscription of available CPU resources! Performance may be degraded.
Explicitly set OMP_WAIT_POLICY=PASSIVE or ACTIVE to suppress this message.
Set CRAY_OMP_CHECK_AFFINITY=TRUE to print detailed thread-affinity messages.

Has something changed that means my previous config is no longer efficient? And any advice on how to fix this?

Cheers
Ella

Ella

On the upgraded ARCHER2, the advice is to add --cpus-per-task to the srun command (see "Updating a UM suite after the ARCHER2 O/S upgrade"). In site/ncas-cray-ex/suite-adds.rc, try changing ROSE_LAUNCHER_PREOPTS (for model runs at least) thus:

ROSE_LAUNCHER_PREOPTS = "--hint=nomultithread --distribution=block:block --cpus-per-task={{OMP_NUM_THREADS}}"
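To illustrate why the missing flag can produce the oversubscription warning, here is a minimal sketch of the arithmetic. It assumes that, without --cpus-per-task, srun gives each task a single core, so all of a task's OpenMP threads end up sharing it; that default and the thread count of 2 are assumptions for illustration, not values taken from the suite.

```python
def threads_per_core(omp_num_threads, cpus_per_task=1):
    """Average number of OpenMP threads sharing each core allotted to a task."""
    if cpus_per_task < 1:
        raise ValueError("cpus_per_task must be at least 1")
    return omp_num_threads / cpus_per_task

def is_oversubscribed(omp_num_threads, cpus_per_task=1):
    """True when a task runs more threads than it has cores."""
    return threads_per_core(omp_num_threads, cpus_per_task) > 1

if __name__ == "__main__":
    omp = 2  # hypothetical OMP_NUM_THREADS for the 1 km model task
    # Assumed behaviour without the flag: one core per task
    print("without --cpus-per-task:", is_oversubscribed(omp))
    # With --cpus-per-task={{OMP_NUM_THREADS}}: one core per thread
    print("with --cpus-per-task:   ", is_oversubscribed(omp, cpus_per_task=omp))
```

If the 12 km and 3 km tasks run with a single thread, the same arithmetic gives one thread per core either way, which would be consistent with the warning only showing up for the 1 km domain.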

Grenville


Brilliant, thanks Grenville - looks like that worked!

Cheers,
Ella

p.s. As a note to anyone following this thread in future: if you copy and paste this line, make sure the quotation marks and dashes come through as standard ASCII characters (i.e. straight double quotes rather than Unicode curly quotes, and two hyphens rather than a single en or em dash).
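For anyone who wants to check a pasted line rather than eyeball it, here is a small sketch that swaps the usual typographic characters back to ASCII and reports anything non-ASCII that is left. The helper names are made up for this example, and mapping an em dash to two hyphens is an assumption about what the forum software did to the original text.

```python
# Typographic characters that forum software commonly substitutes, and
# the ASCII text they most likely replaced (the em dash entry is a guess).
REPLACEMENTS = {
    "\u201c": '"',   # left curly double quote
    "\u201d": '"',   # right curly double quote
    "\u2018": "'",   # left curly single quote
    "\u2019": "'",   # right curly single quote
    "\u2013": "--",  # en dash
    "\u2014": "--",  # em dash
}

def normalise(line):
    """Return the line with typographic quotes and dashes mapped to ASCII."""
    return line.translate(str.maketrans(REPLACEMENTS))

def non_ascii(line):
    """List any characters in the line that are still outside ASCII."""
    return [ch for ch in line if ord(ch) > 127]

if __name__ == "__main__":
    pasted = "\u201c\u2013hint=nomultithread --distribution=block:block\u201d"
    cleaned = normalise(pasted)
    print(cleaned)
    print("left over:", non_ascii(cleaned))
```

Running the file you pasted into through non_ascii first is a quick way to see whether anything needs fixing at all.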
