N512 nodes and decomposition

Hello,

I am getting an ‘out of memory’ error on my suite u-ch360. I have been looking through other tickets, such as
N512 run - Out of Memory
which has the same issue. They seem to have solved it, but I am still not very confident about changing the nodes and decomposition. Would it be possible to get a quick rundown of what to change and where?

Thank you,
Holly

Hi Holly

Go to suite conf → Domain Decomposition → Atmosphere and set East-West Processors to 32 (say), and North-South Processors to the same. Experiment to find the point at which it stops OOMing (running out of memory).

Grenville
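As a rough illustration of how those two settings map onto nodes: the atmosphere runs East-West × North-South MPI tasks, and each ARCHER2 standard compute node has 128 cores. The helper below is hypothetical (not part of the suite); it assumes a fixed number of OpenMP threads per MPI task and ignores any extra tasks the suite may add (e.g. IO servers), so the node count in the generated job script is the one to trust.

```python
import math


def nodes_needed(ew_procs: int, ns_procs: int,
                 omp_threads: int = 1, cores_per_node: int = 128) -> int:
    """Estimate whole nodes needed for an EW x NS atmosphere decomposition.

    Hypothetical helper: assumes `omp_threads` threads per MPI task and
    ignores IO-server or other auxiliary tasks.
    """
    mpi_tasks = ew_procs * ns_procs          # one MPI task per subdomain
    cores = mpi_tasks * omp_threads          # each thread occupies a core
    return math.ceil(cores / cores_per_node) # partial nodes are charged in full


print(nodes_needed(32, 32))                 # 1024 tasks on 128-core nodes
print(nodes_needed(32, 32, omp_threads=2))  # doubling threads doubles the nodes
```

Increasing the processor counts spreads the N512 grid over more tasks, so each task holds a smaller subdomain in memory. That is why raising them tends to cure the OOM, at the price of more nodes.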

Thanks, Grenville. I will try some trial and error. Is it possible to keep this ticket open into next week, in case I have a related question? ARCHER2 is down for the rest of this week.

Thank you,
Holly

Hi Grenville,

Thanks for keeping this ticket open.

I have got the model working using the suggestions above, but it is more computationally expensive than I expected. This may simply be the case, as I have never run a model at N512 before, but just in case: what is the best way to optimise this? Am I missing a trick?

Thank you,
Holly

Hi Holly

What did you expect and how far off is it?

Grenville

Hi Grenville,

The model uses around 200 CU for one month. I was probably expecting no more than 100, although, looking back at your performance notes for ARCHER2, I suppose that isn’t too far out of range. The purpose of this test was to understand exactly how much I would need for a longer run, and to find the right decomposition. I am starting to gather that there is no ‘one size fits all’ for models, even of the same resolution and version?

Thank you,
Holly

Holly

It depends on what you want or need in the cost (in CUs) versus time-to-solution dilemma. A run will always be cheaper the fewer resources (nodes) you throw at it, but the time to solution may not be acceptable. When estimating resources, allow some leeway.

Grenville
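On ARCHER2, 1 CU corresponds to one node-hour on a standard compute node, so a budget for a longer run can be sketched by scaling up a short test. The function below is a hypothetical helper that assumes cost grows linearly with simulated time at a fixed decomposition (spin-up, failed cycles and resubmissions are not modelled), and it uses an assumed 20% margin as the "leeway".

```python
def estimate_cu(cu_per_month: float, months: float, leeway: float = 0.2) -> float:
    """Extrapolate a CU budget from a short test run.

    Hypothetical helper: assumes cost scales linearly with simulated months
    at a fixed decomposition; `leeway` is an assumed fractional safety margin.
    """
    return cu_per_month * months * (1.0 + leeway)


# Using the ~200 CU per month measured in the one-month test:
print(estimate_cu(200, 12))  # one simulated year with a 20% margin
```

With the 200 CU per month from the test, a year comes to 2400 CU before the margin and about 2880 CU with it. Whether 20% is enough is a judgement call.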

Hi Grenville,

Thank you for clarifying. So if I want to use fewer resources and don’t mind it running for longer, I need to work out the best way to reduce the number of nodes.

Thank you,
Holly
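One way to frame that search, in line with the cost versus time-to-solution trade-off above, is to time a few short test runs at different node counts and then pick the cheapest one that still finishes within an acceptable wallclock. This is only a sketch: the `Trial` class, the helper name and the timings are all hypothetical, and it assumes wallclock per simulated month stays roughly constant over a longer run.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Trial:
    """One hypothetical test configuration and its measured wallclock."""
    nodes: int
    wall_hours_per_month: float  # measured wallclock per simulated month

    @property
    def cu_per_month(self) -> float:
        # On ARCHER2, 1 CU = 1 node-hour on a standard compute node.
        return self.nodes * self.wall_hours_per_month


def cheapest_within_deadline(
    trials: Sequence[Trial], months: float, deadline_hours: float
) -> Optional[Trial]:
    """Return the lowest-CU trial whose projected wallclock meets the deadline."""
    feasible = [t for t in trials if t.wall_hours_per_month * months <= deadline_hours]
    if not feasible:
        return None  # nothing tested is fast enough
    return min(feasible, key=lambda t: t.cu_per_month)


# Hypothetical timings, purely for illustration:
trials = [Trial(8, 20.0), Trial(16, 12.5), Trial(32, 8.0)]
print(cheapest_within_deadline(trials, months=12, deadline_hours=200))
```

With these made-up numbers and a 200-hour limit, the 8-node setup (240 hours for 12 months) is too slow, so the 16-node one (200 CU per month) is chosen. Relaxing the limit to 300 hours lets the cheaper 8-node setup through.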

Hi,

Another question relating to this N512 model setup: if I try to run GA7.1 instead of GA7.0, I get an error (no longer in my files, as I have since run a test using GA7.0). The model fails in install_ancils. I’m guessing that I need to replace a path to a file for GA7.1, but I am unsure which one.

Thank you,
Holly