Memory space error at recon stage

Hi, I’ve been trying to run a UKESM1.0 AMIP version of the UM with some changed inputs, working in u-de347. For now, I am just trying to run one month of the model to check that it works with my changes. However, I keep getting this error at the recon stage:

lib-4205 : UNRECOVERABLE library error
The program was unable to request more memory space.
tcmalloc: large alloc 1441714830712012800 bytes == (nil)

My jobs have also been stuck in the queues for a long time. Last time it took six days just to reach the (failed) recon stage, and the jobs seem to get slower with every iteration I try. I don’t think I’ve been overusing the queues (I am literally just running this one simple job every few days), so I don’t think I’m being deprioritised for that reason. It’s really frustrating, because it’s now taking weeks just to find out whether my model setup works.

Any advice would be appreciated!

It looks like the reconfiguration is trying to read a file that has the wrong endianness. What inputs did you change?

Grenville

Hi Grenville, thanks for your reply.

I am trying to use my own values for surface albedo (stash requests 244 and 245). The original albedo file that the model uses when no ancil file is specified is a .land file. I haven’t been able to figure out exactly what format that is, but I am inputting a NetCDF file with a manually added .land extension. I’m not sure this is the way to go, to be honest. I had a discussion with Patrick about it here: Changing albedo climatology in JULES.

The file I am inputting is a copy of one of the albedo files in /work/y07/shared/umshared/ancil/atmos, with some changes to the values. I believe that is the location the files are normally taken from when not manually specified.

Please give us read permissions on your /work and /home spaces on ARCHER2. On an ARCHER2 login node, run:

chmod -R g+rX /home/n02/n02/mfleg
chmod -R g+rX /work/n02/n02/mfleg

Thanks Grenville, I’ve done that now.

Michaela

Ancillary files are in the proprietary UM fields-file format. The .land extension is just a convenience and has no bearing on the file format.
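
To illustrate that the extension says nothing about the contents, here is a minimal Python sketch (not part of the UM tooling) that looks at the first bytes of a file to see whether it is really NetCDF. The function name sniff_netcdf is hypothetical, and the check relies only on the published NetCDF and HDF5 file signatures; it makes no attempt to recognise a UM fields file, so anything else is simply reported as "unknown".

```python
import sys


def sniff_netcdf(path):
    """Return 'netcdf-classic', 'netcdf4-hdf5' or 'unknown' from the first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    # Classic NetCDF files start with b"CDF" followed by a version byte (1, 2 or 5).
    if head[:3] == b"CDF" and head[3:4] in (b"\x01", b"\x02", b"\x05"):
        return "netcdf-classic"
    # NetCDF-4 files are HDF5 files, which normally start with this 8-byte signature.
    if head == b"\x89HDF\r\n\x1a\n":
        return "netcdf4-hdf5"
    # Anything else (possibly a UM fields file) is not identified by this sketch.
    return "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sniff.py <file>
    print(sniff_netcdf(sys.argv[1]))
```

If this reports a NetCDF type for a file that is being passed to the reconfiguration as an ancillary, the file has not been converted to ancillary format, whatever its extension.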

I think there is an easier way to create your own ancillary file for stash items 244 and 245.

  1. Use xconv to convert the currently used albedo ancil file into a NetCDF file.
  2. Use netCDF4 to change the data, preserving all metadata, variable names, etc.
  3. Use xancil to convert the modified NetCDF file back to ancillary file format.
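
Step 2 could be sketched roughly as follows, assuming "netcdf4" means the netCDF4 Python package. The idea is to copy the file produced by xconv and then overwrite one variable in place, so that dimensions, attributes and variable names are left untouched. The function name scale_variable, the file names, the variable name and the scaling factor are all hypothetical placeholders; the real variable name has to be read from the xconv output, and multiplying by a constant is only a stand-in for whatever change is actually wanted.

```python
import shutil


def scale_variable(src_path, dst_path, var_name, factor):
    """Copy src_path to dst_path, then multiply one variable by factor in place."""
    # Third-party dependency (netCDF4-python), imported here so the
    # function can be defined even where the package is missing.
    from netCDF4 import Dataset

    # Copying the whole file first keeps every dimension, attribute and
    # variable name exactly as xconv wrote it.
    shutil.copyfile(src_path, dst_path)

    # "r+" opens the existing copy for modification without redefining it.
    with Dataset(dst_path, "r+") as ds:
        var = ds.variables[var_name]
        # Masked (missing) points stay masked through the arithmetic.
        var[:] = var[:] * factor


# Example call with placeholder names (not taken from the real file):
# scale_variable("albedo_orig.nc", "albedo_modified.nc", "field1", 0.9)
```

The modified copy is then the input for xancil in step 3.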

I ran a test case without step 2, starting (arbitrarily) from /work/y07/shared/umshared/ancil/atmos/n96e/general_land/GlobAlbedo/v2/qrclim.land, and it worked without any problems.

xconv (very intuitive) and xancil (Xancil 0.58 documentation) are both installed on ARCHER2.

Grenville

Hi Grenville,

Thanks for this - this seems to have solved my issue as far as I can tell. I’ve been trying to test it and I am now running into another problem, getting this error:

? Error code: 1
? Error from routine: portio2a:flush_unit_buffer
? Error message: Failed in output_buffer()
? Error from processor: 0
? Error number: 72

It seems to be a disk space issue. Would it be possible to increase my /work quota, please?

Increased to 1 TB (enough?)

Grenville

Should be enough, thank you!
