Changing directory for UM runs

Is there a way of changing a suite so that it runs somewhere other than /work/n02/n02/tetts? For example, I’d like to run a whole bunch of related suites in /work/n02/n02/tetts/test/job1, /work/n02/n02/tetts/test/job2, etc. I think I need to do two things:

  1. Modify the --chdir directive in archer2.rc in the [[HPC]] block (see the sketch below).
  2. Change where the various ancillaries are copied/linked to.
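
For reference, I think the relevant part of archer2.rc looks roughly like this (surrounding settings omitted, and the path is just my example):

[[HPC]]
    [[[directives]]]
        --chdir=/work/n02/n02/{{ARCHER2_USERNAME}}/test/job1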

I have tried 1, but that then causes fcm_make2_um & fcm_make2_pp to fail with ‘submit-failed’. I think this is because they have relative filenames for their stdout and stderr log files. For example:

#SBATCH --output=cylc-run/u-test/log/job/20100901T0000Z/fcm_make2_um/01/job.out
#SBATCH --error=cylc-run/u-test/log/job/20100901T0000Z/fcm_make2_um/01/job.err

As the directories do not exist, slurm submission fails :-(.
So is there a way of changing the cylc-run directory to /work/n02/n02/tetts/cylc-run? This seems to be created automatically and populated with all its directories etc. Presumably Rose expects to look there for log files etc. Or is there a better way?

ta
Simon

Hi Simon,

The short answer is no. Cylc is hard-wired to reference everything via ~ and puts its files under ~/cylc-run/suite-id.

Regards,
Ros.

Hi Ros,
thanks a lot for your answer.

That would be fine, and I am quite happy for cylc to put its files wherever it wants. Many of the cylc-related files are appearing in /work/n02/n02/tetts/cylc-run/suite_name, presumably because $HOME gets redefined. But the UM suite is defining the output log files as cylc-run/suite_name etc. [relative file names]. So in the unedited suite they appear relative to /work/n02/n02/tetts (for me), which works and means all the cylc stuff is where it is wanted. But if I chdir somewhere else, slurm falls over as the directories have not been constructed.

So, how are the filenames for the output logs constructed in fcm_make2_um & fcm_make2_pp?
If I understand that, I can change em!

ta
Simon

Hi Simon,

Yes, $HOME does get redefined because of the ARCHER2 configuration, where the compute nodes cannot see the /home disk.

Not sure I’m fully understanding what you are trying to do.
The --chdir in the HPC section must be set as follows:

--chdir=/work/n02/n02/tetts

Cheers,
Ros.

Hi Ros,
I am trying to modify it to --chdir=/work/n02/n02/{{ARCHER2_USERNAME}}/test/utest, with the idea that the model runs in that directory, putting all its output etc. there…
Simon

Hi Simon,

The --output and --error log paths are constructed by cylc.

You can’t use --chdir to influence where the model data ends up.

If you want the model output data (pp files, dumps, etc.) saved to a location outside the ~/cylc-run/suite-id directory structure, you will need to move it as part of the workflow. E.g. postproc can be used to “stage/save” the model output data to somewhere else on ARCHER2.

Regards,
Ros.

Hi Ros,
OK, that is probably a better way to proceed. Do I need my own postproc script, or can I simply modify the existing one? Either way, how do I do that? Is there also a general housekeeping script I can run at the end to clean up the ‘crap’ in cylc-run/suite-id so I don’t run out of disk space?

Hi Simon,

It’s all built into the existing postproc app, assuming you will be running u-db898.

In panel “fcm_make_pp → configuration → pp_sources”:

  • Change the revision number of the postproc_2.3_archer2 branch from 3910 to 4988
  • Change the revision number of the postproc_2.3_pptransfer_gridftp_nopw branch from 4422 to 5411 to pick up bug fixes, etc. (see the sketch after this list)
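
In the raw app/fcm_make_pp/rose-app.conf this corresponds to a pp_sources entry roughly of the form below; the branch paths here are placeholders, the important part is the @revision on the end of each branch:

pp_sources=<path-to-branch>/postproc_2.3_archer2@4988 <path-to-branch>/postproc_2.3_pptransfer_gridftp_nopw@5411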

In the Postproc app:

  • In panel “post processing - common settings” switch on archive_toplevel and select “Archer” as the archive_command.
  • In panel “Archer Archiving” set archive_root_path to the location where you want the files “archived” to (roughly as in the sketch below).
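
Under the hood those panel settings end up in app/postproc/rose-app.conf as entries roughly like the following (section headers are omitted and the exact value syntax may differ slightly; the archive path is just an example location):

archive_command=Archer
archive_toplevel=True
archive_root_path=/work/n02/n02/tetts/archive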

Depending on how much data your suite generates, you may wish to configure the pptransfer task to automatically copy data off-site, e.g. to JASMIN.

The housekeeping task will automatically tidy up old cylc log and work directories as it goes.

Cheers,
Ros.

Hi Ros,
That looks like what I need. Is there some documentation on postproc_2.3_archer2 and housekeeping? I am probably going to start afresh with a slightly different and up-to-date AMIP configuration and make it produce netCDF output.
Simon

Hi Simon,

When you’ve decided which suite you are going to use, let me know, as it may be using a different version of postproc, which will require different ARCHER2 branches.

There is limited documentation on the postproc app on MOSRS: https://code.metoffice.gov.uk/trac/moci/wiki/app_postproc

The housekeeping app uses standard Rose functionality (rose_prune; see the Rose 2019.01.8 documentation). You can see what directories it is pruning (deleting) and at what frequency in the file: /home/n02/n02/tetts/roses/u-db898/app/housekeeping/rose-app.conf

For example, in u-db898:
prune-work-at=-P9M says delete the work directory (i.e. ~/cylc-run/u-db898/work/<cycle-point>) when it is 9 months prior to the current cycle point.
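
For context, the housekeeping app is just a thin wrapper around the rose_prune built-in, so its rose-app.conf is roughly of this form (only prune-work-at is taken from the suite above; other entries will vary):

mode=rose_prune

[prune]
prune-work-at=-P9M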

Cheers,
Ros.

Hi Ros,
thanks a lot. My disk storage is dominated by share. If I add:
prune{share}=-P0D
would that delete everything in share? I assume that the archiving has already run before this step…
Yes, I appreciate it might not be wise, as restarting/continuing the run would be tricky!
If I want restartability then maybe prune{share}=-P9M would be sensible. (Though how cylc works out which files under share/data/History_data are more than 9 months old is hard to see…)
Simon

Hi Simon,

History_data can’t be managed by rose prune as it has no idea what files the UM needs. Postproc will handle deletion of superseded files, and when a file is archived it is deleted from the History_data directory. On/off switches for deletion of superseded files can be found, e.g. for the atmosphere, in panel postproc → Atmosphere → archiving. I notice you need to switch archive_switch to True to enable archiving.

You could also put the entire cylc /share and /work directories onto the SCRATCH disk so they don’t use your /work quota. To do this, add the following two lines to the top of the ~/roses/<suiteid>/rose-suite.conf file:

root-dir{share}=ln*=/mnt/lustre/a2fs-nvme/work/n02/n02/$USER
root-dir{work}=ln*=/mnt/lustre/a2fs-nvme/work/n02/n02/$USER

These directories will then appear as symlinks in your /work/n02/n02/tetts/cylc-run/<suite-id> directory.
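
i.e. you should end up with something roughly like:

/work/n02/n02/tetts/cylc-run/<suite-id>/share -> /mnt/lustre/a2fs-nvme/work/n02/n02/tetts/cylc-run/<suite-id>/share
/work/n02/n02/tetts/cylc-run/<suite-id>/work  -> /mnt/lustre/a2fs-nvme/work/n02/n02/tetts/cylc-run/<suite-id>/work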

See Data management and transfer - ARCHER2 User Documentation for information on the ARCHER2 scratch filesystem. Note that files not accessed for 28 days will be deleted.

Cheers,
Ros.

Perfect. And then I archive what I need at the end of the run: pp/netCDF files and the most recent dump.
And if on scratch then the scratch cleaner will save my day!

Hi Ros,
Is there a way of adding the root-dir{share} stuff from the GUI? Or do I do it after saving the configuration?
Simon

Hi Simon,

Add it directly to the rose-suite.conf file.

It doesn’t matter when you add it; the important thing is to make sure the rose edit GUI isn’t open at the same time, otherwise if you hit save in the GUI again it will overwrite any changes you have made directly to the files in the meantime.

Regards,
Ros.