Is there a way of changing a suite so that it runs somewhere other than /work/n02/n02/tetts? For example, I’d like to run a whole bunch of related suites in /work/n02/n02/tetts/test/job1, /work/n02/n02/tetts/test/job2, etc. I think I need to do two things:
- Modify the --chdir directive in archer2.rc in the [[HPC]] block.
- Change where the various ancillaries are copied/linked to.
I have tried 1 but that then causes fcm_make2_um & fcm_make2_pp to fail with ‘submit-failed’. I think this is because they have relative filenames for their stdout and stderr log files. For example:
#SBATCH --output=cylc-run/u-test/log/job/20100901T0000Z/fcm_make2_um/01/job.out
#SBATCH --error=cylc-run/u-test/log/job/20100901T0000Z/fcm_make2_um/01/job.err
As the directories do not exist, Slurm submission fails :-(.
So is there a way of changing the cylc-run to /work/n02/n02/tetts/cylc-run? This seems to be created automatically and populated with all its directories etc. Presumably Rose expects to look there for log files etc. Or is there a better way?
ta
Simon
Hi Simon,
The short answer is no. Cylc is hard-wired to reference everything via ~ and put cylc files under ~/cylc-run/suite-id.
Regards,
Ros.
Hi Ros,
thanks a lot for your answer.
That would be fine, and I am quite happy for cylc to put its files wherever it wants. And many of the cylc-related files are appearing in /work/n02/n02/tetts/cylc-run/suite_name - presumably because $HOME gets redefined. But the UM suite is defining the output log files as cylc-run/suite_name etc. [relative filenames]. So in the unedited suite they appear relative to /work/n02/n02/tetts (for me), which works and means all the cylc stuff is where it is wanted. But if I chdir somewhere else Slurm falls over as the directories have not been constructed.
So, how are the filenames for the output logs constructed in fcm_make2_um & fcm_make2_pp?
If I understand that, I can change em!
ta
Simon
Hi Simon,
Yes, $HOME does get redefined because of the ARCHER2 configuration where the compute nodes cannot see the /home disk.
Not sure I’m fully understanding what you are trying to do.
The --chdir in the HPC section must be set as follows:
--chdir=/work/n02/n02/tetts
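For context, this sits with the other Slurm directives in the [[HPC]] family in site/archer2.rc, roughly as in the following sketch (the surrounding directives are illustrative and will vary between suites and UM versions):
[[HPC]]
    [[[directives]]]
        # working directory for submitted jobs - must point at /work
        --chdir=/work/n02/n02/tetts
        # illustrative ARCHER2 Slurm settings
        --partition=standard
        --qos=standard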
Cheers,
Ros.
Hi Ros,
I am trying to modify it to --chdir=/work/n02/n02/{{ARCHER2_USERNAME}}/test/utest with the idea that the model runs in that directory, putting all its output etc. there…
Simon
Hi Simon,
The --output and --error log paths are constructed by Cylc. You can’t use --chdir to influence where model data ends up.
If you want the model output data (pp files, dumps, etc.) saved to a location outside of the ~/cylc-run/suite-id directory structure, you will need to move it as part of the workflow. E.g. postproc can be used to “stage/save” the model output data to somewhere else on ARCHER2.
Regards,
Ros.
Hi Ros,
OK – that is probably a better way to proceed. I assume I need my own postproc script, or can I simply modify the existing one? Either way, how do I do that? Is there a general housekeeping script I can run at the end that can clean up the ‘crap’ in cylc-run/suite-id so I don’t run out of disk space?
Hi Simon,
It’s all built into the existing postproc app. Assuming you will be running u-db898:
In panel “fcm_make_pp → configuration → pp_sources” (a sketch of the resulting entry follows this list):
- Change the revision number of the postproc_2.3_archer2 branch from 3910 to 4988.
- Change the revision number of the postproc_2.3_pptransfer_gridftp_nopw branch from 4422 to 5411 to pick up bug fixes, etc.
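For reference, after those edits the pp_sources entry in app/fcm_make_pp/rose-app.conf would list both branches at the new revisions, roughly like the sketch below (the fcm keyword and owner path are placeholders - keep the paths already in your suite and change only the @REVISION numbers):
pp_sources=fcm:moci.xm_br/dev/<owner>/postproc_2.3_archer2@4988
          =fcm:moci.xm_br/dev/<owner>/postproc_2.3_pptransfer_gridftp_nopw@5411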
In the Postproc app (see the sketch after this list):
- In panel “post processing - common settings” switch on archive_toplevel and select “Archer” as the archive_command.
- In panel “Archer Archiving” set archive_root_path to be the location where you want the files “archived” to.
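As a rough sketch, those panels map onto lines like the following in app/postproc/rose-app.conf (the namelist section names are placeholders and the archive path is just an example - check the actual section names in your suite):
# “post processing - common settings” panel (section name is a placeholder)
[namelist:<common_settings_section>]
archive_command=Archer
archive_toplevel=True

# “Archer Archiving” panel (section name is a placeholder)
[namelist:<archer_archiving_section>]
archive_root_path=/work/n02/n02/tetts/archive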
Depending on how much data your suite generates you may wish to configure the pptransfer task to automatically copy data off-site, e.g. to JASMIN.
The housekeeping task will automatically tidy up old cylc log and work directories as it goes.
Cheers,
Ros.
Hi Ros,
That looks like what I need. Is there some documentation on postproc_2.3_archer2 and housekeeping? I am probably going to start afresh with a slightly different and up-to-date AMIP configuration and make it produce netCDF output.
Simon
Hi Simon,
When you’ve decided on which suite you are going to use let me know as it may be using a different version of postproc which will require different ARCHER2 branches.
There is limited documentation on the postproc app on MOSRS: https://code.metoffice.gov.uk/trac/moci/wiki/app_postproc
The housekeeping app uses standard Rose functionality (see rose_prune in the Rose 2019.01.8 documentation). You can see what directories it is pruning (deleting) and on what frequency in the file /home/n02/n02/tetts/roses/u-db898/app/housekeeping/rose-app.conf
For example, in u-db898:
prune-work-at=-P9M
is saying delete the work directories (i.e. ~/cylc-run/u-db898/work/<cycle-point>) when they are 9 months prior to the current cycle point.
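For completeness, a minimal housekeeping app config following the standard rose_prune layout looks roughly like this (only the prune-work-at line is taken from the suite; the mode line and [prune] section are the standard rose_prune structure, and further prune settings would sit alongside it):
# app/housekeeping/rose-app.conf (minimal sketch)
mode=rose_prune

[prune]
prune-work-at=-P9M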
Cheers,
Ros.
Hi Ros,
thanks a lot. My disk storage is dominated by share. If I add:
prune{share}=-P0D
would that delete everything in share? I assume that the archiving has already run before this step…
Yes, I appreciate it might not be wise – restarting/continuing the run would be tricky!
If I want restartability then maybe prune{share}=-P9M would be sensible. (But how cylc works out which share/data/History_data is more than 9 months old is hard to see…)
Simon
Hi Simon,
History_data can’t be managed by rose prune as it has no idea what files the UM needs. Postproc will handle deleting of superseded files, and when a file is archived it is deleted from the History_data directory. On/off switches for deletion of superseded files can be found, e.g. for atmosphere, in panel postproc → Atmosphere → archiving. I notice you need to switch archive_switch to True to enable archiving.
You could also put the entire cylc /share and /work directories on to the SCRATCH disk so they don’t use your /work quota. To do this add the following 2 lines to the top of the ~/roses/<suiteid>/rose-suite.conf file:
root-dir{share}=ln*=/mnt/lustre/a2fs-nvme/work/n02/n02/$USER
root-dir{work}=ln*=/mnt/lustre/a2fs-nvme/work/n02/n02/$USER
These directories will then appear as symlinks in your /work/n02/n02/tetts/cylc-run/<suite-id> directory.
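For example, after the suite has started, listing the suite directory on ARCHER2 would show something roughly like this (paths illustrative):
$ ls -l /work/n02/n02/tetts/cylc-run/<suite-id>/
share -> /mnt/lustre/a2fs-nvme/work/n02/n02/tetts/cylc-run/<suite-id>/share
work -> /mnt/lustre/a2fs-nvme/work/n02/n02/tetts/cylc-run/<suite-id>/work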
See the “Data management and transfer” section of the ARCHER2 User Documentation for information on the ARCHER2 scratch filesystem. Note that files not accessed for 28 days will be deleted.
Cheers,
Ros.
Perfect. And then I archive what I need at the end of the run: pp/netCDF files and the most recent dump.
And if on scratch then the scratch cleaner will save my day!
Hi Ros,
Is there a way of adding the root-dir{share} stuff from the GUI? Or do I do it after saving the configuration?
Simon
Hi Simon,
Add it directly to the rose-suite.conf file.
It doesn’t matter when you add it; the important thing is to make sure the rose edit GUI isn’t open at the same time, otherwise if you hit save in the GUI again it will overwrite any changes you have made to the files directly in the meantime.
Regards,
Ros.