/work quota

Hi Grenville, is there any chance of increasing my /work quota on archer2?

I’m running up against quota issues a lot at the moment, which really slows my workflow down.

On a related note, could you help me adapt the housekeeping script so that it cleans up unneeded files for cycles already completed (glm outputs, ics, LBCs etc. as usual, but also ERA5 startfiles), downloads new start data ahead of time, and transfers data to JASMIN?

At the moment I’m doing this manually and it’s a real chore…

Cheers
Ella

Hi Ella

I boosted your /work quota.

Where/what is the start data? If it’s local, this should be simple, but much less so if not.

Transfer to JASMIN might be relatively straightforward - are you currently running rose-arch?

Have you had a chance to look at the rose_prune documentation (Rose Documentation 2.0.0)?

Grenville

Thanks for upping my quota Grenville, that’s really helpful.

Start data is local (/work/n02/n02/shakka/ERA5/) so I’m hoping this is an easy step.

Re: transfer - glad to hear it might be easy! I’m not running rose-arch at the moment. I started reading the rose_prune docs but got a bit lost about where I should make any adjustments, so any suggestions would be much appreciated.

Cheers
Ella

Hi Ella

I may have misunderstood the “download start data ahead of time” – if you have a script that runs to get the ERA data (from wherever), then adding a task to grab the data shouldn’t be hard - how do you get the ERA data currently?

Grenville

No that should be fine - I’ve got a python script using cdsapi that I can run from the command line, and another shell script that I use to process the data into the format the UM likes. Is it as simple as adding in lines to the app/housekeeping/rose-app.conf something like this:

> meta=rose_prune
> mode=rose_prune
> 
> [prune]
> archive-logs-at=$CYCLE_OFFSET
> module load cray-python
> python ~/download_startfile.py --cycle +P1DT0H
> bash ~/concat_files.sh -cycle +P1DT0H
> bash ~/remove_files.sh -cycle -P1DT0H
> prune{share/cycle}=-PT0H:'*/*/*_da*'
>                   =$HK_CYCLE:'*/*/*/*/*_da*'
> prune-remote-logs-at=$CYCLE_OFFSET
> prune-server-logs-at=$CYCLE_OFFSET
> prune{work}=$CYCLE_OFFSET

?

Update: so this doesn’t work because I’m trying to use generic shell commands (e.g. module load) inside a rose-type script. I’m a bit stuck here, as I don’t really know very well how to adapt these kinds of job scripts.

Where can I insert the commands to my custom scripts within the housekeeping job?

Ella

I’d add a new app to do the data download. See https://ncas-cms.github.io/um-training/rose-cylc-exercises.html#adding-a-new-app-to-a-suite for how to add a simple app. The documentation goes on to explain how to get the task to run a script, which is what you need, if I understand correctly. Are there authentication issues with the download script?

You’ll then need to tweak the cylc graph to run the task as needed - maybe at the start of each cycle?

Grenville

Gotcha - I will investigate now…
Cheers
Ella

Hi @grenville, I’ve had a go at working through the link you sent, but I’m getting stuck on adding the new task to the suite definition, possibly because I’m trying to do this in the nesting suite, or maybe because the exercise was written before archer2 was upgraded.

Either way, the line {% set INIT_GRAPH = INIT_GRAPH ~ ' => atmos_main' if TASK_RUN else INIT_GRAPH %} isn’t present under [scheduling] [[dependencies]] in suite.rc, and I’m finding it hard to figure out where to put the call to my new app in the suite file to make it run before the glm. Can you help?

Cheers,
Ella

Ella, what suiteid?

u-cy223. Will run a commit now. E

Hi Ella

Where should the task (I called it test-task below) run? Like this?

    [[dependencies]]
        [[[R1]]]
            graph = """
                install_cold_hpc => test-task => install_glm_startdata
            """
        [[[PT24H]]]
            graph = """
                test-task => install_glm_startdata => glm_um_recon1 => glm_um_fcst_000 => glm_um_fcst_001 => glm_um_fcst_002 => glm_um_fcst_003 => glm_um_fcst_004 => glm_um_fcst_005 => housekeep_cycle

Grenville

Hi Grenville, that looks like perfect placement - we need the data to download before it can be installed, so this flow works. It doesn’t look exactly like this in the suite.rc, so I added it in here:

{% if BUILD_CREATEBC is defined and BUILD_CREATEBC in ["new"] %}
fcm_make_createbc => get_era_data => install_{{DRV_MOD["name"]}}_startdata
{% endif %}

Doesn’t seem to show up in the graph when I submit it like this - do I need to re-build the executable?

Cheers
Ella

Hi Ella

You can see what the graph actually resolves to in the suite.rc.processed file (in the cylc-run directory) - that’s where I got the output above.
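
A quick way to check whether a task made it into the resolved graph is to search the processed file for its name. This is only a minimal sketch: the function name lines_mentioning is made up for illustration, and the path in the usage note assumes the usual ~/cylc-run/<suite>/ layout, which may differ on your system.

```python
from pathlib import Path
from typing import List, Tuple


def lines_mentioning(task: str, processed_file: Path) -> List[Tuple[int, str]]:
    """Return (line number, stripped text) for every line that mentions the task."""
    hits = []
    with processed_file.open() as handle:
        for number, line in enumerate(handle, start=1):
            if task in line:
                hits.append((number, line.strip()))
    return hits
```

For example, lines_mentioning("get_era_data", Path.home() / "cylc-run" / "u-cy223" / "suite.rc.processed") would list the graph lines that reference the task. An empty list suggests the Jinja2 condition wrapped around the line was never true.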

I added test-task in suite.rc for the R1 cycle and in suite-graph/dm.rc for the rest - the nesting suite is a mystery, so I was guessing a bit. (see my copy of u-cy223)

There’s no need to rebuild anything - I’m struggling to see where BUILD_CREATEBC is defined - that would account for not seeing get_era_data

Grenville

Hi Grenville, I added the line to suite-graph/dm.rc as you suggested, and kept the line I posted above in suite.rc (just after createbc is defined at L181). With these changes I get the get_era_data task pop up under HOST_HPC before the glm stages - success! I’m now going to test for a random date and see if this works… will report back. E

ok so, update: it didn’t like the line ‘module load cray-python’ in the job script - but this is necessary for the script to work.

So I added the following section into the site/ncas-cray-ex/suite-adds.rc file:

[[GET_ERA_DATA]]
"""
module load cray-python
"""

and removed it from the job script

I thought this was okay so far, but then I got an error on the line python ~/download_startfile.py

…realised I was referencing a script in my /home area, so I moved them both to /work and updated the reference in the job script.

I now realise that the script is not pulling in the $CYCLE_OFFSET variable properly - how do I get it to use the already-defined var in this app?
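
One possible way round this, sketched under assumptions: instead of relying on $CYCLE_OFFSET, the download script could read CYLC_TASK_CYCLE_POINT, which Cylc exports into every task job's environment, and apply its own offset. The names shifted_cycle and cycle_from_env are hypothetical, and the cycle-point layout (e.g. 20230115T0000Z) is an assumption that should be checked against the suite's own format.

```python
import os
from datetime import datetime, timedelta
from typing import Optional

# Assumed cycle point layout, e.g. "20230115T0000Z"; adjust to the suite's format.
CYCLE_FORMAT = "%Y%m%dT%H%MZ"


def shifted_cycle(cycle_point: str, days: int = 0, hours: int = 0) -> str:
    """Return the cycle point shifted by the given number of days and hours."""
    point = datetime.strptime(cycle_point, CYCLE_FORMAT)
    return (point + timedelta(days=days, hours=hours)).strftime(CYCLE_FORMAT)


def cycle_from_env() -> Optional[str]:
    """Return the current task's cycle point, or None when run outside Cylc."""
    return os.environ.get("CYLC_TASK_CYCLE_POINT")
```

Inside the task, shifted_cycle(cycle_from_env(), days=1) would then give the cycle whose start data needs fetching, which matches the +P1DT0H offset tried earlier in the thread.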

Ella

Hi Ella

I think it’d be better to run this on the serial nodes (then you won’t need to bother with srun etc)

add

   [[HPC_SERIAL]]
        inherit = None, HOST_HPC
        [[[environment]]]
            ROSE_TASK_N_JOBS = 1
        [[[job]]]
            execution time limit = PT30M
        [[[directives]]]
            --partition=serial
            --qos=serial
            --ntasks=1
            --mem=4G

add the pre-script to load python, change the inherit:

    [[get_era_data]]
        inherit = HPC_SERIAL
        pre-script = "module load cray-python"
        [[[environment]]]
            ROSE_TASK_APP = get_era_data
        [[[job]]]
            execution time limit = PT30M

in rose-app.conf

[command]
default=main.sh

(remove what’s there)

See how modify_netcdf_metadata was handled in u-cn134

I hope this helps.

Grenville

Where am I adding the first bit?

in suite.rc would be OK – possibly better from an organisational viewpoint in site/ncas-cray-ex/suite-adds.rc

Ok, I tried this but it doesn’t seem to load the module I need on archer - the .err says

Traceback (most recent call last):
  File "/work/n02/n02/shakka/download_startfile.py", line 2, in <module>
    import cdsapi
ModuleNotFoundError: No module named 'cdsapi'

which is the error you get when the module isn’t loaded.
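
To narrow this down, it may help to have the task report which interpreter it is actually running and whether that interpreter can see the package. ModuleNotFoundError is raised whenever the running Python cannot find the package on its search path, so the check below, dropped temporarily at the top of the job's script, would show whether the batch job is picking up the same Python as the interactive session. The helper names package_visible and describe are made up for this sketch.

```python
import importlib.util
import sys


def package_visible(name: str) -> bool:
    """Return True if the running interpreter can locate the named top-level package."""
    return importlib.util.find_spec(name) is not None


def describe(name: str) -> str:
    """Summarise which interpreter is running and whether it can see the package."""
    state = "found" if package_visible(name) else "NOT found"
    return f"{sys.executable}: {name} {state}"


# Report the state for the package that failed to import in the job.
print(describe("cdsapi"))
```

Comparing the line this prints in the job's .out file with the same check run from the command line should show whether the two environments differ.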

It also seems from the .out file that the $CYCLE_OFFSET variable isn’t being loaded correctly, which I’m investigating now…