UM build with CrayPat

Is it possible to specify an extra command for instrumenting a newly-built UM executable?

I would like to use the CrayPat software available on the ARCHER2 machine to profile the UM code
and so would need to run "pat_build um-atmos.exe" in order to create an instrumented file called
“um-atmos.exe+pat”.

Is there some setting within “./u-ch330-EPCC/site/archer2.rc” that could handle this?

Also, how would I ensure that the instrumented executable, “um-atmos.exe+pat”, is run instead of
“um-atmos.exe”? Is this just a case of editing the “rose-app.conf” file?

I’m guessing a bit (I’ve not done this), but in the suite.rc file:

 [[fcm_make2_um]]
        inherit = RUN_MAIN, UMBUILD_RESOURCE, UMBUILD
        post-script = pat_build ....    <==   add this line
(I'm not sure how to specify the paths for the pat_build arguments.)

Then, in /home/n02/n02/n1280run/cylc-run/u-ch330/share/fcm_make_um/build-atmos/bin (for example), you’ll see

COMMAND="${@:-${ATMOS_EXEC:-$(dirname "$0")/um-atmos.exe}}"

so you’d need to set ATMOS_EXEC in the ATMOS task.

Fingers crossed!
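To see which executable that line picks, here is a minimal bash sketch that mimics it. BIN_DIR and the /path/to/... paths are hypothetical placeholders standing in for $(dirname "$0"), and it assumes ATMOS_EXEC reaches the wrapper as an exported environment variable, which is how Cylc passes a task's [[[environment]]] entries to the job.

```bash
#!/usr/bin/env bash
# Mimics the default-command line quoted above so the precedence can be tried
# interactively. BIN_DIR is a hypothetical stand-in for $(dirname "$0").
resolve_atmos_command() {
    local BIN_DIR="/path/to/build-atmos/bin"
    local COMMAND
    # Explicit arguments win; otherwise $ATMOS_EXEC; otherwise the default exe.
    COMMAND="${@:-${ATMOS_EXEC:-$BIN_DIR/um-atmos.exe}}"
    printf '%s\n' "$COMMAND"
}

unset ATMOS_EXEC
resolve_atmos_command                  # -> /path/to/build-atmos/bin/um-atmos.exe
ATMOS_EXEC="/path/to/build-atmos/bin/um-atmos.exe+pat"
resolve_atmos_command                  # -> /path/to/build-atmos/bin/um-atmos.exe+pat
resolve_atmos_command ./other.exe      # -> ./other.exe
```

So pointing ATMOS_EXEC at the +pat file should be enough, unless something passes an explicit command to the wrapper, which takes precedence.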

So for the ATMOS task in ‘suite.rc’ I could add something like the following two lines.

ATMOS_EXEC_ORIGINAL = /work/n02/n02/mrbn02/cylc-run/u-ch330-EPCC/share/fcm_make_um/build-atmos/bin/um-atmos.exe
ATMOS_EXEC = $ATMOS_EXEC_ORIGINAL+pat

And then for the fcm_make2_um task I would have “pat_build $ATMOS_EXEC_ORIGINAL”.

Does that look right?
Is parameter expansion possible for variables defined in the same ‘suite.rc’ file?
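As far as I know, Cylc writes a task's [[[environment]]] entries into the job script as exported shell variables, in the order they are listed, so an entry can refer to one defined above it. A rough bash equivalent of the two ATMOS entries (a sketch, not the literal job-script text):

```bash
#!/usr/bin/env bash
# Roughly what two ordered [[[environment]]] entries in the ATMOS task become
# in the job script (assumed). Order matters: the second line expands the first.
ATMOS_EXEC_ORIGINAL=/work/n02/n02/mrbn02/cylc-run/u-ch330-EPCC/share/fcm_make_um/build-atmos/bin/um-atmos.exe
# Braces are optional here ('+' cannot be part of a variable name) but clearer.
ATMOS_EXEC="${ATMOS_EXEC_ORIGINAL}+pat"
export ATMOS_EXEC_ORIGINAL ATMOS_EXEC

printf '%s\n' "$ATMOS_EXEC"
```

One caveat: a variable defined only in the ATMOS task is not visible to the fcm_make2_um post-script, which runs as a separate job, so ATMOS_EXEC_ORIGINAL would need to be set in a family both tasks inherit, or repeated in each.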

You can export variables, put multiple commands into the post-script, or even put a wrapper script with everything you need in your suite’s bin directory if this is easier. There are loads of ways to bodge things…
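For the wrapper-script route, a minimal bash sketch, assuming a hypothetical script called instrument_um in the suite's bin directory (Cylc adds that directory to PATH for task jobs). It runs pat_build in the build directory and fails loudly if um-atmos.exe+pat does not appear; the name and interface are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical suite wrapper, e.g. saved as bin/instrument_um in the suite
# directory. Name and interface are illustrative, not part of the UM suite.
set -eu

# Body runs in a subshell (note the parentheses) so the cd does not leak out.
instrument_um() (
    bin_dir="$1"                  # directory holding the freshly built exe
    exe="um-atmos.exe"

    cd "$bin_dir" || exit 1
    if [ ! -x "$exe" ]; then
        echo "instrument_um: $bin_dir/$exe not found" >&2
        exit 1
    fi
    # With the full perftools module loaded, pat_build writes "$exe+pat".
    pat_build "$exe"
    if [ ! -x "$exe+pat" ]; then
        echo "instrument_um: pat_build did not create $exe+pat" >&2
        exit 1
    fi
)

# Only act when given a directory, so the file can also be sourced for testing.
if [ "$#" -gt 0 ]; then
    instrument_um "$1"
fi
```

The fcm_make2_um post-script would then be a single line, instrument_um "$CYLC_SUITE_SHARE_DIR/fcm_make_um/build-atmos/bin". Keeping the logic in a script rather than inline also makes it easy to rerun by hand.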

If I ever do this kind of thing, I usually run it once to set everything up, then edit and submit the job files myself rather than using ‘cylc restart’ etc. That might help?

Mike

I did this

[[HPC]]
...
  module load perftools-lite
...
  [[fcm_make2_um]]
        inherit = UMBUILD, UMBUILD_RESOURCE
        post-script = """
                      cd $CYLC_SUITE_SHARE_DIR/fcm_make_um/build-atmos/bin
                      pat_build -O apa um-atmos.exe
                      """

but pat_build failed, saying
ERROR: Program '/mnt/lustre/a2fs-work2/work/n02/n02/grenvill/cylc-run/u-cn134-RS-2level/share/fcm_make_um/build-atmos/bin/um-atmos.exe' is already instrumented with PerfTools

In $CYLC_SUITE_SHARE_DIR/fcm_make_um/build-atmos/bin:
-rwxr-xr-x 1 grenvill n02 127529040 Nov  2 08:03 um-atmos.exe+orig
-rwxr-xr-x 1 grenvill n02 128517200 Nov  2 08:03 um-atmos.exe

I don’t know why there’s a um-atmos.exe+orig.


Now I'm asking you for advice!

Grenville

Running ‘pat_build’ isn’t required when you use perftools-lite - that step is done automatically (hence the appearance of the ‘um-atmos.exe+orig’ file). All you need is the ‘module load perftools-lite’ command.

What I’m trying to do is use the full-fat perftools module, which does require ‘pat_build’.
I’ve tried some changes to ‘suite.rc’, which haven’t worked, but looking at your example, I now understand I need the """ delimiters and $CYLC_SUITE_SHARE_DIR.

Mike

This worked for me. In site/archer2.rc:

[[HPC]]

module load perftools-base
module load perftools

In suite.rc:

  [[fcm_make2_um]]
        inherit = UMBUILD, UMBUILD_RESOURCE
        post-script = """
                      cd $CYLC_SUITE_SHARE_DIR/fcm_make_um/build-atmos/bin
                      pat_build um-atmos.exe
                      """

grenvill@ln01:~/cylc-run/u-cn134-RS-2level/share/fcm_make_um/build-atmos/bin> ls -lrt
total 250072
-rwxr-xr-x 1 grenvill n02      6607 Nov  2 07:45 um-atmos
-rwxr-xr-x 1 grenvill n02     11207 Nov  2 07:46 um_script_functions
-rwxr-xr-x 1 grenvill n02 127010520 Nov  2 15:43 um-atmos.exe
-rwxr-xr-x 1 grenvill n02 129035728 Nov  2 15:47 um-atmos.exe+pat

I also modified the build config file to include

build-atmos.prop{keep-lib-o} = true

to keep the object files (I’m not sure this is needed).

Irritatingly, having an old um-atmos.exe+orig (left over from the perftools-lite build) prevented the creation of um-atmos.exe+pat.
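One way to avoid tripping over leftovers is to clear them in the post-script before calling pat_build. A minimal bash sketch, assuming um-atmos.exe has just been relinked with perftools (not perftools-lite) loaded; clean_perftools_outputs is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: remove CrayPat outputs left over from earlier builds (e.g. the
# um-atmos.exe+orig that perftools-lite leaves) before pat_build runs.
# Assumes um-atmos.exe itself was just relinked without perftools-lite.
# clean_perftools_outputs is a hypothetical name.
clean_perftools_outputs() {
    local bin_dir="$1" stale
    for stale in "$bin_dir/um-atmos.exe+orig" "$bin_dir/um-atmos.exe+pat"; do
        if [ -e "$stale" ]; then
            echo "removing stale $stale" >&2
            rm -f -- "$stale"
        fi
    done
}
```

It only removes the +orig and +pat files; if um-atmos.exe itself is still a perftools-lite build, pat_build will still report it as already instrumented and a rebuild is needed.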

Happy to chat about keep-lib-o - you’ll need to create a branch to do that.

Grenville

Thanks for the info.

How do I run with “keep-lib-o = true”?

I can’t get CrayPat to work with the u-ch330-EPCC suite.

fcm_make_um succeeds (does that stage just do the code checkout?), but
fcm_make2_um fails with the following error.

ERROR: Missing required ELF section ‘.note.link’ from the program ‘/mnt/lustre/a2fs-work2/work/n02/n02/mrbn02/cylc-run/u-ch330-EPCC/share/fcm_make_um/build-atmos/bin/um-atmos.exe’. Load the correct ‘perftools’ module and rebuild the program.
2022-11-03T10:51:24Z CRITICAL - failed/EXIT

It looks like this error could be coming from the pat_build command and be caused by the perftools
module not being loaded during the compile.
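A quick way to test that theory before calling pat_build is to look for the section named in the error. readelf -S lists an ELF file's section headers; the sketch below (has_note_link is a hypothetical name) assumes, as the error message implies, that the section is only present when the program was linked with the perftools module loaded.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if the file has a '.note.link' ELF section, the one
# pat_build reported as missing. -W stops readelf truncating section names.
# If the file is not ELF (or readelf fails), grep sees nothing and fails.
# has_note_link is a hypothetical name.
has_note_link() {
    readelf -S -W "$1" 2>/dev/null | grep -q '\.note\.link'
}
```

In a post-script, something like has_note_link um-atmos.exe || echo "um-atmos.exe was not linked with perftools loaded" >&2 would flag the problem before pat_build runs.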

Grenville, my module setup is different to your u-cn134-RS-2level suite.
For the HPC task (./site/archer2.rc) I have:

ulimit -s unlimited
module load cpe/21.09
module load um
module swap craype-network-ofi craype-network-ucx
module swap cray-mpich cray-mpich-ucx
module load perftools-base
module load perftools
module list 2>&1
export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH

That module list appears in the fcm_make2_um “job.out” file.
It doesn’t look like I have perftools (or any modules) loaded for the fcm_make_um task - is that a problem?

Check out the branch:

fcm co https://code.metoffice.gov.uk/svn/um/main/branches/dev//simonwilson/vn11.6_archer2_compile

Edit <your path ...>/vn11.6_archer2_compile/fcm-make/ncas-ex-cce/um-atmos-safe.cfg

to add
build-atmos.prop{keep-lib-o} = true

then point config_root_path at your local branch and rebuild.
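If you’d rather script the edit than do it by hand, here is a small bash sketch that appends the property only once. add_keep_lib_o is a hypothetical helper; the property line is the one quoted in this reply.

```bash
#!/usr/bin/env bash
# Sketch: append the keep-lib-o property to an fcm-make config file, once.
# add_keep_lib_o is a hypothetical helper name.
add_keep_lib_o() {
    local cfg="$1"
    local line='build-atmos.prop{keep-lib-o} = true'
    if [ ! -f "$cfg" ]; then
        echo "add_keep_lib_o: $cfg not found" >&2
        return 1
    fi
    # Already present as a whole line? Then leave the file alone.
    if grep -qxF -- "$line" "$cfg"; then
        return 0
    fi
    # If the file lacks a trailing newline, add one so the setting starts its own line.
    if [ -s "$cfg" ] && [ -n "$(tail -c 1 "$cfg")" ]; then
        printf '\n' >> "$cfg"
    fi
    printf '%s\n' "$line" >> "$cfg"
}
```

For example, add_keep_lib_o <your path ...>/vn11.6_archer2_compile/fcm-make/ncas-ex-cce/um-atmos-safe.cfg; the same call works for the other um-atmos-*.cfg files in that directory.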

Grenville

Hi Grenville,

I’ve checked out the code and updated ‘um-atmos-safe.cfg’ as well as config_root_path in the Rose config.

The problem now is that on running the suite, the fcm_make_um task fails with these errors:

[FAIL] localhost:/home/mbareford/vn11.6_archer2_compile/fcm-make/ncas-ex-cce/um-atmos-high.cfg: cannot load config file
[FAIL] localhost:/home/mbareford/vn11.6_archer2_compile/fcm-make/ncas-ex-cce/um-atmos-high.cfg: cannot be read

I’m not sure which config file is missing, but looking into ‘um-atmos-high.cfg’ it could be
‘um-atmos-common.cfg’, which in any case exists at ‘/home/mbareford/vn11.6_archer2_compile/fcm-make/inc/um-atmos-common.cfg’

My config_root_path is “localhost:/home/mbareford/vn11.6_archer2_compile”.

Hi Michael, the error says

[FAIL] localhost:/home/mbareford/vn11.6_archer2_compile/fcm-make/ncas-ex-cce/um-atmos-high.cfg: cannot be read
[FAIL] Host key verification failed.

so I’m guessing it’s because you have added localhost (I’ve never added that to the config filename). Try just /home/mbareford/vn11.6_archer2_compile for config_root_path.

Thanks Grenville, suite is now running (fcm_make2_um).
Fingers crossed the “um-atmos.exe+pat” will run as expected.

I now see a new error “cannot find -lum-atmos”.

This comes from the fcm_make2_um task and it might originate from the pat_build command:
the executable has been created (“./cylc-run/u-ch330-EPCC/share/fcm_make_um/build-atmos/bin/um-atmos.exe”).

I didn’t know there was a um-atmos library - sure enough, it doesn’t exist anywhere within “/work/n02/n02/mrbn02”.

I can confirm it is from the pat_build command.

I’ve asked for assistance from Cray on this.
It’s surprising that you didn’t encounter this problem and were able to create the um-atmos.exe+pat file.

Mea culpa: I forgot you were building with high optimisations, so please add

build-atmos.prop{keep-lib-o} = true

to um-atmos-high.cfg (check that it made it to ARCHER in /home/n02/n02/mrbn02/cylc-run/u-ch330-EPCC/share/fcm_make_um/fcm-make2.cfg)
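That check can be scripted. A minimal bash sketch (keep_lib_o_enabled is a hypothetical name), assuming the property is carried into fcm-make2.cfg in the same key = value form; spacing around the "=" is tolerated.

```bash
#!/usr/bin/env bash
# Sketch: succeed if the given config file sets keep-lib-o to true.
# In ERE, \. \{ \} match a literal dot and braces.
# keep_lib_o_enabled is a hypothetical name.
keep_lib_o_enabled() {
    grep -Eq 'build-atmos\.prop\{keep-lib-o\}[[:space:]]*=[[:space:]]*true' "$1"
}
```

For example: keep_lib_o_enabled /home/n02/n02/mrbn02/cylc-run/u-ch330-EPCC/share/fcm_make_um/fcm-make2.cfg && echo "keep-lib-o is set".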

Thanks Grenville, my instrumented um code (um-atmos.exe+pat) appears to be running.

There was just one extra thing that I needed to do: add the line
“export PAT_RT_MPI_THREAD_REQUIRED=3”
to the HPC pre-script in “./site/archer2.rc”.
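The build jobs also inherit the HPC pre-script, so if you’d prefer the setting not to leak into them, one option is to export it only when the instrumented executable is selected. A bash sketch, assuming ATMOS_EXEC is set in the ATMOS task’s [[[environment]]] (so it is visible to the pre-script) and ends in +pat exactly when the CrayPat build is in use; set_craypat_runtime is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of an HPC pre-script fragment: export the CrayPat runtime setting
# from this thread only when ATMOS_EXEC points at a "+pat" executable.
# Keying on the "+pat" suffix is an assumption, not a UM convention.
# set_craypat_runtime is a hypothetical name.
set_craypat_runtime() {
    case "${ATMOS_EXEC:-}" in
        *+pat) export PAT_RT_MPI_THREAD_REQUIRED=3 ;;
    esac
}
```

Calling set_craypat_runtime at the end of the HPC pre-script would then leave uninstrumented runs and the build tasks unchanged.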

Great! I have no recollection of setting that environment variable (but I never ran the instrumented executable).