Transfer CMIP6 simulations (PI run) with HadGEM3 on MetO Cray to Monsoon2

Dear CMS helpdesk

I am quite new to Monsoon2 and want to run the suite u-ar766/trunk (HadGEM3-GC3.1-LL PI control for CMIP6) on it, so I copied it to u-ck646. However, I have run into several problems:

  1. In suite conf/Machine Options there are only two options, ‘MetO Cray’ and ‘Archer’, so I followed ticket 3256 (http://cms.ncas.ac.uk/ticket/3256) to make the suite run on Monsoon2. However, the suite has no ‘monsoon.rc’ or ‘monsoon_restart.rc’ files, so I copied them from ‘/home/d05/mvguar/roses/u-bt694/site’, the suite of the user who raised that ticket. Is there a better way to port the suite to Monsoon2?

  2. As I am using the suite for research purposes, I need to run the script “remove_cmip6_metadata.py” to remove the CMIP6 metadata. However, I cannot find an equivalent of “ROSE_PYTHONPATH = ‘/home/h03/fcm/rose/lib/python’” on Monsoon2 to run this script. As a result, the task “valid_suite_info” fails and I have to reset it to “succeeded” manually every time. Is there a better way?

  3. I modified the required initial conditions in “/home/d05/qgao/roses/u-ck646/rose-suite.conf” (e.g. CICE_INIT, NEMO_ICEBERGS_START, NEMO_START, REBUILD_NEMO_SCRIPT) based on the suite mentioned above, “/home/d05/mvguar/roses/u-bt694”. But the task “recon” still fails (output in ‘/home/d05/qgao/cylc-run/u-ck646/log/job/18500101T0000Z/recon/01/job.err’). The main problem is “Error message: Failed to open file /data/d01/ukcmip6/Restarts/u-aq853/aq853a.da25940101_00”. I cannot find where the suite refers to this file. Could you please advise?

Thank you in advance for your time. If you need any additional information, please let me know.

Best regards, Qinggang

Hi Qinggang,

  1. It depends on the suite: sometimes the meto_cray files will work with a couple of tweaks. Adding a monsoon.rc file is the recommended approach, as it makes it easy to switch between the two platforms.

  2. Use /home/d04/fcm/rose/lib/python.

  3. Did you solve this? I can’t see any run output for u-ck646.
    The file /home/d05/qgao/cylc-run/u-ck646/log/job/18500101T0000Z/recon/01/job.err no longer exists.

    /data/d01/ukcmip6/Restarts/u-aq853/aq853a.da25940101_00 is the start dump; search for ainitial in the GUI to see where it is set.
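The same search can be done outside the GUI, which helps when a value has to be edited in rose-app.conf directly. Below is a minimal Python sketch, assuming the suite keeps its settings in rose-suite.conf and app/*/rose-app.conf using Rose's INI-style key=value layout. The helper names are hypothetical (not part of Rose), and the example suite path is the one from this ticket.

```python
from pathlib import Path


def find_key_in_text(text, key):
    """Return (line_number, section, line) for each line that sets `key`.

    Names are compared case-insensitively (Fortran namelist names are
    case-insensitive), and Rose's "!" / "!!" ignore prefixes are stripped
    so that switched-off settings are reported as well.
    """
    hits = []
    section = ""
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped  # e.g. "[namelist:...]" or "[env]"
            continue
        if stripped.startswith("#") or "=" not in stripped:
            continue  # comment or non-assignment line
        name = stripped.split("=", 1)[0].strip().lstrip("!")
        if name.lower() == key.lower():
            hits.append((lineno, section, stripped))
    return hits


def find_key_in_suite(suite_dir, key):
    """Search rose-suite.conf and app/*/rose-app.conf under `suite_dir`."""
    suite = Path(suite_dir).expanduser()
    if not suite.is_dir():
        return []
    candidates = [suite / "rose-suite.conf", *sorted(suite.glob("app/*/rose-app.conf"))]
    results = []
    for conf in candidates:
        if conf.is_file():
            text = conf.read_text(errors="replace")
            for lineno, section, line in find_key_in_text(text, key):
                results.append((str(conf), lineno, section, line))
    return results


if __name__ == "__main__":
    # Suite path taken from this ticket; adjust for your own suite.
    for path, lineno, section, line in find_key_in_suite("~/roses/u-ck646", "ainitial"):
        print(f"{path}:{lineno} {section} {line}")
```

Matching is deliberately loose (case-insensitive, ignored entries included), so it may list a few extra lines; that is usually preferable to missing the one that actually sets the start dump.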

Regards,
Ros.

Hi Ros

Thanks a lot. I solved the “ainitial” problem by changing it directly in roses/u-ck646/app/um/rose-app.conf, as I could not modify it in the Rose GUI. However, I still have a problem when running the model.

I copied the error file cylc-run/u-ck646/log/job/18500101T0000Z/coupled/01/job.err to ‘/home/d05/qgao/job.err’; the error messages are shown below. I am not sure where the problem is. Under UM/namelist/Model Input and Output/STASH Requests and Profiles/STASH Requests, I switched on only the package ‘CMIP6-core’ and everything with use_name UPCOUP, as indicated in ticket 3538 (http://cms.ncas.ac.uk/ticket/3538). But even after I also switched on ‘CMIP6-N96HGM3’, as required on the u-ar766 run page (https://code.metoffice.gov.uk/trac/ukcmip6/wiki/runs/u-ar766), it still gives the same error message.

Can you please advise?

Best regards, Qinggang

???
??? WARNING ???
? Warning code: -30
? Warning from routine: PRELIM
? Warning message:
? Field - Section:5, Item:244 request denied.
? Unavailable to this model version.
? Warning from processor: 0
? Warning number: 60
???

???
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 1
? Error from routine: ST_DIAG3
? Error message: Heat Flux not on TEM pressure levels - requested pressure levels do not match, diagnostic aborted
? Error from processor: 0
? Error number: 61
???

[0] exceptions: An non-exception application exit occured.
[0] exceptions: whilst in a serial region
[0] exceptions: Task had pid=47251 on host nid01154
[0] exceptions: Program is “./atmos.exe”

Hi Qinggang,

u-ar766 was officially ported to Monsoon as u-ar930, so I would suggest you start with that. Make sure it runs before you make your configuration changes.

Regards,
Ros.

Hi Ros

Thanks a lot. u-ar930 works well with some modifications.

I also plan to modify the HadGEM3-GC3.1-LL CMIP6 historical simulation (u-bg466) and ssp585 future scenario simulation (u-bi805). Both suites are set up for the MetO Cray and Archer; have they also been officially ported to Monsoon? If so, could you let me know the corresponding suites? And where can I find out whether a suite has been ported to Monsoon?

Best regards, Qinggang

Hi Qinggang,

u-bg466 and u-bi805 don’t have official Monsoon ports, but since Monsoon is a portion of the Met Office XCS, it’s easy to switch them over without having to create new *.rc files.

For u-bg466 (and I think the same goes for u-bi805):
In rose-suite.conf:

  • ACCOUNT_USER=<YourMonsoonProject>
  • HOST_XC40=xcs-c
  • HPC_QUEUE=normal
  • REDISTRIBUTE_OZONE=false (turns off ozone redistribution)

In site/meto_cray.rc:

  • In [[EXTRACT_RESOURCE]], set batch system = background
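If you prefer to script the rose-suite.conf part of these changes, here is a minimal Python sketch. It assumes each variable already appears exactly once as a NAME=value line (typically under [jinja2:suite.rc]) and keeps whatever quoting the existing value uses, so Jinja2 strings stay strings. "your_monsoon_project" is a placeholder and the function names are hypothetical; the [[EXTRACT_RESOURCE]] change in site/meto_cray.rc is left as a manual edit.

```python
from pathlib import Path

MONSOON_SETTINGS = {
    "ACCOUNT_USER": "your_monsoon_project",  # hypothetical placeholder
    "HOST_XC40": "xcs-c",
    "HPC_QUEUE": "normal",
    "REDISTRIBUTE_OZONE": "false",  # turns off ozone redistribution
}


def _requote(old_value, new_value):
    """Wrap new_value in the same quotes the old value used, if any."""
    old_value = old_value.strip()
    for quote in ("'", '"'):
        if len(old_value) >= 2 and old_value[0] == old_value[-1] == quote:
            return f"{quote}{new_value}{quote}"
    return new_value


def apply_settings(text, settings):
    """Return (new_text, names_not_found); only the first match of each name is changed."""
    remaining = dict(settings)
    out = []
    for line in text.splitlines(keepends=True):
        name, sep, value = line.partition("=")
        key = name.strip()
        if sep and key in remaining:
            ending = "\n" if line.endswith("\n") else ""
            line = f"{key}={_requote(value, remaining.pop(key))}{ending}"
        out.append(line)
    return "".join(out), sorted(remaining)


def update_suite_conf(path, settings=MONSOON_SETTINGS):
    """Rewrite rose-suite.conf in place, keeping a .bak copy; return missing names."""
    conf = Path(path).expanduser()
    original = conf.read_text()
    new_text, missing = apply_settings(original, settings)
    conf.with_name(conf.name + ".bak").write_text(original)
    conf.write_text(new_text)
    return missing
```

Any names returned as missing were not found in the file and need to be added by hand (or through the GUI) rather than silently appended, since their correct section is suite-specific.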

At the moment I can’t locate some of the input ukcmip6 data files on Monsoon. I’ll have a bit more of a look around, and if I can’t find them I’ll copy them across from the Met Office.

Regards,
Ros.

Hi Qinggang,

I’ve copied the start files for u-bg466 that are under ukcmip6/N96O1_ensemble1_dumps to /projects/umadmin/rhatcher/ukcmip6/N96O1_ensemble1_dumps

Similarly for u-bi805 see /projects/umadmin/rhatcher/ukcmip6/ssp585_N96O1_ensemble1_dumps

Cheers,
Ros.

Hi Ros

Thanks a lot for the instructions and the copied files. I followed the instructions, but my suite u-cl202 (copied from u-bi805) still fails in the coupled task.

I copied the job.err file from /home/d05/qgao/cylc-run/u-cl202/log/job/20150101T0000Z/coupled/01/job.err to /home/d05/qgao/job.err (excerpt below), but I cannot identify the problem from it. Could you please advise?

????????????????????????????????????????????????????????????????????????????????
??????????????????????????????      WARNING       ??????????????????????????????
?  Warning code: -80
?  Warning from routine: PRELIM
?  Warning message:
?          Field - Section:3, Item:353 ignored.
?          Invalid pseudo-level type.
?  Warning from processor: 0
?  Warning number: 10
????????????????????????????????????????????????????????????????????????????????

Rank 1014 [Sun Jan 23 14:32:25 2022] [c7-0c2s7n3] application called MPI_Abort(comm=0xC4000009, 1) - process 1008
Application 162430958 is crashing. ATP analysis proceeding...
Rank 1049 [Sun Jan 23 14:32:25 2022] [c7-0c2s7n3] application called MPI_Abort(comm=0xC4000003, 1) - process 1043

...

atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
_pmiu_daemon(SIGCHLD): [NID 01504] [c7-0c2s8n0] [Sun Jan 23 14:37:27 2022] PE RANK 1050 exit signal Aborted
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
[NID 01504] 2022-01-23 14:37:27 Apid 162430958: initiated application termination
[FAIL] run_model # return-code=137
2022-01-23T14:37:33Z CRITICAL - failed/EXIT

Best regards, Qinggang

Qinggang

It’s always worth looking in /home/d05/qgao/cylc-run/u-cl202/work/20150101T0000Z/coupled/ocean.output as well. Here’s the error:

===>>> : E R R O R
         ===========

                     iom_open ~~~
 File ./restart_trc.nc* not found
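For long runs, ocean.output can be large, so a small filter helps. Here is a minimal Python sketch that pulls out NEMO error blocks, assuming (from the excerpt above) that errors are flagged by a line containing "E R R O R" and that the cause is given in the next few lines. The function name and the default of five lines of context are arbitrary choices, not a NEMO convention.

```python
from pathlib import Path

MARKER = "E R R O R"  # NEMO's spaced-out error flag, as in the excerpt above


def extract_errors(text, context=5):
    """Return one string per error: the marker line plus the non-blank
    lines among the `context` lines that follow it."""
    lines = text.splitlines()
    blocks = []
    for i, line in enumerate(lines):
        if MARKER in line:
            following = [nxt.strip() for nxt in lines[i + 1:i + 1 + context] if nxt.strip()]
            blocks.append("\n".join([line.strip(), *following]))
    return blocks


if __name__ == "__main__":
    # Path from this ticket; ocean.output sits in the coupled task's work directory.
    log = Path("/home/d05/qgao/cylc-run/u-cl202/work/20150101T0000Z/coupled/ocean.output")
    for block in extract_errors(log.read_text(errors="replace")):
        print(block, end="\n\n")
```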

Try replacing /data/d01/ukcmip6/ssp585_N96O1_ensemble1_dumps/bg466o_20150101_restart_trc.nc with
/projects/umadmin/rhatcher/ukcmip6/ssp585_N96O1_ensemble1_dumps/bg466o_20150101_restart_trc.nc

(this is set in ocean_passive_tracers->env->Initialisation Settings)

There may be more file paths that need changing; try grepping for /data/ in the suite files.
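The grep can be combined with a check against the copied files. Below is a minimal Python sketch, assuming the suite's paths live in the Rose/Cylc files matched by PATTERNS (an assumed layout; extend it if your suite differs). It lists every line containing /data/ and, for each /data/d01/ukcmip6/ path, suggests the /projects/umadmin/rhatcher/ukcmip6/ equivalent only if that file actually exists. Only some ukcmip6 directories were copied, so a blanket prefix swap would break paths that are still valid. The helper names are hypothetical.

```python
import os
import re
from pathlib import Path

# Prefixes from this ticket; only some ukcmip6 directories were copied.
OLD_PREFIX = "/data/d01/ukcmip6/"
NEW_PREFIX = "/projects/umadmin/rhatcher/ukcmip6/"
# Rose/Cylc files that commonly hold paths (assumed layout; extend as needed).
PATTERNS = ("rose-suite.conf", "opt/*.conf", "app/*/rose-app.conf",
            "app/*/opt/*.conf", "*.rc", "site/*.rc")
# An absolute /data/... path not preceded by a word character, "." or "-".
DATA_PATH = re.compile(r"(?<![\w.-])/data/[^\s'\",]+")


def grep_text(text, needle):
    """Return (line_number, stripped_line) for lines containing `needle`."""
    return [(n, line.strip()) for n, line in enumerate(text.splitlines(), 1)
            if needle in line]


def suite_files(suite_dir):
    suite = Path(suite_dir).expanduser()
    if not suite.is_dir():
        return []
    return sorted({p for pat in PATTERNS for p in suite.glob(pat) if p.is_file()})


def grep_suite(suite_dir, needle="/data/"):
    """Roughly `grep -n needle` over the suite's config files."""
    hits = []
    for path in suite_files(suite_dir):
        for lineno, line in grep_text(path.read_text(errors="replace"), needle):
            hits.append((str(path), lineno, line))
    return hits


def find_data_paths(line):
    """Extract absolute /data/... paths from one config line."""
    return DATA_PATH.findall(line)


def suggest_replacement(path, old=OLD_PREFIX, new=NEW_PREFIX, exists=os.path.exists):
    """Return the relocated path only if it starts with `old` and the copy exists."""
    if not path.startswith(old):
        return None
    candidate = new + path[len(old):]
    return candidate if exists(candidate) else None


if __name__ == "__main__":
    for file, lineno, line in grep_suite("~/roses/u-cl202"):
        print(f"{file}:{lineno}: {line}")
        for old in find_data_paths(line):
            new = suggest_replacement(old)
            if new:
                print(f"    candidate replacement: {new}")
```

The script only reports; the edits themselves are still made in the GUI or the config files, so nothing is rewritten without a human check.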

Grenville

Hi Grenville

Thanks for the tips. It works now.

How can I get access to the “Met Office Collaboration Twiki”? I have seen references in many places to instructions there on how to run and debug suites, but I cannot access them even via the University of Cambridge VPN. Unfortunately, I do not have access to the BAS VPN.

Best regards, Qinggang

Hi Qinggang,

The Met Office Collaboration Twiki has been retired. The Monsoon2 User Guide is now on MOSRS: https://code.metoffice.gov.uk/doc/monsoon2/index.html

If you can’t find what you’re looking for there, post the old link you’re after and we might be able to point you to its new location.

Regards,
Ros.

Hi Ros

Thanks a lot. I have now run both copies of u-bg466 and u-bi805, so this issue can be closed. Thanks very much.

Best regards, Qinggang

Thanks for letting us know.

Cheers,
Ros