I am quite new to Monsoon2 and want to run the suite u-ar766/trunk (HadGEM3-GC3.1-LL PI control for CMIP6) on it, so I copied it to u-ck646. I have run into several problems:
In the suite conf/Machine Options, there are only two options, ‘MetO Cray’ and ‘Archer’, so I followed the ticket (http://cms.ncas.ac.uk/ticket/3256) to make the suite run on Monsoon2. However, the suite has no ‘monsoon.rc’ and ‘monsoon_restart.rc’ files of its own, so I took them from ‘/home/d05/mvguar/roses/u-bt694/site’, which belongs to the author of that ticket. Is there a better way to port the suite to Monsoon2?
As I use the suite for research purposes, I am required to run the script "remove_cmip6_metadata.py" to remove the metadata. However, I cannot find an equivalent of “ROSE_PYTHONPATH = ‘/home/h03/fcm/rose/lib/python’” on Monsoon2 to run this script. As a result, the task “valid_suite_info” fails and I have to manually reset it to “succeeded” every time. Is there a better way?
I modified the required initial conditions in “/home/d05/qgao/roses/u-ck646/rose-suite.conf”, e.g. CICE_INIT, NEMO_ICEBERGS_START, NEMO_START, REBUILD_NEMO_SCRIPT, based on the above-mentioned suite “/home/d05/mvguar/roses/u-bt694”. But the task “recon” still fails, with output here (’/home/d05/qgao/cylc-run/u-ck646/log/job/18500101T0000Z/recon/01/job.err’). The main problem is “Error message: Failed to open file /data/d01/ukcmip6/Restarts/u-aq853/aq853a.da25940101_00”. I cannot find where the suite requires this file. Can you please advise?
Thanks in advance for your time. If you need any additional information, please let me know.
Depending on the suite, sometimes the meto_cray files will work with a couple of tweaks. Adding a monsoon.rc file is the recommended way to make it easy to switch between the two platforms.
Use /home/d04/fcm/rose/lib/python
Did you solve this? I can’t see any run output for u-ck646.
The directory /home/d05/qgao/cylc-run/u-ck646/log/job/18500101T0000Z/recon/01/job.err no longer exists.
/data/d01/ukcmip6/Restarts/u-aq853/aq853a.da25940101_00 is the start dump - search for ainitial in the GUI.
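If the GUI search is not convenient, the same lookup can be done on the suite files directly. The following is only a rough sketch: the function name find_setting is made up for illustration, and it assumes the setting appears as a plain key=value line in one of the suite's *.conf files (a leading "!" or "!!" marks an ignored setting in Rose).

```python
from pathlib import Path


def find_setting(suite_dir, key):
    """Return (file, line number, line) for every *.conf line that sets `key`."""
    hits = []
    for conf in sorted(Path(suite_dir).rglob("*.conf")):
        text = conf.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if "=" not in line:
                continue
            # Strip the "!" / "!!" prefix Rose uses for ignored settings.
            name = line.split("=", 1)[0].strip().lstrip("!")
            if name == key:
                hits.append((str(conf), lineno, line.strip()))
    return hits
```

For example, find_setting("/home/d05/qgao/roses/u-ck646", "ainitial") should list the file and line where the start dump is set.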
Thanks a lot. I solved the “ainitial” problem by changing it in roses/u-ck646/app/um/rose-app.conf, as I cannot modify it in the Rose GUI. But I still hit a problem when running the model.
I copied cylc-run/u-ck646/log/job/18500101T0000Z/coupled/01/job.err to ‘/home/d05/qgao/job.err’; the error message is shown below. I am not sure where the problem is. Under UM/namelist/Model Input and Output/STASH Requests and Profiles/STASH Requests, I switched on only the package ‘CMIP6-core’ and everything with use_name UPCOUP, as indicated here (http://cms.ncas.ac.uk/ticket/3538). Even after I also switched on "CMIP6-N96HGM3", as required here (https://code.metoffice.gov.uk/trac/ukcmip6/wiki/runs/u-ar766), it still gives the same error message.
Can you please advise?
Best regards, Qinggang
???
??? WARNING ???
? Warning code: -30
? Warning from routine: PRELIM
? Warning message:
? Field - Section:5, Item:244 request denied.
? Unavailable to this model version.
? Warning from processor: 0
? Warning number: 60
???
???
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 1
? Error from routine: ST_DIAG3
? Error message: Heat Flux not on TEM pressure levels - requested pressure levels do not match, diagnostic aborted
? Error from processor: 0
? Error number: 61
???
[0] exceptions: An non-exception application exit occured.
[0] exceptions: whilst in a serial region
[0] exceptions: Task had pid=47251 on host nid01154
[0] exceptions: Program is “./atmos.exe”
u-ar766 was officially ported to Monsoon as u-ar930, so I would suggest you start with that. Make sure it runs before you make your configuration changes.
Thanks a lot. u-ar930 works well with some modifications.
In addition, I plan to modify the historical simulation (u-bg466) and the ssp585 future scenario simulation (u-bi805) with HadGEM3-GC3.1-LL in CMIP6. However, these two suites are both designed for MetO Cray and Archer. Have they been officially ported to Monsoon as well? If so, can you let me know the corresponding suites? And where can I find out whether a suite has been ported to Monsoon?
u-bg466 and u-bi805 don’t have official Monsoon ports, but since Monsoon is a portion of the Met Office XCS, it’s easy to change them without having to create new *.rc files.
For u-bg466 (and I think the same goes for u-bi805):
In rose-suite.conf:
ACCOUNT_USER=<YourMonsoonProject>
HOST_XC40=xcs-c
HPC_QUEUE=normal
REDISTRIBUTE_OZONE=false # Turn off ozone redistribution
In site/meto_cray.rc:
In [[EXTRACT_RESOURCE]], change the batch system setting to: batch system = background
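The rose-suite.conf settings listed above could also be applied with a small script rather than by hand. This is only a sketch: apply_overrides is a hypothetical helper, it only rewrites keys that already exist as key=value lines, and it writes values exactly as given, so check whether the existing file quotes its values before using it.

```python
def apply_overrides(conf_text, overrides):
    """Return (new_text, missing_keys) after replacing values of existing keys.

    Keys in `overrides` that are not found in `conf_text` are left untouched
    and reported back so they can be added by hand in the right section.
    """
    remaining = dict(overrides)
    out = []
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in remaining:
            # Replace the whole line, keeping the key and using the new value.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    return "\n".join(out) + "\n", sorted(remaining)
```

Calling it with {"HOST_XC40": "xcs-c", "HPC_QUEUE": "normal", "REDISTRIBUTE_OZONE": "false"} plus your own ACCOUNT_USER on the text of rose-suite.conf returns the edited text and a list of any keys it could not find.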
At the moment I can’t locate some of the ukcmip6 input data files on Monsoon. I’ll have a bit more of a look around, and if I can’t locate them I’ll copy them across from the Met Office.
Thanks a lot for the instructions and the copied files. I followed them, but my suite u-cl202 (copied from u-bi805) still fails in the coupled task.
I copied the job.err file from /home/d05/qgao/cylc-run/u-cl202/log/job/20150101T0000Z/coupled/01/job.err to /home/d05/qgao/job.err. I cannot identify the problem from this file, which is copied below. Could you please advise?
????????????????????????????????????????????????????????????????????????????????
?????????????????????????????? WARNING ??????????????????????????????
? Warning code: -80
? Warning from routine: PRELIM
? Warning message:
? Field - Section:3, Item:353 ignored.
? Invalid pseudo-level type.
? Warning from processor: 0
? Warning number: 10
????????????????????????????????????????????????????????????????????????????????
Rank 1014 [Sun Jan 23 14:32:25 2022] [c7-0c2s7n3] application called MPI_Abort(comm=0xC4000009, 1) - process 1008
Application 162430958 is crashing. ATP analysis proceeding...
Rank 1049 [Sun Jan 23 14:32:25 2022] [c7-0c2s7n3] application called MPI_Abort(comm=0xC4000003, 1) - process 1043
...
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
_pmiu_daemon(SIGCHLD): [NID 01504] [c7-0c2s8n0] [Sun Jan 23 14:37:27 2022] PE RANK 1050 exit signal Aborted
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
[NID 01504] 2022-01-23 14:37:27 Apid 162430958: initiated application termination
[FAIL] run_model # return-code=137
2022-01-23T14:37:33Z CRITICAL - failed/EXIT
It’s always worth looking in /home/d05/qgao/cylc-run/u-cl202/work/20150101T0000Z/coupled/ocean.output - here’s the error:
===>>> : E R R O R
===========
iom_open ~~~
File ./restart_trc.nc* not found
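Pulling that error out of ocean.output can be scripted if the file is long. A minimal sketch, assuming the error is flagged by a line containing "E R R O R" as in the excerpt above; error_blocks is a hypothetical helper name and the number of context lines is an arbitrary choice.

```python
def error_blocks(text, marker="E R R O R", context=4):
    """Return each line containing `marker` together with the lines after it."""
    lines = text.splitlines()
    blocks = []
    for i, line in enumerate(lines):
        if marker in line:
            # Keep the marker line plus up to `context` following lines.
            blocks.append("\n".join(lines[i:i + 1 + context]))
    return blocks
```

Reading ocean.output into a string and passing it to error_blocks gives one short excerpt per reported error.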
Try replacing /data/d01/ukcmip6/ssp585_N96O1_ensemble1_dumps/bg466o_20150101_restart_trc.nc by /projects/umadmin/rhatcher/ukcmip6/ssp585_N96O1_ensemble1_dumps/bg466o_20150101_restart_trc.nc
(this is in ocean_passive_tracers->env->Initialisation Settings)
There may be more file paths that need changing - try grep’ing for /data/ in the suite files.
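As an alternative to grep, the search for remaining /data/ paths could look something like the sketch below. The helper name find_paths is made up, and skipping .svn directories is an assumption about how the suite working copy is laid out.

```python
from pathlib import Path


def find_paths(suite_dir, prefix="/data/"):
    """Return (file, line number, line) for every line containing `prefix`."""
    hits = []
    for path in sorted(Path(suite_dir).rglob("*")):
        # Skip directories and version-control metadata.
        if not path.is_file() or ".svn" in path.parts:
            continue
        try:
            lines = path.read_text().splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for lineno, line in enumerate(lines, start=1):
            if prefix in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Each hit is a candidate for the same replacement as the restart_trc.nc path, provided the file exists under the new location.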
May I ask how I can get access to the “Met Office collaboration Twiki”? I have seen references in many places to instructions there on how to run and debug the suites, but I cannot access them even with the University of Cambridge VPN. Unfortunately, I do not have access to the BAS VPN.