I am trying to modify a suite (which was running on MONSOON) to run on ARCHER2. I am carrying forward the Iodine development work by Ewa (she did all the development on MONSOON, and her Rose suite is u-cn823). I have taken a copy of u-cn823 and the new suite is u-cp798. How do I modify this suite to run on ARCHER2? I think I have to change the ‘Host Machine’ to ARCHER2 and update some (maybe many!) directories.
Can you please point me to any page that explains how to modify the suite to run on ARCHER2? I think some instructions for modifying a Monsoon suite to run on ARCHER2 were available at http://cms.ncas.ac.uk/wiki/Archer2 but it seems this page has been moved somewhere.
Thanks for pointing to this page. I have followed the instructions and modified the suite (i.e. u-cp798). The suite I am converting is a copy of a Monsoon suite. I am unable to submit the job on ARCHER2 and am getting the following error message:
[INFO] REGISTERED u-cp798 → /home/akpandeyjnu/cylc-run/u-cp798
[FAIL] cylc validate -o /tmp/tmph4sFgn --strict u-cp798 # return-code=1, stderr=
[FAIL] Jinja2Error:
[FAIL] File “/home/akpandeyjnu/cylc-run/u-cp798/site/archer2.rc”, line 159, in top-level template code
[FAIL] execution time limit = {{CLOCK}}
[FAIL] UndefinedError: ‘AINITIAL’ is undefined
Dear CMS,
I think the problem is with the start dump file. I modified the archer2.rc file in /home/akpandeyjnu/cylc-run/u-cp798 to use the start dump ‘/work/n02/n02/alok/ukca_start/ce185a.da20091201_00’, but after running ‘rose suite-run’ the archer2.rc file is automatically restored to the previous version and I still get the same error. Is it defined somewhere in Rose?
Please point me to the directories for CMIP6_AEROCHEM_EMS, CMIP6_BIOG_EMS, CHEM_INIT_FILE and SST_SICE_ANCIL on ARCHER2.
Any changes you make to the suite must be in the ~/roses/u-cp798 directory. Change the site/archer2.rc file in there. The files under ~/cylc-run are then generated from these.
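To see where a variable such as AINITIAL is referenced or defined in the working copy (rather than in the generated ~/cylc-run copy), something like the following could be used. This is only a sketch: the function name find_suite_var is made up for illustration, and it relies on the usual Rose convention that Jinja2 variables for the suite are set in the [jinja2:suite.rc] section of rose-suite.conf.

```shell
# Hypothetical helper: list every line in a suite working copy that
# mentions a given variable name, skipping Subversion metadata.
find_suite_var() {
  # usage: find_suite_var VARIABLE SUITE_DIR
  grep -rn --exclude-dir=.svn -- "$1" "$2" 2>/dev/null
}

# Search the working copy under ~/roses, not the generated ~/cylc-run copy.
find_suite_var AINITIAL "$HOME/roses/u-cp798" \
  || echo "no matches (or suite directory not found)"
```

A match in site/archer2.rc with no matching definition in rose-suite.conf would be consistent with the "‘AINITIAL’ is undefined" error above. After editing, ‘rose suite-run’ regenerates the ~/cylc-run copy.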
You should find all the equivalent ancil paths under the central UMDIR on ARCHER2: /work/y07/shared/umshared
e.g. /projects/ancils/cmip6 on Monsoon is /work/y07/shared/umshared/cmip6 on ARCHER2
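As a small illustration of that mapping, the sketch below rewrites only the one prefix pair quoted above. The function name to_archer2_path and the example file name are hypothetical. Other Monsoon directories may not map the same way, so each resulting path should be checked with ls on ARCHER2.

```shell
# Only the prefix pair given in the reply above is mapped; anything else
# is returned unchanged so it can be handled by hand.
MONSOON_PREFIX="/projects/ancils/cmip6"
ARCHER2_PREFIX="/work/y07/shared/umshared/cmip6"

to_archer2_path() {
  # usage: to_archer2_path MONSOON_PATH
  case "$1" in
    "$MONSOON_PREFIX"|"$MONSOON_PREFIX"/*)
      rest=${1#"$MONSOON_PREFIX"}
      printf '%s\n' "$ARCHER2_PREFIX$rest" ;;
    *)
      printf '%s\n' "$1" ;;
  esac
}

# Hypothetical file name, for illustration only.
to_archer2_path /projects/ancils/cmip6/some/file.nc
# prints /work/y07/shared/umshared/cmip6/some/file.nc
```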
Thank you for pointing me to the ancil path. I have modified the archer2.rc file under /home/akpandeyjnu/roses/u-cp798/site and am now able to submit the job from pumatest, but fcm_make_um fails and the job.err file has the following message:
[FAIL] config-file=/home/akpandeyjnu/cylc-run/u-cp798/work/20091201T0000Z/fcm_make_um/fcm-make.cfg:3
[FAIL] config-file= - svn://pumatest/um.xm_svn/main/branches/dev/simonwilson/vn11.0_archer2_compile/fcm-make/ncas-ex-cce/um-atmos-safe.cfg
[FAIL] svn://pumatest/um.xm_svn/main/branches/dev/simonwilson/vn11.0_archer2_compile/fcm-make/ncas-ex-cce/um-atmos-safe.cfg: cannot load config file
[FAIL] svn://pumatest/um.xm_svn/main/branches/dev/simonwilson/vn11.0_archer2_compile/fcm-make/ncas-ex-cce/um-atmos-safe.cfg: not found
[FAIL] svn: warning: W170000: URL ‘svn://pumatest/um.xm_svn/main/branches/dev/simonwilson/vn11.0_archer2_compile/fcm-make/ncas-ex-cce/um-atmos-safe.cfg’ non-existent in revision 112491
[FAIL]
[FAIL] svn: E200009: Could not display info for all targets because some targets don’t exist
The job-activity.log has the following information:
[jobs-submit ret_code] 0
[jobs-submit out] 2022-09-16T15:19:30Z|20091201T0000Z/fcm_make_um/01|0|20124
2022-09-16T15:19:30Z [STDOUT] 20124
[((‘event-mail’, ‘failed’), 1) ret_code] 0
Please could you suggest to me the possible cause?
The error is because it can’t find the compile configs for ARCHER2. We do not have UM vn11.0 installed on ARCHER2. You will need to upgrade to a newer version of the UM.
Thanks for this. I have modified app/fcm_make_um/rose-app.conf using the one from the AMIP job. fcm_make_um, fcm_make2_um and install_ancil now succeed. Now I am getting an error in recon, and the error message is:
The following have been reloaded with a version change:
cce/11.0.4 => cce/12.0.3
[WARN] file:STASHC: skip missing optional source: namelist:exclude_package(
[WARN] file:RECONA: skip missing optional source: namelist:trans(
[FAIL] file:STASHmaster=source=fcm:um.xm_br/dev/ewabednarz/vn11.0_DEST_plus_Iodine/rose-meta/um-atmos/vn11.0/etc/stash/STASHmaster@HEAD: bad or missing value
2022-09-21T13:51:37Z CRITICAL - failed/EXIT
Please note that the chemistry scheme was developed at vn11.0 by my colleagues on MONSOON, and I am trying to run the same job on ARCHER2.
Thanks for pointing towards upgrading documentation.
I have tried to upgrade the current suite (u-cp798) and followed the instructions. But fcm_make_um has failed.
I thought I was missing some steps, so I created a new copy of the MONSOON Iodine job (u-cn823), upgraded the suite to vn13.0 and followed the Porting documentation. The new suite is u-cr052.
I am getting the same error message in both suites (u-cp798 and u-cr052). The job.err file has the following information:
[FAIL] um/src/atmosphere/UKCA/photolib/calcjs_mod.F90: merge results in conflict
[FAIL] merge output: /home/akpandeyjnu/cylc-run/u-cr052/share/fcm_make_um/.fcm-make/extract/merge/um/src/atmosphere/UKCA/photolib/calcjs_mod.F90.diff
[FAIL] source from location 0: (none)
[FAIL] source from location 1: svn://pumatest/um.xm_svn/main/branches/dev/lukeabraham/vn11.0_ukca_linox_tweaks/src/atmosphere/UKCA/photolib/calcjs_mod.F90@51068
[FAIL] !!! source from location 3: svn://pumatest/um.xm_svn/main/branches/dev/ewabednarz/vn11.0_DEST_plus_Iodine/src/atmosphere/UKCA/photolib/calcjs_mod.F90@97107
[FAIL] fcm make -f /home/akpandeyjnu/cylc-run/u-cr052/work/20091201T0000Z/fcm_make_um/fcm-make.cfg -C /home/akpandeyjnu/cylc-run/u-cr052/share/fcm_make_um -j 4 mirror.target=login.archer2.ac.uk:cylc-run/u-cr052/share/fcm_make_um mirror.prop{config-file.name}=2 # return-code=2
2022-09-29T14:24:49Z CRITICAL - failed/EXIT
I believe it is a little tricky to run, as it is a copy of the MONSOON suite at UM vn11.0, which is not available on ARCHER2.
u-cr001
Meanwhile, I have also taken a copy of u-bd366 (an ARCHER TS2000 nudged suite, GA7.1 StratTrop, UKCA) and upgraded it following the Rose upgrading documentation (Upgrading, Rose Documentation 2.0.0). I then followed the Porting instructions to submit it on ARCHER2, but unfortunately it fails in fcm_make2_um with the following error message:
The following have been reloaded with a version change:
Regarding u-cr001: That is a copy of a UM11.2 suite so all you need to do is port it to ARCHER2 using the Porting instructions. You don’t need to upgrade the suite as UM11.2 is installed on ARCHER2.
You can’t run a suite at one UM version (e.g. vn13.0) with code branches from a different UM version (e.g. vn11.2)
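A quick consistency check along these lines, as a minimal sketch: it assumes the common convention that a UM branch name begins with the version it was created from (as in the branch names quoted in this thread). The function names branch_version and check_branches are made up for illustration.

```shell
# Print the vnX.Y prefix of the last path component, or nothing if the
# name does not follow the assumed vnX.Y_description convention.
branch_version() {
  name="${1##*/}"
  case "$name" in
    vn[0-9]*.[0-9]*_*) printf '%s\n' "${name%%_*}" ;;
  esac
}

# usage: check_branches SUITE_VERSION BRANCH...
# Reports every branch whose version prefix differs from the suite version.
check_branches() {
  want="$1"; shift
  rc=0
  for b in "$@"; do
    v="$(branch_version "$b")"
    if [ "$v" != "$want" ]; then
      echo "MISMATCH: $b (branch at ${v:-unknown}, suite at $want)"
      rc=1
    fi
  done
  return $rc
}

# Example with a branch name from this thread: a vn11.0 branch in a vn13.0 suite.
check_branches vn13.0 branches/dev/ewabednarz/vn11.0_DEST_plus_Iodine || true
```

A mismatch reported here corresponds to the situation behind the "merge results in conflict" failure above, where vn11.0 branches were extracted into a suite upgraded to a different version.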
Who are you working with to take forward the Iodine development work previously started by Ewa? Is it the same group that Ewa was working with, the Lancaster UKCA group? If so, why can’t you get access to Monsoon to continue the work there rather than having to port to ARCHER2?
I think I should explain: the developments were actually initially done on ARCHER1, and were only moved to Monsoon when ARCHER1 was shutting down so that the work could continue. They were then never moved to ARCHER2 (as I was changing institutions/projects).
Now we have a NERC grant that Alok is working on that includes a lot of ARCHER2 time for production runs (after some further developments to the scheme), since Monsoon is not really meant to be used for production runs.
Do you reckon it would be possible to get UM vn11.0 installed on ARCHER2? Is that something that CMS could potentially assist with?
Thanks for the explanation, Ewa. One of the reasons we installed from UM11.1 onwards on ARCHER2 is that a significant change went into the UM which means the setup/configuration of UM11.0 is different from versions 11.1+. It would therefore take more work for us to install, and unfortunately we just don’t have the resources to port and support all UM versions. Upgrading your suite from UM11.0 to UM11.1 shouldn’t be too much trouble. We and other users have successfully upgraded suites.
I would suggest trying to upgrade the suite one version to UM11.1 which we do support on ARCHER2. Whilst we encourage people to not get too far behind in UM versions, trying to upgrade to UM13.0 is just way too big a jump to make in one step and certainly won’t work with your UM11.0 branch without a lot of work. Moving it to UM11.1 hopefully won’t be very difficult.
Alok, I’d suggest upgrading the suite to UM11.1 following the instructions Grenville linked to above (i.e. running rose app-upgrade -a vn11.1 in the app/fcm_make_um and app/um directories). Then port the suite to ARCHER2 using the porting instructions.
Remove Luke’s branch: branches/dev/lukeabraham/vn11.0_ukca_linox_tweaks as that went into UM11.1 code release.
Replace Mohit’s branch with the vn11.1 equivalent: branches/dev/mohitdalvi/vn11.1_ukca_fix_o3_ste
You will then need to upgrade Ewa’s branch to vn11.1. To do this create a new branch:
fcm bc DEST_plus_Iodine fcm:um.x-tr@vn11.1
fcm co fcm:um.x-br/dev/<your-mosrs-usrname>/vn11.1_DEST_plus_Iodine
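The command-line part of the steps above can be sketched as a small script. It only prints the commands unless DRY_RUN=0 is set. SUITE_DIR, MOSRS_USER, WORK_DIR and the helper names are placeholders for illustration, and the suite id shown is hypothetical. Removing Luke’s branch and swapping Mohit’s branch are edits to the suite configuration and are not covered here.

```shell
# Placeholders: set these to your own suite, MOSRS username and checkout location.
SUITE_DIR="${SUITE_DIR:-$HOME/roses/u-xxNNN}"       # hypothetical suite id
MOSRS_USER="${MOSRS_USER:-your-mosrs-username}"
WORK_DIR="${WORK_DIR:-$PWD}"
DRY_RUN="${DRY_RUN:-1}"                             # 1 = print only

# usage: run_in DIR CMD...   (run CMD inside DIR, or print it in dry-run mode)
run_in() {
  dir="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "(cd $dir && $*)"
  else
    (cd "$dir" && "$@")
  fi
}

upgrade_steps() {
  # Upgrade both apps to vn11.1, as described in the reply above.
  run_in "$SUITE_DIR/app/fcm_make_um" rose app-upgrade -a vn11.1 &&
  run_in "$SUITE_DIR/app/um" rose app-upgrade -a vn11.1 &&
  # Create a vn11.1 branch and check it out.
  run_in "$WORK_DIR" fcm bc DEST_plus_Iodine fcm:um.x-tr@vn11.1 &&
  run_in "$WORK_DIR" fcm co "fcm:um.x-br/dev/$MOSRS_USER/vn11.1_DEST_plus_Iodine"
}

upgrade_steps
```

Printing first makes it easy to check the directories and branch name before anything is changed in the repository.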
Thanks for the responses. I am working with the Lancaster UKCA group and we decided to switch to ARCHER2 for all Iodine development work. I believe Ewa has explained the requirement for using ARCHER2. Many thanks, Ewa!
Thanks for explaining upgrading and porting so clearly. I initially thought that upgrading to the most recent version would be more helpful, but that is not the case. I am going to take a fresh copy of Ewa’s job, upgrade it to UM11.1 and then port the suite to ARCHER2.
Meanwhile, I tried to run a copy of u-bd366 and followed porting instructions. The new job is u-cr175 and it has failed in recon with the following error:
The following have been reloaded with a version change:
If you look in the old archer.rc file you’ll see it sets these variables. These are non-standard variables so not present in our provided template *.rc files so you need to add these yourself in the appropriate places in the archer2.rc file.
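One way to list candidate variables is to compare the names assigned in the two files. The sketch below is heuristic: it treats both NAME = value lines and Jinja2 {% set NAME = ... %} lines as assignments, since the thread does not say which form the non-standard variables take. The function names are made up for illustration.

```shell
# Names assigned in a file, either as "NAME = value" or "{% set NAME = ... %}".
assigned_names() {
  sed -nE 's/^[[:space:]]*(\{%-?[[:space:]]*set[[:space:]]+)?([A-Za-z_][A-Za-z_0-9]*)[[:space:]]*=.*/\2/p' "$1" | sort -u
}

# usage: missing_names OLD_RC NEW_RC
# Prints names assigned in OLD_RC that are not assigned anywhere in NEW_RC.
missing_names() {
  old_list="$(mktemp)"; new_list="$(mktemp)"
  assigned_names "$1" > "$old_list"
  assigned_names "$2" > "$new_list"
  comm -23 "$old_list" "$new_list"
  rm -f "$old_list" "$new_list"
}

# Typical use from the suite working copy (paths assumed from this thread):
#   missing_names site/archer.rc site/archer2.rc
```

Names reported this way still need to be added by hand in the matching sections of archer2.rc, with their paths updated for ARCHER2.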
Thanks for this. I have modified archer2.rc file (I have used archer.rc to modify archer2.rc file). This is still failing in recon with a different error message. The job.err file has the following error message:
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 1
? Error from routine: io:file_open
? Error message: Failed to open file /work/n02/n02/ukca/initial/N96eL85/au917a.da20080901_00
? Error from processor: 0
? Error number: 3
I have not found the directory/file ‘/work/n02/n02/ukca/initial/N96eL85/au917a.da20080901_00’ on Archer2.
Is this error associated with the unavailability of the file or something else?
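A simple way to confirm whether the file is present and readable, run on ARCHER2, might look like the sketch below. The helper name check_readable is made up for illustration, and the path is the one from the error message above.

```shell
# Report whether a file exists and is readable by the current user.
check_readable() {
  if [ -r "$1" ]; then
    echo "readable: $1"
  else
    echo "missing or unreadable: $1"
    return 1
  fi
}

check_readable /work/n02/n02/ukca/initial/N96eL85/au917a.da20080901_00 || true
```

If the file is reported as missing, the io:file_open failure is most likely just that: the suite still points at a start dump path that does not exist on ARCHER2.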