I am trying to run my piControl HadGEM3-GC3.1-LL suite u-dm681 on puma2. My suite is a copy of the standard job u-as037. I have changed the suite to run for 1 month (down from 500 years) and turned off a lot of the STASH requests. Any other differences between the suites are simply from my attempts to get the suite running (project names, user names, etc.).
When I run the job, all of the fcm_make_* tasks and recon succeed, but when it launches the coupled step the job fails. The job logs did not reveal much to help me identify the problem. The end of job.err has the following:
???
??? WARNING ???
? Warning code: -100
? Warning from routine: CHECK_RUN_DIFFUSION
? Warning message:
? cldbase_opt_sh set to sh_wstar_closure since
? Smagorinsky diffusion not chosen.
? Warning from processor: 0
? Warning number: 9
???
Above this, there was a stream of errors of the form:
MPICH ERROR [Rank 200] [job id 8589187.0] [Thu Jan 30 13:34:21 2025] [nid003203] - Abort(1) (rank 200 in comm 480): application called MPI_Abort(comm=0x84000003, 1) - process 200
I have not yet successfully run the job. It appears to me that it can't submit the job, but I am not sure why. Any ideas on what might be causing this would be much appreciated.
u-as037 has not been run for some time and points at a pre-puma2 NEMO working-copy source. It looks like some NEMO settings you changed caused the problem. We will commit the nemo_sources branch (you can use /home/n02/n02/ros/nemo/branches/dev_r5518_GO6_package for now); please also revert nemo_path_excl to NEMOGCM/NEMO/OPA_SRC/TRD/trdtrc.F90.
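In the meantime the relevant source settings should end up looking something like this (variable names quoted from memory, so check them against your suite):

    nemo_sources=/home/n02/n02/ros/nemo/branches/dev_r5518_GO6_package
    nemo_path_excl=NEMOGCM/NEMO/OPA_SRC/TRD/trdtrc.F90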
My copy of u-as037 with /home/n02/n02/ros/nemo/branches/dev_r5518_GO6_package is running okay.
Excellent. That was the problem and it is running now.
I have decided to attempt to convert the piControl run to an SSP245 run. By comparing suites on Trac I can see what changes are needed for the Met Office supercomputer. Is there a suite ported over to ARCHER2 which has ozone redistribution that you could suggest I look at? I was told this might be tricky, so it would be good to see how it is implemented. I would also be interested to hear how the porting went. For example, was it hardware dependent? Or did the fix work much as it does on Monsoon?
Orography files are now in /work/y07/shared/umshared/hadgem3/ancil/atmos/. There were a couple of N96 directories; I’ve guessed at n96e_orca025_go6. If that is not the correct directory please let me know.
After a bit of a hunt I’ve managed to find the start dump directory on the old Met Office XCE/F. The files are on ARCHER2 under /work/y07/shared/umshared/ukcmip6/ssp585_N96O1_ensemble1_dumps
I am still working on getting the piControl suite working. Lots of my questions have already been asked and answered in other channels, which has been very handy.
The suite is running up until pptransfer. I have followed the steps for setting up the Globus transfer and that went okay. There is limited information in the job.err on why the transfer is failing. It fails not long after starting the run. I have run it twice to make sure it was repeatable. Here is the gist of the error:
If you want to automatically transfer data over to JASMIN, you will need to go through the Globus setup instructions here: Configuring PPTransfer using Globus
The suite will also need to be modified, as it hasn't been updated for use with Globus.
In panel fcm_make_pp → Config → pp_sources
Change the revision number of the postproc_2.3_pptransfer_gridftp_nopw branch from 4557 to 5411.
In file app/postproc/rose-app.conf add the following variables to the [namelist:pptransfer] section and set gridftp=false:
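As a rough sketch of what that section ends up looking like (only the gridftp switch is shown here; take the exact Globus variable names and values from the setup instructions linked above):

    [namelist:pptransfer]
    gridftp=false
    # ...plus the Globus-specific variables from the setup page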
Ok. I'll take a look at your suite and get back to you. The separate checksumming step is not used for Globus, so there is some switch not set properly.
But one quick question. When you went from postproc_2.3 to postproc_2.4 how did you do the upgrade? Did you run the rose app-upgrade command or just change the branch names?
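(For reference, the upgrade route would be something along the lines of running rose app-upgrade from inside the app directory, e.g.

    cd app/postproc
    rose app-upgrade postproc_2.4

though the exact version string to give it depends on the upgrade macros in the postproc metadata, so treat that as a sketch rather than the exact command.)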
Try turning verify_checksums off in postproc. That can only be used for rsync transfers.
In fcm_make_pp set the meta to archive_and_meaning/fcm_make/postproc_2.4 to pick up the fcm_make_pp metadata rather than the postproc app metadata.
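Putting those two changes together, it would look something like this (file paths are my assumption of the usual suite layout, so double-check them):

    # app/fcm_make_pp/rose-app.conf
    meta=archive_and_meaning/fcm_make/postproc_2.4

    # app/postproc/rose-app.conf -- turn off the checksumming step, under
    # whichever section verify_checksums already sits in
    verify_checksums=false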
Hi Ros, the orography files in n96e_orca025_go6 → n96 are what I was looking for. I assume orca025 means the 0.25° ocean grid. Could you tell me what go6 means, please? I am looking to recreate the HadGEM3-GC3.1-N96ORCA1 setup.