Hello CMS,
I was running a UKESM suite (u-dh564) on ARCHER2. The run stopped at the “coupled” task in the first cycle, but I couldn’t find any error information in job.out. I’ve tried restarting the suite multiple times, with the same result each time. Do you have any ideas for things I could check?
Cheers,
Jing
Hi Jing,
The coupled task has core-dumped. I would suggest doing a completely clean run with rose suite-run --new, which deletes the cylc-run directories and starts from scratch. I can see this suite has been run before and got further, so it’s possible it has got itself into a mess.
Each task generates a job.out AND a job.err file. In addition, there are ocean, ice and atmos pe output files in the /work directory that can provide further information.
Regards,
Ros.
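One way to follow up on this advice is to scan every log file under the task's directories for typical failure markers, rather than opening each file by hand. The following is a minimal sketch, not an official CMS tool: the function name, the marker patterns and the default file names are assumptions for illustration, and the list of file names should be extended to match whatever ocean, ice and atmos output files your suite actually writes.

```python
import re
from pathlib import Path

# Hypothetical failure markers; adjust to what your models actually print.
PATTERNS = re.compile(
    r"error|abort|segmentation fault|core dumped|E R R O R",
    re.IGNORECASE,
)


def find_error_lines(log_dir, names=("job.out", "job.err")):
    """Return (file, line number, text) for each line matching PATTERNS.

    log_dir is searched recursively for files whose name matches one of
    the glob patterns in names.
    """
    hits = []
    for name in names:
        for path in sorted(Path(log_dir).rglob(name)):
            if not path.is_file():
                continue
            # errors="replace" keeps the scan going on odd bytes in logs
            with path.open(errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if PATTERNS.search(line):
                        hits.append((str(path), lineno, line.rstrip()))
    return hits
```

Calling find_error_lines on the task's log directory and then on the run directory under /work gives one combined list to read through. An empty list only means none of the assumed markers appeared, not that the run was healthy.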
Hi Ros,
I ran the suite with rose suite-run --new, but it still stopped with the same error. I’ve checked the log files of each model component, and none of them reports an error.
Cheers,
Jing
Hi Jing,
What have you changed since the time it ran ok?
Regards,
Ros
Hi Ros,
I wanted to restart it in 1850.
I changed the restart files. I have restart files dated 2277 taken from a pi-control run, so I linked them into my working directory and renamed them to 1850. Then I changed the model basis time to 1850,1,1,0,0, and set i_override_date_time=2, new_date_time=1850,1,1,0,0,0 in the UM namelist. The coupled task stopped partway through reading the NEMO namelist, before the first timestep. I have no idea what happened.
Cheers,
Jing
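The link-and-rename step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name and the idea that the year appears as a plain substring of each file name are hypothetical, and real restart file names may follow a different pattern. Note that renaming changes only the file name, not any date stored inside the file.

```python
from pathlib import Path


def link_renamed_restarts(src_dir, dst_dir, old_stamp="2277", new_stamp="1850"):
    """Symlink files whose name contains old_stamp into dst_dir,
    replacing the first occurrence of old_stamp with new_stamp.

    Returns the list of links that were created.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    created = []
    for src in sorted(Path(src_dir).iterdir()):
        if not src.is_file() or old_stamp not in src.name:
            continue
        target = dst / src.name.replace(old_stamp, new_stamp, 1)
        if target.exists() or target.is_symlink():
            continue  # never overwrite an existing file or link
        target.symlink_to(src.resolve())
        created.append(target)
    return created
```

Using links rather than copies leaves the original pi-control files untouched, so the change is easy to undo by deleting the links.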
Hi Ros,
I’ve tried changing the restart files back to the ones it ran okay with. I’ve also tried keeping the 2277 names of the restart files and changing the model basis time to 2277,1,1,0,0. Neither works: the run gets no further and stops with no errors.
Cheers,
Jing
Hi Jing,
If the suite no longer runs with the original start files then there must be something else you have changed as well. I’ve run the original job (u-de744), from 2006 start files, and it works fine for me.
I suggest you start from a fresh known working version and try again.
Regards,
Ros.
Thanks, Ros. You’re right that something else must have changed. The NEMO jphgr_mesh parameter had been changed accidentally. After changing it back, the suite is running okay.
Cheers,
Jing
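An accidental change like this one can be found by comparing the settings of the modified suite against a known working copy. The sketch below is a simplified illustration, not a replacement for proper version-control diffs: the function names are hypothetical, and it assumes settings are stored as "[section]" headers followed by single-line "key=value" entries, skipping comments and multi-line continuation lines.

```python
def parse_settings(text):
    """Parse '[section]' headers and 'key=value' lines into a dict
    keyed by (section, key). Simplified: continuation lines are skipped."""
    settings = {}
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif "=" in line and not line.startswith("="):
            key, value = line.split("=", 1)
            settings[(section, key.strip())] = value.strip()
    return settings


def diff_settings(old_text, new_text):
    """Return (section, key, old value, new value) for each setting that
    differs; a value of None means the setting is absent on that side."""
    old = parse_settings(old_text)
    new = parse_settings(new_text)
    changes = []
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changes.append((key[0], key[1], old.get(key), new.get(key)))
    return changes
```

Reading the configuration file of the working suite and of the modified suite into strings and passing both to diff_settings lists every setting that differs, which makes an unintended edit stand out among the intended ones.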