UKESM run stopped without errors

Hello CMS,

I was running a UKESM suite (u-dh564) on ARCHER2. The run stopped at the “coupled” task in the first cycle, but I couldn’t find any error information in job.out. I’ve tried restarting the suite multiple times, but it ended up the same each time. Do you have any ideas for things I could check?

Cheers,
Jing

Hi Jing,

The coupled task has core-dumped. I would suggest doing a clean run with rose suite-run --new, which deletes the cylc-run directories so that the suite starts from scratch. I can see this suite has been run and got further before, so it’s possible it has got itself into a mess.

Each task generates a job.out AND a job.err file. In addition, there are ocean and ice output files and the atmos pe files in the /work directory, which can provide further information.
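As a rough sketch of that check, the Python below walks a directory tree and lists any lines in job.out and job.err that contain a failure keyword. The keyword list and the scan_logs helper are assumptions for illustration, not part of Rose or Cylc; other file name patterns (for example for the pe files) can be passed in if you know what they are called on your system.

```python
from pathlib import Path

# Keywords are an assumption; adjust them to whatever your models print.
KEYWORDS = ("error", "abort", "segmentation fault", "core dumped", "sigsegv")


def scan_logs(directory, patterns=("job.out", "job.err"), keywords=KEYWORDS):
    """Return (path, line number, text) for every log line containing a keyword."""
    hits = []
    for pattern in patterns:
        # rglob searches the directory and all of its subdirectories.
        for path in sorted(Path(directory).rglob(pattern)):
            if not path.is_file():
                continue
            # errors="replace" avoids failing on stray non-UTF-8 bytes.
            with path.open(errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    lowered = line.lower()
                    if any(word in lowered for word in keywords):
                        hits.append((str(path), number, line.rstrip()))
    return hits
```

Calling scan_logs on the suite's log directory returns a list that can be printed one hit per line. An empty list only means none of the chosen keywords appeared, so it is still worth reading the tail of job.err by hand.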

Regards,
Ros.

Hi Ros,

I ran the suite with rose suite-run --new, but it still stopped in the same way. I’ve checked the log files of each model component, and none of them reports an error.

Cheers,
Jing

Hi Jing,

What have you changed since it last ran OK?

Regards,
Ros

Hi Ros,

I wanted to get it to restart in 1850.

I changed the restart files. I have restart files labelled 2277 taken from a pi-control run, so I linked these files into my working directory and renamed them to 1850. Then I changed the model basis time to 1850,1,1,0,0, and set i_override_date_time=2 and new_date_time=1850,1,1,0,0,0 in the UM namelist. The coupled task stopped in the middle of reading the NEMO namelist, even before the first timestep. I have no idea what happened.
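For reference, the linking and renaming step can be sketched in Python as below. This is only an illustration: the link_restarts helper is hypothetical, and it assumes the year appears as the plain string 2277 in each file name, which may not match the real naming of the atmosphere, ocean and ice restart files. It only creates links under new names and does not touch any dates stored inside the files.

```python
import os
from pathlib import Path


def link_restarts(source_dir, target_dir, old="2277", new="1850", dry_run=True):
    """Symlink files whose names contain `old` into target_dir, renamed with `new`.

    Only the first occurrence of `old` in each file name is replaced.
    Returns the list of (source, destination) pairs that were planned.
    """
    target = Path(target_dir)
    planned = []
    for src in sorted(Path(source_dir).iterdir()):
        if not src.is_file() or old not in src.name:
            continue
        dest = target / src.name.replace(old, new, 1)
        planned.append((src, dest))
        if not dry_run:
            target.mkdir(parents=True, exist_ok=True)
            if dest.is_symlink():
                dest.unlink()  # replace a stale link, never a real file
            os.symlink(src.resolve(), dest)
    return planned
```

With dry_run=True (the default) nothing is written, so the planned pairs can be inspected before any links are made.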

Cheers,
Jing

Hi Ros,

I’ve tried changing the restart files back to the ones from when it ran okay. I’ve also tried keeping the 2277 names of the restart files and changing the model basis time to 2277,1,1,0,0. However, neither works: the run gets no further and stops with no errors.

Cheers,
Jing

Hi Jing,

If the suite no longer runs with the original start files then there must be something else you have changed as well. I’ve run the original job (u-de744), from 2006 start files, and it works fine for me.

I suggest you start from a fresh known working version and try again.

Regards,
Ros.

Thanks, Ros. You’re right that something else must have changed. The NEMO jphgr_mesh parameter had been changed accidentally. After changing it back, the suite is running okay.
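For anyone hitting the same problem, one way to spot an accidental change like this is to compare the settings of the broken suite against a known working copy. The sketch below is a simplified illustration, not a full Fortran namelist parser: read_settings and changed_settings are hypothetical helpers, they only understand one name = value pair per line, and settings with the same name in different namelist groups would overwrite each other.

```python
def read_settings(text):
    """Collect simple `name = value` pairs from namelist-style text."""
    settings = {}
    for raw in text.splitlines():
        # Drop Fortran-style comments (this also truncates any value
        # that itself contains "!").
        line = raw.split("!", 1)[0].strip()
        # Skip group headers (&name), group terminators (/) and blank lines.
        if "=" not in line or line.startswith(("&", "/")):
            continue
        name, value = line.split("=", 1)
        settings[name.strip().lower()] = value.strip().rstrip(",")
    return settings


def changed_settings(reference_text, modified_text):
    """Return {name: (reference value, modified value)} for differing settings."""
    ref = read_settings(reference_text)
    mod = read_settings(modified_text)
    return {
        name: (ref.get(name), mod.get(name))
        for name in sorted(set(ref) | set(mod))
        if ref.get(name) != mod.get(name)
    }


# Made-up example: the group name and values are placeholders, and the
# parameter name is copied from the thread above.
working = "&nam_example\n   jphgr_mesh = 0   ! reference value\n/\n"
broken = "&nam_example\n   jphgr_mesh = 1\n/\n"
differences = changed_settings(working, broken)
```

A setting that is missing from one side shows up with None in that position, so additions and deletions are reported as well as edits.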

Cheers,
Jing