UKESM run stopped without errors

jingjin · 22 October 2024 10:52

Hello CMS,

I was running a UKESM suite (u-dh564) on ARCHER2. The run stopped at the “coupled” task in the first cycle, but I didn’t find any error information in job.out. I’ve tried restarting the suite multiple times and it ended the same way each time. Do you have any ideas for things I could check?

Cheers,
Jing

RosalynHatcher · 22 October 2024 12:53

Hi Jing,

The coupled task has core-dumped. I would suggest doing a clean run with rose suite-run --new, which deletes the existing cylc-run directories so the suite starts from scratch. I can see this suite has been run before and got further, so it’s possible it has got itself into a mess.

Each task generates a job.out AND a job.err file. In addition, the ocean, ice and atmosphere pe output files in the /work directory can provide further information.
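A quick way to act on this is to scan the job.err, job.out and pe output files for common failure keywords. This is a minimal Python sketch under stated assumptions: the directory you point it at, the file-name patterns (in particular "*pe*" for the pe output files) and the keyword list are all hypothetical and will need adjusting to the actual cylc-run and /work layout.

```python
import fnmatch
from pathlib import Path

# Hypothetical keyword list: adjust to the messages the model components actually emit.
KEYWORDS = ("error", "abort", "core dump", "segmentation fault", "killed")

# Hypothetical file-name patterns; "*pe*" is a guess at how pe output files are named.
NAME_PATTERNS = ("job.err", "job.out", "*pe*")


def find_error_lines(root, name_patterns=NAME_PATTERNS, keywords=KEYWORDS):
    """Return (path, line_number, text) for every line that mentions a keyword.

    Only files under root whose name matches one of name_patterns are scanned;
    matching is case-insensitive.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if not any(fnmatch.fnmatch(path.name, pattern) for pattern in name_patterns):
            continue
        # errors="replace" keeps the scan going if a file contains undecodable bytes.
        with path.open(errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                lowered = line.lower()
                if any(key in lowered for key in keywords):
                    hits.append((path, lineno, line.rstrip()))
    return hits
```

For example, find_error_lines("/path/to/run/dir") (a placeholder path) lists each matching line with its file and line number. A core dump can leave no message at all, so an empty result does not prove the run is healthy.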

Regards,
Ros.

jingjin · 22 October 2024 15:53

Hi Ros,

I ran the suite with rose suite-run --new, but it still stopped with the same error. I’ve checked the log files of each model component, and none of them reports an error.

Cheers,
Jing

RosalynHatcher · 22 October 2024 18:48

Hi Jing,

What have you changed since the time it ran ok?

Regards,
Ros

jingjin · 23 October 2024 09:53

Hi Ros,

I wanted to restart it in 1850.

I changed the restart files. I have restart files labelled 2277 taken from a pi-control run, so I linked them into my working directory and renamed them as 1850. Then I changed the model basis time to 1850,1,1,0,0 and set i_override_date_time=2, new_date_time=1850,1,1,0,0,0 in the UM namelist. The coupled task stopped partway through reading the NEMO namelist, before the first timestep. I have no idea what happened.

Cheers,
Jing

jingjin · 28 October 2024 12:49

Hi Ros,

I’ve tried changing the restart files back to the ones used when it ran OK. I’ve also tried keeping the 2277 restart file names and changing the model basis time to 2277,1,1,0,0. Neither works: both runs get no further and stop with no errors.

Cheers,
Jing

RosalynHatcher · 29 October 2024 08:31

Hi Jing,

If the suite no longer runs with the original start files then there must be something else you have changed as well. I’ve run the original job (u-de744), from 2006 start files, and it works fine for me.

I suggest you start from a fresh known working version and try again.

Regards,
Ros.

jingjin · 4 November 2024 10:34

Thanks, Ros. You’re right that something must have changed. The NEMO jphgr_mesh parameter had been changed by accident. After changing it back, the suite is running OK.
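Starting again from a known working version also suggests a way to find an accidental change like this one: compare the settings of the failing suite against the working copy. This is a minimal Python sketch, assuming both configurations are available as text made of "[section]" headers and simple key=value lines. That is a simplification of the real Rose and Fortran namelist syntax, and the section name used in the example is hypothetical; it says nothing about where jphgr_mesh actually lives.

```python
def parse_key_values(text):
    """Map (section, key) -> value for simple single-line 'key=value' entries.

    Simplified, hypothetical parser: it understands '[section]' headers,
    skips blank lines and '#' comments, and ignores any line without a key
    (e.g. continuation lines). It is not a full Rose or namelist parser.
    """
    values = {}
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key:
            values[(section, key)] = value.strip()
    return values


def changed_settings(known_good_text, suspect_text):
    """Return {(section, key): (good_value, suspect_value)} for every difference.

    A value of None means the key is missing from that configuration.
    """
    good = parse_key_values(known_good_text)
    suspect = parse_key_values(suspect_text)
    changes = {}
    for name in sorted(good.keys() | suspect.keys()):
        before, after = good.get(name), suspect.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes
```

Read the two files with Path(...).read_text() and pass the strings in; any key whose value differs, such as a changed jphgr_mesh, appears in the result with its old and new values. For a plain line-by-line view of the same files, Python's difflib.unified_diff is an alternative.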

Cheers,
Jing
