I have run 20 years (1850-1870) of a coupled UKESM job, u-dt663, on Archer2. I have added a few stash items and would like to continue running the model from 1870 onwards while avoiding the issues that can occur when restarting a job (i.e. I would like the model run to proceed as if it had never stopped). My understanding is that, as I have added new stash items, I can’t just do a rose-suite run --reload and trigger the next coupled task.
To check whether I could get bit comparability following a restart, I extracted the 5 initialisation files (1 atmosphere dump, 2 ocean, 1 iceberg, 1 sea ice) for 1st Jan 1869 from u-dt663, and ran 1 month of a copy of u-dt663, u-dt998, with BITCOMP_NRUN = true, l_nrun_as_crun = true, RECON = false and the astart set to ‘dt663a.da18690101_00’. This follows the approach in ‘Bit comparability for re-run cycles’.
However, u-dt998 does not bit compare with u-dt663 for Jan 1869 (u-dt663 is running 6-month resubmissions, whereas u-dt998 uses a 1-month resubmission). In a separate suite I tried setting ancil_reftime to the actual date, but this returned the same output as u-dt998.
I am therefore concerned that trying to restart u-dt663 from 1st Jan 1870 by pointing it to the 5 initialisation files could cause an issue.
Is there anything needed to avoid this issue when continuing a run?
Sorry, I meant ‘rose-suite run --restart’. When I run this and retrigger the coupled task it fails with
[0] ???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
[0] ? Error code: 5
[0] ? Error from routine: UM_READDUMP
[0] ? Error message: UM_READDUMP Dump does not match STASH list
[0] ? Error from processor: 47
[0] ? Error number: 47
There is nothing in the coupled/pe_output directory.
When doing the rose-suite run --restart, I think it looks in ~/cylc-run/u-dt663/share/data/History_Data, and I have put all the initialisation files for 1870 in there.
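As a quick sanity check that all 5 initialisation files for the restart date are actually in that directory, something like the following rough Python sketch could be used. It only matches on date stamps in the file names and assumes (this is not confirmed anywhere in this thread) that each component writes the date either as YYYYMMDD, as in dt663a.da18690101_00, or as YYYY-MM-DD. The function name find_restart_candidates is made up for illustration.

```python
from pathlib import Path


def find_restart_candidates(directory, year, month, day):
    """Return sorted names of files in `directory` whose name carries the date.

    Assumption: the date appears either as YYYYMMDD or as YYYY-MM-DD in
    the file name. Adjust the stamps if a component uses another format.
    """
    stamps = (
        f"{year:04d}{month:02d}{day:02d}",
        f"{year:04d}-{month:02d}-{day:02d}",
    )
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and any(stamp in p.name for stamp in stamps)
    )


if __name__ == "__main__":
    # Directory taken from the post above; nothing happens if it is absent.
    hist = Path.home() / "cylc-run" / "u-dt663" / "share" / "data" / "History_Data"
    if hist.is_dir():
        found = find_restart_candidates(hist, 1870, 1, 1)
        print(f"{len(found)} candidate file(s) for 1870-01-01:")
        for name in found:
            print("  ", name)
```

If fewer than 5 names come back (1 atmosphere dump, 2 ocean, 1 iceberg, 1 sea ice), either a file is missing or one of the components uses a date format the sketch does not know about.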
The jobs must run with the same dump frequency. I can’t see which files were being compared: u-dt663 has Dec 1896 data, but u-dt998 has Jan 1869 data. I’d suggest comparing atmosphere start files (I don’t see one for u-dt998).
u-dt663 had a 6 month dump frequency while u-dt998 used a 1 month dump frequency so I took a new copy of u-dt998, u-du093, and set a 6 month dump frequency and ran it for 6 months.
However, u-du093 doesn’t bit compare with u-dt663 for Jan 1869 (I did a mule-cumf between dt663a.p51869jan.pp / du093a.p51869jan.pp and between dt663a.pm1869jan.pp / du093a.pm1869jan.pp in /work/n02/n02/jweber/archive).
File 1: dt663a.pm1869jan.pp
File 2: du093a.pm1869jan.pp
Files DO NOT compare
0 differences in fixed_length_header (with 7 ignored indices)
6865 field differences, of which 6259 are in data
Compared 7270/7270 fields, with 405 matches
Maximum RMS diff as % of data in file 1: 5004128.90625 (field 4132)
Maximum RMS diff as % of data in file 2: 13485.289001464844 (field 5108)
///
File 1: dt663a.p51869jan.pp
File 2: du093a.p51869jan.pp
Files DO NOT compare
0 differences in fixed_length_header (with 7 ignored indices)
5219 field differences, of which 4602 are in data
Compared 5227/5227 fields, with 8 matches
Maximum RMS diff as % of data in file 1: 440508.49609375 (field 779)
Maximum RMS diff as % of data in file 2: 798.71129989624023 (field 2233)
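For anyone comparing several streams at once, the summaries above can be reduced to a few numbers with a short Python sketch like the one below. It only parses text in the format pasted above; it does not call mule itself, and the function name parse_cumf_summary and the dictionary keys are made up for illustration.

```python
import re

# One of the summaries pasted above, used here as sample input.
SAMPLE = """File 1: dt663a.pm1869jan.pp
File 2: du093a.pm1869jan.pp
Files DO NOT compare
0 differences in fixed_length_header (with 7 ignored indices)
6865 field differences, of which 6259 are in data
Compared 7270/7270 fields, with 405 matches
"""


def parse_cumf_summary(text):
    """Pull file names and field counts out of a pasted comparison summary."""
    summary = {"differs": "DO NOT compare" in text}

    # "File 1:" / "File 2:" are matched case-sensitively, so the
    # "Maximum RMS diff ... in file 1:" lines are not picked up here.
    for key, pattern in (("file1", r"File 1:\s*(\S+)"),
                         ("file2", r"File 2:\s*(\S+)")):
        match = re.search(pattern, text)
        if match:
            summary[key] = match.group(1)

    match = re.search(r"(\d+) field differences, of which (\d+) are in data", text)
    if match:
        summary["field_differences"] = int(match.group(1))
        summary["data_differences"] = int(match.group(2))

    match = re.search(r"Compared (\d+)/(\d+) fields, with (\d+) matches", text)
    if match:
        summary["compared"] = int(match.group(1))
        summary["total"] = int(match.group(2))
        summary["matches"] = int(match.group(3))

    return summary


if __name__ == "__main__":
    for key, value in parse_cumf_summary(SAMPLE).items():
        print(f"{key}: {value}")
```

In both comparisons the field differences and the matches add up to the number of fields compared (6865 + 405 = 7270 and 5219 + 8 = 5227), and most of the differing fields differ in their data rather than only in their headers.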
I can’t see an astart in ~/cylc-run/u-du093/share/data but I think that is because I had recon turned off and had set the astart in u-du093/app/um/rose-app.conf to /work/n02/n02/jweber/dump_files/u-dt663/dt663a.da18690101_00.