I am running the suite u-ds446 on Monsoon3. The model runs successfully for the first 20 or so days of a month, but after about 25 days it stops producing data. The Atmos_main job fails without any visible error message.
Could you please help me investigate why this is happening?
It's a little difficult to know where to look when there are 45 runs. I think the problem is stated in run41 in the job.err file. Have you been examining the error messages?
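With that many runs, a small script can narrow down which job.err files are worth opening. The following is only a sketch: the directory layout in the glob pattern (`run*/log/job/<cycle>/atmos_main/<submit>/job.err`) and the list of keywords are assumptions, so adjust both to match what is actually in your cylc-run directory.

```python
import re
from pathlib import Path

# Assumed keywords; extend with whatever your scheduler or the UM actually prints.
ERROR_PATTERN = re.compile(
    r"error|abort|killed|time ?limit|walltime|segmentation", re.IGNORECASE
)

# Assumed layout of the job logs below the suite directory.
DEFAULT_GLOB = "run*/log/job/*/atmos_main/*/job.err"


def scan_job_errs(suite_dir, pattern=DEFAULT_GLOB):
    """Return {relative path of job.err: [matching lines]} for files with hits."""
    suite_dir = Path(suite_dir)
    hits = {}
    for err_file in sorted(suite_dir.glob(pattern)):
        lines = err_file.read_text(errors="replace").splitlines()
        matches = [line for line in lines if ERROR_PATTERN.search(line)]
        if matches:
            hits[str(err_file.relative_to(suite_dir))] = matches
    return hits


if __name__ == "__main__":
    # Hypothetical location of the suite; change to your own path.
    suite = Path.home() / "cylc-run" / "u-ds446"
    for path, lines in scan_job_errs(suite).items():
        print(path)
        for line in lines[-5:]:  # show only the last few matches per file
            print("   ", line)
```

Files with no matching lines are left out of the result, so an empty output means none of the keywords appeared, not that the jobs succeeded.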
Turn off debug-level logging, as it has an impact on performance: app/um/rose-app.conf → [env]PrintStatus=PrStatus_Min
The processor setup seems to have been copied from (my?) GAL9 configuration and may not be optimal for UKESM: rose-suite.conf → MAIN_ATM_PROCX=32, MAIN_ATM_PROCY=18, MAIN_OMPTHR_ATM=2. (Further optimisations are being tested for UKESM1.1 with rank reordering.)
The maximum wall-clock time available is 3 hours, so raising the limit to that might help the job complete the last 5 days.
If the runs still hang at 25 days, this might be related to data being read from an ancillary file or to a specific process occurring around that day.
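To tell a wall-clock limit apart from a genuine hang, it can help to check how far the model got and what a full cycle would cost at the same rate. The sketch below assumes the job output contains lines of the form `Atm_Step: Timestep N Model time: YYYY-MM-DD HH:MM:SS`; the elapsed time and cycle length passed in are hypothetical inputs, not values taken from the suite.

```python
import re
from datetime import datetime

# Matches the UM timestep banner, tolerating variable spacing.
STEP_RE = re.compile(
    r"Atm_Step:\s+Timestep\s+(\d+)\s+Model time:\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


def last_timestep(log_text):
    """Return (timestep, model_time) for the last banner found, or None."""
    last = None
    for match in STEP_RE.finditer(log_text):
        model_time = datetime.strptime(match.group(2), "%Y-%m-%d %H:%M:%S")
        last = (int(match.group(1)), model_time)
    return last


def projected_wallclock_hours(elapsed_hours, days_done, cycle_days):
    """Scale the elapsed wall-clock time linearly to the full cycle length."""
    return elapsed_hours * cycle_days / days_done


if __name__ == "__main__":
    sample = "Atm_Step: Timestep 1800 Model time: 2019-01-26 00:00:00"
    step, model_time = last_timestep(sample)

    # Assumption: the cycle started at 2019-01-01 00:00.
    start = datetime(2019, 1, 1)
    days_done = (model_time - start).total_seconds() / 86400
    print("days completed:", days_done)
    print("minutes per timestep:", days_done * 1440 / step)

    # Hypothetical figures: 2.5 h used so far, 30-day cycle.
    print("projected hours:", projected_wallclock_hours(2.5, days_done, 30))
```

If the projected time exceeds the limit the job was submitted with, the stop at 25 days is more likely a time-out than a problem with that particular model day. The linear scaling ignores start-up and dump-writing costs, so treat the result as a rough guide.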
It did not work. Atmos_main is still failing after 25 days.
Tanu
Atm_Step: Timestep 1800 Model time: 2019-01-26 00:00:00
Attempt to open file: /common/share/monsoon_ancils/atmos/GC5/n96e/easyaerosol/cmip6_stratos/climatology_1850-2014/v1//volc_aer_extinction_sw.nc returned status= 0 nfid= 65536
Attempt to open file: /common/share/monsoon_ancils/atmos/GC5/n96e/easyaerosol/cmip6_stratos/climatology_1850-2014/v1//volc_aer_absorption_sw.nc returned status= 0 nfid= 65536
Attempt to open file: /common/share/monsoon_ancils/atmos/GC5/n96e/easyaerosol/cmip6_stratos/climatology_1850-2014/v1//volc_aer_asymmetry_sw.nc returned status= 0 nfid= 65536
Attempt to open file: /common/share/monsoon_ancils/atmos/GC5/n96e/easyaerosol/cmip6_stratos/climatology_1850-2014/v1//volc_aer_extinction_lw.nc returned status= 0 nfid= 65536
Attempt to open file: /common/share/monsoon_ancils/atmos/GC5/n96e/easyaerosol/cmip6_stratos/climatology_1850-2014/v1//volc_aer_absorption_lw.nc returned status= 0 nfid= 65536
Attempt to open file: /common/share/monsoon_ancils/atmos/GC5/n96e/easyaerosol/cmip6_stratos/climatology_1850-2014/v1//volc_aer_asymmetry_lw.nc returned status= 0 nfid= 65536
update_pattern: updating coeffc and coeffs
Tot dry mass 0.51291E+19
Tot mass 0.51413E+19
Tot energy 0.13074E+25
tot dry energy 0.13075E+25
gr( rho cal) 0.38159E+24
KE( rho cal) 0.92906E+21
KEu(rho cal) 0.71085E+21
KEv(rho cal) 0.21821E+21
KEw(rho cal) 0.11890E+16
cvT( rho cal) 0.92500E+24
lq ( rho cal) 0.30327E+23
lqcf( rho cal) 0.84982E+20
lqcl( rho cal) 0.64562E+20
Final dry mass of atmosphere = 0.51291E+19 KG
Initial dry mass of atmosphere= 0.51291E+19 KG
Correction factor for rho_dry = 0.10000E+01
Final moisture = 0.12183E+17 KG
Initial moisture = 0.12263E+17 KG
change in moisture = -0.80694E+14 KG
Moisture added E-P in period = -0.80508E+14 KG
Error in moisture = -0.18589E+12 KG
Error as % of change = 0.23037E+00
q ( rho cal) 0.12126E+17
qcf( rho cal) 0.33979E+14
qcl( rho cal) 0.22773E+14
FINAL TOTAL ENERGY = 0.13074E+25 J/
INITIAL TOTAL ENERGY = 0.13072E+25 J/
CHG IN TOTAL ENERGY O. P. = 0.14228E+21 J/
FLUXES INTO ATM OVER PERIOD = -0.38066E+22 J/
ERROR IN ENERGY BUDGET = -0.39489E+22 J/
Attempt to open file: /projects/ukca-admin/analyses/era5/era5_1deg-model-levs_N48L137_2019012600_all.nc returned status= 0 nfid= 65536
Attempt to open file: /projects/ukca-admin/analyses/era5/era5_1deg-model-levs_N48L137_2019012606_all.nc returned status= 0 nfid= 65536