Postproc failure

Hi Ros,

Thanks, and I will make sure to keep on top of regularly cleaning up my PUMA space.

I started preliminarily analysing some of the output of the model, but I noticed that a couple of the files seem to be corrupted or not saved properly. Is this something that happens sometimes? For example, the file /work/n02/n02/tarlge/archive/u-cz110/19790501T0000Z/nemo_cz110o_1m_19790401-19790501_grid-T.nc. When I try to open the file in ncview on JASMIN it doesn’t recognise it as a NetCDF file, and the file size is small - implying something went wrong.
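The two symptoms described above (a file that is not recognised as NetCDF, and an unusually small size) can be screened for in bulk. The following is a minimal sketch, not part of the suite or of postproc: the function names and the `min_bytes` threshold are hypothetical, and the check relies only on the leading signature bytes that NetCDF classic, 64-bit offset, CDF-5 and NetCDF-4 (HDF5) files start with.

```python
from pathlib import Path

# Leading bytes of the on-disk formats a NetCDF file can use.
NETCDF_SIGNATURES = (
    b"CDF\x01",            # classic format
    b"CDF\x02",            # 64-bit offset format
    b"CDF\x05",            # CDF-5 (64-bit data) format
    b"\x89HDF\r\n\x1a\n",  # HDF5 container used by NetCDF-4
)


def looks_like_netcdf(path):
    """Return True if the file starts with a known NetCDF signature."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    return header.startswith(NETCDF_SIGNATURES)


def find_suspect_files(archive_dir, min_bytes=1024):
    """List .nc files under archive_dir that are tiny or lack a signature.

    min_bytes is an arbitrary illustrative threshold (an assumption),
    not a value taken from this thread.
    """
    suspects = []
    for path in sorted(Path(archive_dir).rglob("*.nc")):
        if path.stat().st_size < min_bytes or not looks_like_netcdf(path):
            suspects.append(path)
    return suspects
```

Calling `find_suspect_files` on an archive directory returns the paths worth inspecting by hand. A file that passes this check can still be truncated further in, so this only catches the most obvious failures.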

Is it possible for me to just re-insert the month and re-run this month in a bit-identical way?

Any help would be greatly appreciated!

Cheers,
Tarkan

This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses.

Hi Tarkan,

We are seeing some filesystem issues. This could be caused by those. I see there are a few other corrupt files in the 19800201 cycle too.

I will get back to you when I’ve investigated further.

Regards,
Ros.

Thanks Ros,

Is it okay for me to continue running the simulation forward, or would you recommend that I wait?

Cheers,
Tarkan

Hi Tarkan,

It depends what you want. If those dodgy .nc files are important then you will need to rerun the suite from 1979-04 when ARCHER2 is back. You can’t just rerun a single cycle. If I remember correctly though, your suite only started in 1978-09, so if it were me, I’d probably just restart it from the beginning again.

Keep an eye on the files - if you see this happening again let us know.

Regards,
Ros.

Hi Ros,

Thanks, I just restarted as you suggested, and will keep an eye on it.

Although my two running suites, u-da143 and u-da032, have delete_staged=true, the files on ARCHER2 don’t seem to be deleted automatically after they are transferred to JASMIN. Is this feature working for other suites? Perhaps I need to update my suites if changes were made to the GC5 suite after I took my copy?

Cheers,
Tarkan

Hi Tarkan,

Sorry, I’ve just realised looking at the code that deleting staged data has not been modified to work with gridftp transfers. I’ll add this to my to-do. It’s not a trivial fix so will be a little while until this can work. For now you will have to manually delete the staged data.
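Until the automatic deletion works, a manual clean-up could be made a little safer with a dry run. The sketch below is only an illustration and is not part of postproc or pptransfer: the function names are hypothetical, and `remote_sizes` is an assumed manifest (file name to size in bytes) that you would have to build yourself from a listing taken on JASMIN.

```python
from pathlib import Path


def staged_files_safe_to_delete(staged_dir, remote_sizes):
    """Return staged files whose name and size match the remote listing.

    remote_sizes maps file name -> size in bytes at the destination.
    It is a hypothetical manifest, not something the suite produces.
    """
    safe = []
    for path in sorted(Path(staged_dir).iterdir()):
        if path.is_file() and remote_sizes.get(path.name) == path.stat().st_size:
            safe.append(path)
    return safe


def delete_staged(staged_dir, remote_sizes, dry_run=True):
    """Delete matching staged files; with dry_run=True only report them."""
    safe = staged_files_safe_to_delete(staged_dir, remote_sizes)
    for path in safe:
        if not dry_run:
            path.unlink()
    return safe
```

Running with the default `dry_run=True` first shows what would be removed. A matching size does not prove the copy is intact, so comparing checksums would be a stronger test before deleting anything.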

Regards,
Ros.

Hi Ros,

No problem at all! Manual delete is fine for now, thanks for taking a look.

Cheers,
Tarkan

Hi Ros,

I just realised that some of my new runs seem to have corrupted files in them. I don’t know whether this was a filesystem issue or an error on my part. Is there something that I could be doing wrong to have this effect?

The file I am looking at is on JASMIN but no longer on ARCHER2:

/gws/nopw/j04/bas_pog/tarlge/archive/u-da032/19820801T0000Z/nemo_da032o_1m_19820601-19820701_grid-T.nc

At some point during the run, while I was away, my ARCHER2 disk filled up and the run crashed, so I had to retrigger it. Is it possible that the files got corrupted during this? The model did continue to run without issue, so I thought it was okay.

Cheers,
Tarkan

Hi Ros,

I have also been searching the documentation for information about restart files, but I am still not sure which files are actually required to restart a run. I want to make sure that I save some restart files for longer runs so that I can re-run parts if necessary. Do you know how I can control this?

Many thanks,
Tarkan

Hi Tarkan,

The frequency at which restarts are archived is controlled by postproc.

See
postproc → Atmos → Archiving
postproc → NEMO → Restart Files

I haven’t got familiar with SI3 yet, but I suspect the ice restart is also controlled by the NEMO options.

I’m just in the process of fixing up pptransfer to automatically delete from ARCHER2 after successful data transfer to JASMIN.

Regards,
Ros.

Hi Ros,

Thanks for this. I don’t seem to have this postproc → NEMO structure on PUMA2 anymore. Should it be different?

I guess the thing I’m still confused about is the difference between output files and restart files. For example, are these files restart files, or are restart files saved somewhere else?

da400a.pa1982apr.pp
da400a.pd1982apr.pp
da400a.pe1982apr.pp
da400a.pf1982apr.pp
da400a.pg1982apr.pp
da400a.pk1982apr.pp
nemo_da400o_1m_19820301-19820401_grid-T.nc
nemo_da400o_1m_19820301-19820401_grid-U.nc
nemo_da400o_1m_19820301-19820401_grid-V.nc
nemo_da400o_1m_19820301-19820401_grid-W.nc
nemo_da400o_1m_19820401-19820501_grid-T.nc
nemo_da400o_1m_19820401-19820501_grid-U.nc
nemo_da400o_1m_19820401-19820501_grid-V.nc
nemo_da400o_1m_19820401-19820501_grid-W.nc
nemo_da400o_1m_19820501-19820601_grid-T.nc
nemo_da400o_1m_19820501-19820601_grid-U.nc
nemo_da400o_1m_19820501-19820601_grid-V.nc
nemo_da400o_1m_19820501-19820601_grid-W.nc
si3_da400i_1m_19820301-19820401_icemod.nc
si3_da400i_1m_19820401-19820501_icemod.nc
si3_da400i_1m_19820501-19820601_icemod.nc

Many thanks,
Tarkan

Tarkan

This notation refers to tasks in the rose edit suite GUI:

postproc → Atmos → Archiving
postproc → NEMO → Restart Files

There are typically 4 start files needed for a coupled suite: one each for the atmosphere and ice, and two for the ocean. These files contain prognostic data. Output files contain diagnostic data and play no part in starting the model. All the files referred to above hold diagnostics.
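To make it easier to see which diagnostic files an archive directory holds, the file names quoted in this thread can be parsed mechanically. This is a rough sketch under stated assumptions: the two patterns are inferred only from the names listed above, the function name `describe` is hypothetical, and labelling the `.pp` files as atmosphere output is an inference from the `a` suffix on the run ID. Restart file naming is not covered here because the thread does not show any.

```python
import re

# Patterns inferred only from the file names quoted in this thread.
OCEAN_ICE = re.compile(
    r"^(?P<model>nemo|si3)_(?P<run>[a-z0-9]+)_(?P<freq>\d+[a-z])_"
    r"(?P<start>\d{8})-(?P<end>\d{8})_(?P<field>[A-Za-z0-9-]+)\.nc$"
)
ATMOS = re.compile(
    r"^(?P<run>[a-z0-9]+)\.p(?P<stream>[a-z0-9])"
    r"(?P<year>\d{4})(?P<month>[a-z]{3})\.pp$"
)


def describe(filename):
    """Split a diagnostic file name into its parts, if it is recognised."""
    match = OCEAN_ICE.match(filename)
    if match:
        return {"kind": "diagnostic", **match.groupdict()}
    match = ATMOS.match(filename)
    if match:
        # "atmos" is an assumed label for the .pp streams.
        return {"kind": "diagnostic", "model": "atmos", **match.groupdict()}
    return {"kind": "unrecognised"}
```

For example, `describe("nemo_da400o_1m_19820401-19820501_grid-U.nc")` yields the model, run ID, frequency, date range and field. A result of `"unrecognised"` means only that the name did not match these two patterns, not that the file is a restart file.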

Please consider attending the UM training (https://ncas.ac.uk/study-with-us/introduction-to-unified-model/) - I think it may prove helpful.

Grenville

Hi Grenville,

Thanks for your reply. I can’t seem to see this. Perhaps I need to change the value of ‘meta’ here because of the change to PUMA2?

I agree that the UM training would definitely be useful. Unfortunately, I just missed out on the last one as it was full, and I am away for fieldwork from 16th November to 1st January, so I will miss the next one too. I have read through the online self-study notes and try to refer to them as much as possible.

Many thanks,
Tarkan

Click on the little triangle next to postproc.

The previous screenshot already shows the triangle in its ‘open’ state; if I click it again, it collapses the other settings.

This is what I was confused about: on the old PUMA, I feel that there were more settings visible in the GUI under postproc. Perhaps I am doing something wrong here.

Thanks,
Tarkan

Correct the path to the metadata:

`/home/n02/n02/ros/meta/postproc_2.4…
