I have a newly copied suite (u-dh833), taken from Monsoon, to run on ARCHER2. I want to transfer the outputs to JASMIN, but I am not sure how to do it properly. Below are a few things that I think I need to change; please check:
I have changed “archive_root_path” under “Archer Archiving” to “/$ROSE_DATAC”.
I have no idea what to change “transfer_dir” to under JASMIN Transfer; the original line is “/gws/nopw/j04/ukca_vol1/nlabraham/archive”.
Any other changes needed? Thanks!
1. First you need to change the version of the postproc branch. The one you are using is out of date and won’t work. It looks like this suite is using optional configuration overrides, so you’ll need to change this in the app/fcm_make_pp/opt/rose-app-archer2.conf file: change the version of the postproc_2.3_pptransfer_gridftp_nopw branch from 4422 to 5095. You will need to re-run the suite to pick up the changes and rebuild the postproc code.
2. Set archive_root_path to $ROSE_DATAC - no need for the leading /
3. transfer_dir is the location where you want to put the data on JASMIN, e.g. a GWS to which you have access.
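The revision change described above could be scripted roughly as follows. This is a minimal sketch, assuming the branch revision is pinned in the optional config file as a trailing @4422 on the branch name; bump_postproc_rev is a hypothetical helper name, so it is worth looking at the line (or editing it by hand) before relying on it.

```shell
# Hypothetical helper: change the revision pinned after "@" on the
# postproc_2.3_pptransfer_gridftp_nopw branch in a Rose config file.
# Assumption: the revision appears as "<branch>@<revision>" in the file.
bump_postproc_rev() {
    conf_file="$1"
    old_rev="$2"
    new_rev="$3"
    tmp_file="$(mktemp)" || return 1
    # Rewrite only occurrences of the named branch at the old revision,
    # then copy the result back so the file keeps its permissions.
    sed "s/\(postproc_2\.3_pptransfer_gridftp_nopw@\)${old_rev}/\1${new_rev}/" \
        "$conf_file" > "$tmp_file" &&
        cat "$tmp_file" > "$conf_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}

# Example call, from the suite's working copy on PUMA2:
#   bump_postproc_rev ~/roses/u-dh833/app/fcm_make_pp/opt/rose-app-archer2.conf 4422 5095
```

After the edit, the suite still has to be re-run so that the postproc code is rebuilt from the new revision.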
Regarding point 3, I am not sure exactly what you meant, as I do not know what GWS refers to. When I checked old archived data from previous runs, it seems some model outputs were archived under /work/n02/n02/emxin/cylc-run/u-xxxxx/share/data/History_data. Is that the right place to store data? I do not see an “archive” directory there.
In addition, how about using this path: /gws/nopw/j04/ukca_vol1/emxin/archive? Would that work? Please help. Thanks
JASMIN GWS stands for JASMIN Group Workspace. To use any of the JASMIN group workspaces you have to be a member of the project. You can put data under /gws/nopw/j04/ukca_vol1, assuming you have access to the UKCA GWS. I don’t know which projects you are a member of, so I can’t advise where to put your data.
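One quick way to confirm access before pointing transfer_dir at a workspace is to test, from a JASMIN login, whether the directory exists and is writable. A minimal sketch, where check_gws_writable is a hypothetical helper name:

```shell
# Hypothetical helper: report whether a directory exists and is writable
# by the current user. Run this on JASMIN, not on ARCHER2 or PUMA2.
check_gws_writable() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 1
    fi
    echo "writable: $dir"
}

# Example call:
#   check_gws_writable /gws/nopw/j04/ukca_vol1
```

A "missing" or "not writable" result would suggest the account is not yet a member of that workspace, or that the path is wrong.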
I previously used ukca_vol1 for our output on JASMIN, and it seemed to work well, but since I moved to /gws/nopw/j04/bas_climate last week I cannot find the model output anywhere, which is very strange. This happened after I re-uploaded my SSH key to my JASMIN account because I could not log in.
Since then, I changed the JASMIN transfer_dir from the old /gws/nopw/j04/ukca_vol1/nlabraham/archive to a new directory, /gws/nopw/j04/bas_climate/users/xinyang/archive (e.g. in u-dk569). However, I then could not find any of the model outputs (old or new) on JASMIN (neither in /home nor on sci-vm-01.jasmin.ac.uk). Maybe I have to move back to ukca_vol1? Please help. Thanks!
Best,
Xin
This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses.
Your suite u-dk569 currently has transfer_dir set to /gws/nopw/j04. It looks like you’ve re-run the job since the above, as there are no log files for the pptransfer task, so I cannot diagnose what has happened. I also do not have access to the bas_climate GWS.
Please reset the transfer_dir to /gws/nopw/j04/bas_climate/users/xinyang/archive and re-run the suite. Let me know once the pptransfer task has run and then I can take a look at the log files and see where it is transferring the data to.
Strangely, I could not add bas_climate to the transfer_dir box in Rose. I submitted several jobs; however, when I re-opened the saved suites, I found transfer_dir was unchanged (or unsaved), showing “/gws/nopw/j04” even though I had added the second part, “/bas_climate/users/xinyang/archive”, to it. That part had gone missing. I do not know what happened. Maybe I am not allowed to use this space? BTW, my JASMIN account name is “xyang” but my PUMA and ARCHER2 account is emxin. Could this mismatch cause the problem?
Best,
Xin
I don’t know why you can no longer edit that box. Can you edit others? I would suggest logging out of PUMA2 and back in again.
If you still can’t edit it, you can change the suite files directly. You will find the transfer_dir variable in the file ~/roses/<suiteid>/app/postproc/rose-app.conf.
It doesn’t matter that your JASMIN & ARCHER2/PUMA2 usernames are different. Gridftp handles this when you set up the short-lived credentials certificate.
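Editing the file directly, as suggested above, might look like the following. This is a sketch only: set_transfer_dir is a hypothetical helper, and it assumes transfer_dir appears in the file as a plain transfer_dir=... line and that the new path contains no |, & or backslash characters.

```shell
# Hypothetical helper: set transfer_dir in a Rose app config file.
set_transfer_dir() {
    conf_file="$1"
    new_dir="$2"
    tmp_file="$(mktemp)" || return 1
    # "|" is used as the sed delimiter because the path contains "/".
    sed "s|^transfer_dir=.*|transfer_dir=${new_dir}|" "$conf_file" > "$tmp_file" &&
        cat "$tmp_file" > "$conf_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}

# Example call (replace <suiteid> with your suite ID), then confirm the value:
#   set_transfer_dir ~/roses/<suiteid>/app/postproc/rose-app.conf \
#       /gws/nopw/j04/bas_climate/users/xinyang/archive
#   grep '^transfer_dir=' ~/roses/<suiteid>/app/postproc/rose-app.conf
```

The final grep is a simple way to confirm what is actually stored in the file, independently of what the Rose editor displays.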
This is indeed very strange. For example, I made some changes in Rose, e.g. changing the run length from 1 month to 2 days; however, the suite completely ignored them (even though I saved the changes before I ran the job). I guess this could be due to a “mistake” (not sure) I made when I did the Rosie copy of the suite: I used the command “:w” to save the generated file (see attached screenshot). It seems that I should not have saved it (??). I am testing this by leaving it unsaved to see if it makes any difference. Maybe you can tell me if this is the case.
Best,
Xin
Sorry, but I don’t understand what you’re saying in your last message. Whether or not you save edits to the rose-suite.info has nothing to do with not being able to edit variables.
Have you tried editing the ~/roses/<suiteid>/app/postproc/rose-app.conf file to set the transfer_dir variable, so we can get the transfers up and running first? Let me know when you’ve done that and run the suite.
Sorry for the confusion in my previous email, and thank you for the clarification on the edits issue.
Yes, I can edit rose-app.conf file, and the changed value was not correctly shown in the Rose config-edit window. However, I could not find my model outputs on JASMIN; this may be a different issue, and I am seeking help from the JASMIN helpdesk. Could you help me keep the model output on ARCHER2 (instead of transferring it to JASMIN)? This may be related to another issue below.
Another strange thing that happened is that in the Rose config-edit window for u-dk706 I changed Total Run Length from 1M to 1D, but the model still seemed to run for a whole month, which means the changed value had no effect! I therefore doubt that the new transfer_dir path, changed via the app file, will work properly, but I am not sure. Could you please take a look? Thanks!
Best,
Xin
The changed value has been saved, however, you can’t set Total Run Length to P1D and leave the cycling frequency as P1M. It won’t like that. The run length must be at least as long as the cycling frequency.
Please stop the suite, put the run length back to P1M and then restart the suite (rose suite-run) and then I’ll be able to see what’s going on with the pptransfer step. The transfer_dir is set fine in the suite.
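The constraint above (run length at least as long as the cycling frequency) can be checked before submitting. A rough sketch, assuming durations of the simple form PnD, PnM or PnY and a 360-day calendar; both function names are hypothetical and nothing here is part of Rose or Cylc:

```shell
# Convert a simple ISO 8601 duration (PnD, PnM or PnY only) to days.
# Assumption: 360-day calendar, i.e. 30-day months. Adjust for Gregorian suites.
duration_days() {
    case "$1" in
        P*D) factor=1 ;;
        P*M) factor=30 ;;
        P*Y) factor=360 ;;
        *) return 1 ;;
    esac
    count="${1#P}"       # drop the leading "P"
    count="${count%?}"   # drop the trailing unit letter
    # Reject anything that is not a plain non-negative integer.
    case "$count" in
        ''|*[!0-9]*) return 1 ;;
    esac
    echo $((count * factor))
}

# Succeeds when the run length is at least as long as the cycling frequency.
run_length_ok() {
    run_days="$(duration_days "$1")" || return 2
    cycle_days="$(duration_days "$2")" || return 2
    [ "$run_days" -ge "$cycle_days" ]
}

# Example calls:
#   run_length_ok P1M P1M && echo "ok"
#   run_length_ok P1D P1M || echo "run length shorter than cycling frequency"
```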
I found I gave the wrong information in my previous email, sent an hour ago: “I can edit rose-app.conf file, and the changed value was not correctly shown in the Rose config-edit window” is not correct; it should be “I can edit rose-app.conf file, and the changed value was not correctly shown in the Rose config-edit window.”
Sorry about it.
Best,
Xin
Thanks for the help! I see the problem in my run length setup. I have changed it back to 1M and submitted a job. It may take a while to complete.
Can I ask another question? The job.out file mentions that the output is under, e.g.:
/work/n02/n02/emxin/cylc-run/u-dh833/work/20150901T0000Z/atmos_main/pe_output/dh833.fort6.pe0
However, I could not find the volume /work/n02/n02/. Should I log on to it? Maybe I am missing a lot of things regarding the UKCA run. Please help. Thanks.
Best,
Xin
Sorry to keep asking you questions; no worries, you do not have to answer now as it is Friday afternoon.
The question is related to changing code in UKCA. For example, I made some code changes in job u-dk714, but the model failed. In the job.err for fcm_make2_um, I saw an error message like:
[FAIL] ftn-113 ftn: ERROR UKCA_PRIM_SS, File = …/…/…/mnt/lustre/a2fs-work2/work/n02/n02/emxin/cylc-run/u-dk714/share/fcm_make_um/preprocess-atmos/src/ukca/src/science/core/aerosols/glomap/ukca_prim_ss.F90, Line = 358, Column = 1
[FAIL] IMPLICIT NONE is specified in the local scope, therefore an explicit type must be specified for data object “RATIOBR”.
But at line 358 of the file ukca_prim_ss.F90 (under the fcm_make_um sources: branches/dev/xinyang/r3049_bsn_br_v1@4957), I could not find the mentioned “RATIOBR”; it is on a different line. So I want to ask whether the error message shown in job.err is really meaningful for tackling the problem. Thanks!
Best,
Xin
I have completed a 1-month run for suite u-dk706. Could you take a look at the output? I could not find the model output, e.g. the monthly mean field, on either JASMIN or ARCHER2.
Please ignore the email sent on Friday afternoon regarding u-dk714, as I have solved the problem (i.e. the code referred to in job.err is the copy on ARCHER2, not the one on puma2).
Best,
Xin
Looking at u-dk706, you need to check how your job is configured. You should be able to see from the tasks in the cylc GUI that only compilation, ancil installation, recon and atmos_main have run. You don’t have postprocessing or pptransfer switched on in the suite.
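The same check can be made from the command line by listing which tasks have job logs. A minimal sketch, assuming the Cylc 7 layout in which job logs sit under log/job/<cycle>/<task>/ in the suite's cylc-run directory; list_run_tasks is a hypothetical helper:

```shell
# Hypothetical helper: print the unique names of tasks that have job logs
# in a suite run directory (assumed layout: log/job/<cycle>/<task>/).
list_run_tasks() {
    run_dir="$1"
    for task_dir in "$run_dir"/log/job/*/*/; do
        # Skip the unexpanded pattern when there are no job logs at all.
        [ -d "$task_dir" ] || continue
        basename "$task_dir"
    done | sort -u
}

# Example call:
#   list_run_tasks ~/cylc-run/u-dk706
```

If pptransfer does not appear in the output, the task has not run, which is consistent with it not being switched on in the suite.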