Hope this email finds you well. Recently I have been encountering strange PMIX ERROR messages and the run stopped.
I am not sure what this means. Could you please help?
Best,
Wenyao
Model run excludes a change from ticket jules:#194 as
l_accurate_rho=.FALSE.
This will mean that an inaccurate estimate of surface air
density will be used
[WARNING] WARN_JULES_TEMP_FIXES:
jules:#1279 fix to remove persistent small snow amounts
when using the frac_snow_subl_melt=1 option is not enabled:
l_fix_snow_frac = .FALSE.
[WARNING] init_ic: Provided variable 'gs' is not required, so will be ignored
[host539.jc.rl.ac.uk:257165] PMIX ERROR: NO-PERMISSIONS in file gds_dstore.c at line 702
[host539.jc.rl.ac.uk:257165] PMIX ERROR: NO-PERMISSIONS in file gds_dstore.c at line 711
Thank you for your reply. I am running using a cylc suite. I am working with /home/users/gwy1998/roses/u-de307, which is a copy of u-cr731 (the benchmark suite). Thanks a lot for your help.
I can see that this error is appearing in lots of your runs (all of them, perhaps?), but the jobs seem to be running to a successful finish.
If you'd like to check that your results look good, then I think you can ignore the message for now. The OpenMPI library was built by JASMIN, and it is all going to change for the new operating system, so as long as you're happy with the output files from your experiment, this error will hopefully not appear in the new builds of the libraries that we will be moving to.
If you're not happy with that, though, I can escalate it with JASMIN themselves. But if you are, then please keep going, and I will email everyone when we change to the new libraries in the future.
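If it would help to do that check quickly, a rough sketch along these lines (Python, using one of your output directories as an example; adjust the path to your runs) just tries to open every .nc file and reports any that fail:

#!/usr/bin/env python3
# Sketch: confirm that the output netCDF files can at least be opened.
# The directory is one of yours from this thread - change it as needed.
from pathlib import Path
from netCDF4 import Dataset

outdir = Path("/work/scratch-pw2/gwy1998/fluxnet/run11a/multisingle-test/output_GAL9")

files = sorted(outdir.rglob("*.nc"))
bad = []
for f in files:
    try:
        with Dataset(str(f)):  # opening is enough to catch truncated/corrupt files
            pass
    except (OSError, RuntimeError) as exc:
        bad.append((f, exc))

print(f"checked {len(files)} files, {len(bad)} failed to open")
for f, exc in bad:
    print(f"  {f}: {exc}")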
Which nc files are not being produced?
You have over 500 that have already been made in /work/scratch-pw2/gwy1998/fluxnet/run11a/multisingle-test/output_GAL9/, for example.
Give me an example of something which should have been made and hasn’t been.
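If it helps, something like the sketch below would flag any site directories that contain dump files but no main output file. It assumes dump files have ".dump." in their names and that each site writes into its own subdirectory, so adjust it to your actual layout and naming:

#!/usr/bin/env python3
# Sketch: for each directory under the output area, count dump files and
# main output files, and flag directories that have dumps but no main output.
# Assumes dump files contain ".dump." in the filename - adjust if yours differ.
from pathlib import Path
from collections import defaultdict

outdir = Path("/work/scratch-pw2/gwy1998/fluxnet/run11a/multisingle-test/output_GAL9")

dumps, mains = defaultdict(int), defaultdict(int)
for f in outdir.rglob("*.nc"):
    key = f.parent.relative_to(outdir)
    if ".dump." in f.name:
        dumps[key] += 1
    else:
        mains[key] += 1

for key in sorted(set(dumps) | set(mains), key=str):
    if mains.get(key, 0) == 0:
        print(f"{key}: {dumps.get(key, 0)} dump file(s) but no main output")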
For instance, /work/scratch-pw2/gwy1998/fluxnet/run11a/multisingle-test/output_GAL9/AU-Lit
local_AU-Lit_fluxnet2015_GAL9.dump.spin1.20160101.0.nc
This site only produces the dump file but not the main run file. But I guess if this error doesn't affect the run, maybe this is not the reason for not producing the results.
I’ve looked at that specific task, and you’re probably correct that there is something else causing the issue.
The job actually fails with a segfault in qsat, so an array of data is probably the wrong length (or something like that) and the model errors because it is not able to read or write somewhere it should. If you think the model configuration is generally good, then perhaps look at any changes to the input data. The crash was in the part of the code that deals with specific humidity, but could anything that defines the bounds of the arrays have changed?
I’m afraid I’m not an expert on your setup, but this is what I would look at rather than the MPI error message.
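If it helps, a rough sketch of the kind of input-data check I mean is below. It assumes netCDF input and uses "q" as a stand-in name for the specific humidity variable, so substitute the driving files and variable names that your suite actually uses:

#!/usr/bin/env python3
# Sketch: print the dimensions of each variable in an input file and flag
# obviously bad values, to spot arrays whose lengths have changed.
# The file path and the variable name "q" (specific humidity) are examples
# only - substitute the driving files and names your suite actually uses.
import numpy as np
from netCDF4 import Dataset

path = "/path/to/driving_data.nc"  # example path

with Dataset(path) as nc:
    print("dimensions:", {name: len(dim) for name, dim in nc.dimensions.items()})
    for name, var in nc.variables.items():
        print(f"{name}: dims={var.dimensions} shape={var.shape}")

    if "q" in nc.variables:
        # fill masked/missing values with NaN so min/max/finite checks are simple
        q = np.ma.filled(nc.variables["q"][:], np.nan).astype(float)
        print(f"q: min={np.nanmin(q):.3g} max={np.nanmax(q):.3g} "
              f"non-finite={np.sum(~np.isfinite(q))}")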