I am running the JULES 7.4 model, and it fails after I changed the namelist settings (ancillary files, driving files, initial files and some other settings) to set up a global run at 0.5 degrees.
I am running from the command line with namelists, so I cannot provide a suite number. I would appreciate any advice on where to look for the cause of the error.
The error message is:
forrtl: severe (408): fort: (3): Subscript #1 of the array ES has value -2147483648 which is less than the lower bound of 0
I tried changing the initial files, basing them on those from successful simulations, but that did not work. I met this problem before on the N96 grid and solved it by changing l_trait_phys from .true. to .false., but this time it is already set to .false.
The full error message is:
[WARNING] CHECK_NAMELIST_VALUES: Provided variable 'clay_const_z' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'croplai' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'gs' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'tsoil_deep' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'rfm_surfstore_rp' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'rfm_substore_rp' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'rfm_flowin_rp' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'rfm_bflowin_rp' is not required, so will be ignored
forrtl: severe (408): fort: (3): Subscript #1 of the array ES has value -2147483648 which is less than the lower bound of 0
Image PC Routine Line Source
jules.exe 000000000104D38F for_emit_diagnost Unknown Unknown
jules.exe 0000000000BFB359 Unknown Unknown Unknown
jules.exe 0000000000C22343 Unknown Unknown Unknown
libiomp5.so 00007FA7897D6713 __kmp_invoke_micr Unknown Unknown
libiomp5.so 00007FA7897656A3 __kmp_fork_call Unknown Unknown
libiomp5.so 00007FA78972C63D __kmpc_fork_call Unknown Unknown
jules.exe 0000000000C0B842 Unknown Unknown Unknown
jules.exe 0000000000A36A75 Unknown Unknown Unknown
jules.exe 000000000083C0E7 Unknown Unknown Unknown
jules.exe 000000000041BE83 Unknown Unknown Unknown
jules.exe 000000000040E946 Unknown Unknown Unknown
jules.exe 000000000040E422 Unknown Unknown Unknown
libc-2.17.so 00007FA7890EC555 __libc_start_main Unknown Unknown
jules.exe 000000000040E329 Unknown Unknown Unknown
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[9079,1],0]
Exit code: 152
--------------------------------------------------------------------------
Part of the log is:
[INFO] init_ic: 'cropdvi' will be set to a constant = -2.000000
[INFO] init_ic: 'croprootc' will be set to a constant = 9.9999997E-05
[INFO] init_ic: 'cropharvc' will be set to a constant = 0.0000000E+00
[INFO] init_ic: 'cropreservec' will be set to a constant = 0.0000000E+00
[INFO] init_ic: 'cropcanht' will be set to a constant = 1.0000000E-03
[INFO] init_ic: 'sthzw' will be set to a constant = 0.0000000E+00
[INFO] init_ic: 'zw' will be set to a constant = 0.0000000E+00
[INFO] init_ic: 'canopy' will be set to a constant = 0.0000000E+00
[INFO] init_ic: 'tstar_tile' will be set to a constant = 0.0000000E+00
[INFO] init_ic: 'cs' will be set to a constant = 0.0000000E+00
[INFO] init_ic: 'rgrain' will be set to a constant = 0.0000000E+00
[INFO] init_ic: 'sthuf' will be set to a constant = 0.0000000E+00
[INFO] init_ic: 't_soil' will be set to a constant = 0.0000000E+00
[INFO] init_ic: 'snow_tile' will be set to a constant = 0.0000000E+00
[INFO] init_ic: 'lai' will be set to a constant = 4.000000
[INFO] init_ic: 'sthu_irr' will be set to a constant = 0.5000000
[INFO] init_ic: Number of soil points = 64583
[INFO] init_ic: Number of land ice points = 3064
[INFO] init_output: Reading JULES_OUTPUT namelist...
[INFO] init_output: Reading JULES_OUTPUT_PROFILE namelist...
[INFO] register_output_profile: Profile with name others registered to provide output for main run from 2012-01-01 00:00:00 to 2013-01-01 00:00:00
[INFO] init_output: Reading JULES_OUTPUT_PROFILE namelist...
[INFO] register_output_profile: Profile with name crop registered to provide output for main run from 2012-01-01 00:00:00 to 2013-01-01 00:00:00
[INFO] init_vars_tmp: Setting satcon to zero at land ice points
......
......
......
[INFO] file_ncdf_close: Closing file /work/scratch-pw3/byxu/output//global.dump.20120101.0.nc
[INFO] init: Initialisation is complete
Hi Beiyao
Thanks for your ticket. Have you looked at your job.err and job.out log files? Do you see anything else in these log files that might help?
Patrick
Thanks for your reply. I pasted the full content of job.err above, and as for job.out, everything seems normal. The run failed after completing the initialisation, so I cannot find the reason for the error.
Hi Beiyao:
What is your suite number? I don’t see that here, but maybe I missed it. Have you checked in all your changes in the suite to MOSRS?
Can you give read permission to your home directory? (You might want to put confidential items someplace else without the read permission.) You can use the following command:
chmod -R g+rX /home/users/byxu
Patrick
Thanks, Beiyao
I have copied your code to my directory /home/users/pmcguire/grid_run.
And when I try to run it with . jules_job.sh from sci3, I get this error message: Please verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT, F16C and AVX instructions.
So I am now trying to run it with sbatch jules_jb.sh from sci3. It’s waiting in the par-multi queue now.
Once it starts to run, how long does it take to produce the error message that you noted? (forrtl: severe (408): fort: (3): Subscript #1 of the array ES has value -2147483648 which is less than the lower bound of 0)
I looked at your two sets of err files jules.34718403.err and jules.34720729.err from Friday night and Saturday morning, and I don’t see the above error message there. Did you change something?
Why aren’t you using a rose/cylc suite for this? It would be much easier to support you if you do use suites. And normally, in suites, we don’t use precompiled executables like you are doing here. Using a precompiled executable might be part of the reason I got the error message above about ...support Intel(R) X87.
Which machine do you run your jules SLURM script on? Is it cylc1 or sci3?
Patrick
Once it starts to run, it takes only 1 minute to produce the error message. The err files jules.34718403.err and jules.34720729.err were generated while the group workspace was not working, so the model got stuck reading files.
Thanks for the advice. I will try to learn how to run from rose. The JULES executable in my directory was compiled with the Intel compiler, so I have always used sci1 or sci5.
I now get a similar or the same error as you got: forrtl: severe (408): fort: (3): Subscript #1 of the array ES has value -2147483648 which is less than the lower bound of 0
Maybe you can try to compile JULES with debug turned on? How do you compile JULES? There should be an option to select compiling with JULES_BUILD in debug mode instead of normal mode.
Then run your program and see what new error messages you get.
Patrick
I will try that too. I also found that all the variables in the dump files are missing, although I provided them. I wonder whether the problem is related to that. Do you know what can cause this kind of problem?
It looks as though the error is occurring on trying to access an array called ES. The only places where JULES does this are in the subroutines in src/science/surface/qsat_mod.F90, which calculate saturation mixing ratio from temperature and pressure. Looking at the source code, the only way I can see in which such an error could occur is if a temperature had the invalid value NaN (Not a Number). Unfortunately, the subroutines are called from several places, and it’s not possible to say exactly where things are going wrong without more information.
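To illustrate the failure mode described above: a lookup table is indexed by a number computed from the temperature, and converting NaN to an integer does not give a meaningful index (on x86 hardware it commonly yields the most negative 32-bit integer, which would match the -2147483648 in the message). The Python sketch below shows a guarded index calculation. The table bounds and spacing are hypothetical values chosen for illustration and are not taken from qsat_mod.F90.

```python
import math

# Hypothetical table layout, for illustration only (not taken from qsat_mod.F90).
T_LOW = 183.15    # lowest tabulated temperature, K
T_HIGH = 338.15   # highest tabulated temperature, K
DELTA_T = 0.1     # table spacing, K


def table_index(temperature_k):
    """Return a zero-based table index, refusing NaN or infinite input."""
    if not math.isfinite(temperature_k):
        # Without this guard, the integer conversion below is meaningless.
        raise ValueError(f"invalid temperature: {temperature_k!r}")
    # Clamp to the tabulated range so the index can never leave the table.
    clamped = min(max(temperature_k, T_LOW), T_HIGH)
    return int((clamped - T_LOW) / DELTA_T)
```

With a guard of this kind, a bad temperature is reported where it is first used rather than surfacing later as an out-of-bounds subscript.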
I’d try two things: 1) Check your input files for invalid values (particularly non-standard files and particularly temperatures). 2) Investigate compiler options to get more information on where the error is occurring (I can’t remember offhand what the relevant ifort options are, but there must be some).
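A minimal sketch of the first check, in Python, assuming the temperature values have already been read from the input file into a flat list of floats (the file reading itself is not shown). The function name, the plausible range of 150 to 350 K and the fill value are assumptions for illustration, not JULES settings.

```python
import math


def summarise_invalid(values, lower=150.0, upper=350.0, fill_value=None):
    """Count NaN, infinite, fill-valued and out-of-range entries in values."""
    counts = {"nan": 0, "inf": 0, "fill": 0, "out_of_range": 0}
    for v in values:
        if math.isnan(v):
            counts["nan"] += 1
        elif math.isinf(v):
            counts["inf"] += 1
        elif fill_value is not None and v == fill_value:
            counts["fill"] += 1
        elif v < lower or v > upper:
            # Catches zeros as well as values in the wrong units (e.g. Celsius).
            counts["out_of_range"] += 1
    return counts


# Example with made-up numbers: one valid value and four kinds of bad value.
sample = [280.0, float("nan"), 0.0, -9999.0, float("inf")]
report = summarise_invalid(sample, fill_value=-9999.0)
```

Any non-zero count is worth tracing back to the file and variable it came from before rerunning the model.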
I now have more information about the error from a debug build of JULES:
Caught signal 8 (Floating point exception: floating-point divide by zero)
/home/users/byxu/test/jules-7.4/preprocess/src/jules/src/science/surface/jules_gridinit_sf_explicit_jls.F90: [ L_jules_gridinit_sf_explicit_mod_mp_jules_gridinit_sf_explicit__197__par_region0_2_0() ]
...
199 DO j = tdims%j_start,tdims%j_end
200 DO i = tdims%i_start,tdims%i_end
201 rhostar(i,j) = pstar(i,j) / ( r * tstar(i,j) )
==> 202 ! ... first approximation to surface air density from ideal gas equation
203 ftl_1(i,j) = 0.0
204 fqw_1(i,j) = 0.0
205 q1_sd(i,j) = 0.0
However, I have checked the surface temperature, and it contains no zero or missing values. I also swapped the driving data (the new set is also in /home/users/pmcguire/grid_run) to see whether the driving data were the cause, but the error is the same.
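One thing that may be worth checking alongside the input files is the initialisation log itself. The log excerpt earlier in this thread reports 'tstar_tile' and 't_soil' being set to a constant 0.0, and the trace points at a division by tstar. Whether those constants actually reach tstar at this point in the code is an assumption, not something confirmed here. The Python sketch below scans log text for init_ic constants and lists the temperature-like variables set to zero; the function names and the default list of variable names are hypothetical.

```python
import re

# Matches lines such as:
# [INFO] init_ic: 'tstar_tile' will be set to a constant = 0.0000000E+00
CONST_LINE = re.compile(
    r"init_ic: '(?P<name>\w+)' will be set to a constant = (?P<value>\S+)"
)


def constants_from_log(log_text):
    """Return {variable name: constant value} for every init_ic constant line."""
    return {
        m.group("name"): float(m.group("value"))
        for m in CONST_LINE.finditer(log_text)
    }


def zero_temperatures(log_text, names=("tstar_tile", "t_soil")):
    """List the given variables that the log says were set to exactly 0.0."""
    consts = constants_from_log(log_text)
    return [n for n in names if consts.get(n) == 0.0]


# Lines copied from the log excerpt in this thread.
example_log = """
[INFO] init_ic: 'tstar_tile' will be set to a constant = 0.0000000E+00
[INFO] init_ic: 't_soil' will be set to a constant = 0.0000000E+00
[INFO] init_ic: 'lai' will be set to a constant = 4.000000
"""
suspects = zero_temperatures(example_log)
```

If a temperature appears in that list, the initial-conditions namelist is the place to look, since a temperature of 0 K would make pstar / ( r * tstar ) divide by zero.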
Hi Beiyao:
You might need to put some print statements in your jules FORTRAN code, around the problem lines of code. Or maybe you can figure out how to use a debugger with the code?
I have had some problems in the past with very small amounts of frost forming on the ground, which was caused by not having the driving data calculated properly, or something like that. This might be related to the snow package, which you are looking at right now.
Patrick
After changing all the settings, including the initial conditions (which might have been why the model could not run), to simple ones, the model now runs without errors. It may still need some adjustments to satisfy all my requirements, but it is a good sign.