JULES running problem: forrtl: severe (408): fort: (3)

Hi everyone,

I am running the JULES 7.4 model, and it fails after I changed the namelist settings (ancillary files, driving files, initial files and some other settings) to set up a global run at 0.5 degrees.

I am running from the command line with namelists, so I cannot provide a suite number. I would appreciate any advice about where I should look for the cause of the error.

The message of the error is:
forrtl: severe (408): fort: (3): Subscript #1 of the array ES has value -2147483648 which is less than the lower bound of 0

I tried changing the initial files based on successful simulations, but that did not work. I remember hitting this problem last time with the N96 grid and solving it by changing l_trait_phys from .true. to .false., but this time it is already set to .false…

The full error message is:

[WARNING] CHECK_NAMELIST_VALUES: Provided variable 'clay_const_z' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'croplai' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'gs' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'tsoil_deep' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'rfm_surfstore_rp' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'rfm_substore_rp' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'rfm_flowin_rp' is not required, so will be ignored
[WARNING] init_ic: Provided variable 'rfm_bflowin_rp' is not required, so will be ignored
forrtl: severe (408): fort: (3): Subscript #1 of the array ES has value -2147483648 which is less than the lower bound of 0

Image              PC                Routine            Line        Source             
jules.exe          000000000104D38F  for_emit_diagnost     Unknown  Unknown
jules.exe          0000000000BFB359  Unknown               Unknown  Unknown
jules.exe          0000000000C22343  Unknown               Unknown  Unknown
libiomp5.so        00007FA7897D6713  __kmp_invoke_micr     Unknown  Unknown
libiomp5.so        00007FA7897656A3  __kmp_fork_call       Unknown  Unknown
libiomp5.so        00007FA78972C63D  __kmpc_fork_call      Unknown  Unknown
jules.exe          0000000000C0B842  Unknown               Unknown  Unknown
jules.exe          0000000000A36A75  Unknown               Unknown  Unknown
jules.exe          000000000083C0E7  Unknown               Unknown  Unknown
jules.exe          000000000041BE83  Unknown               Unknown  Unknown
jules.exe          000000000040E946  Unknown               Unknown  Unknown
jules.exe          000000000040E422  Unknown               Unknown  Unknown
libc-2.17.so       00007FA7890EC555  __libc_start_main     Unknown  Unknown
jules.exe          000000000040E329  Unknown               Unknown  Unknown
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[9079,1],0]
  Exit code:    152
--------------------------------------------------------------------------

Part of the log is:

[INFO] init_ic: 'cropdvi' will be set to a constant = -2.000000

[INFO] init_ic: 'croprootc' will be set to a constant = 9.9999997E-05

[INFO] init_ic: 'cropharvc' will be set to a constant = 0.0000000E+00

[INFO] init_ic: 'cropreservec' will be set to a constant = 0.0000000E+00

[INFO] init_ic: 'cropcanht' will be set to a constant = 1.0000000E-03

[INFO] init_ic: 'sthzw' will be set to a constant = 0.0000000E+00

[INFO] init_ic: 'zw' will be set to a constant = 0.0000000E+00

[INFO] init_ic: 'canopy' will be set to a constant = 0.0000000E+00

[INFO] init_ic: 'tstar_tile' will be set to a constant = 0.0000000E+00

[INFO] init_ic: 'cs' will be set to a constant = 0.0000000E+00

[INFO] init_ic: 'rgrain' will be set to a constant = 0.0000000E+00

[INFO] init_ic: 'sthuf' will be set to a constant = 0.0000000E+00

[INFO] init_ic: 't_soil' will be set to a constant = 0.0000000E+00

[INFO] init_ic: 'snow_tile' will be set to a constant = 0.0000000E+00

[INFO] init_ic: 'lai' will be set to a constant = 4.000000

[INFO] init_ic: 'sthu_irr' will be set to a constant = 0.5000000

[INFO] init_ic: Number of soil points = 64583

[INFO] init_ic: Number of land ice points = 3064

[INFO] init_output: Reading JULES_OUTPUT namelist...

[INFO] init_output: Reading JULES_OUTPUT_PROFILE namelist...

[INFO] register_output_profile: Profile with name others registered to provide output for main run from 2012-01-01 00:00:00 to 2013-01-01 00:00:00

[INFO] init_output: Reading JULES_OUTPUT_PROFILE namelist...

[INFO] register_output_profile: Profile with name crop registered to provide output for main run from 2012-01-01 00:00:00 to 2013-01-01 00:00:00

[INFO] init_vars_tmp: Setting satcon to zero at land ice points
......
......
......
[INFO] file_ncdf_close: Closing file /work/scratch-pw3/byxu/output//global.dump.20120101.0.nc

[INFO] init: Initialisation is complete

Hi Beiyao
Thanks for your ticket. Have you looked at your job.err and job.out log files? Do you see anything else in these log files that might help?
Patrick

Hi Patrick,

Thanks for your reply. I pasted all the content of job.err above, and as for job.out, everything seems normal. The run failed after initialisation completed, so I can’t find the cause of the error.

Beiyao

Hi Beiyao:
What is your suite number? I don’t see that here, but maybe I missed it. Have you checked in all your changes in the suite to MOSRS?
Can you give read permission to your home directory? (You might want to put confidential items someplace else without the read permission.) You can use the following command:
chmod -R g+rX /home/users/byxu
Patrick

Hi again Beiyao:
If you don’t have a suite that you’re using, can you also tell me how you run your program and from which directory?
Patrick

Hi Patrick,

Thanks for your reply.

I am using the namelists in ‘/home/users/byxu/grid_run’. You can run it with ". jules_job.sh".

If there is any problem, please let me know.

Best,
Beiyao

Thanks, Beiyao
I have copied your code to my directory /home/users/pmcguire/grid_run.

And when I try to run it with . jules_job.sh from sci3, I get this error message:
Please verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT, F16C and AVX instructions.

So I am now trying to run it with sbatch jules_jb.sh from sci3. It’s waiting in the par-multi queue now.

Once it starts to run, how long does it take to produce the error message that you noted? (forrtl: severe (408): fort: (3): Subscript #1 of the array ES has value -2147483648 which is less than the lower bound of 0)

I looked at your two sets of err files jules.34718403.err and jules.34720729.err from Friday night and Saturday morning, and I don’t see the above error message there. Did you change something?

Why aren’t you using a rose/cylc suite for this? It would be much easier to support you if you do use suites. And normally, in suites, we don’t use precompiled executables like you are doing here. Using a precompiled executable might be part of the reason I got the error message above about ...support Intel(R) X87.

Which machine do you run your jules SLURM script on? Is it cylc1 or sci3?
Patrick

Hi Patrick,

Thanks for the update.

Once it starts to run, it takes only about 1 minute to produce the error message. The err files jules.34718403.err and jules.34720729.err were generated while the group workspace was not working, so the model got stuck reading files.

Thanks for the advice. I will try to learn how to run from rose. The JULES executable in my directory was compiled with the Intel compiler, so I have always used sci1 or sci5.

Beiyao

I now get a similar or the same error as you got:
forrtl: severe (408): fort: (3): Subscript #1 of the array ES has value -2147483648 which is less than the lower bound of 0

Maybe you can try to compile JULES with debug turned on? How do you compile JULES? There should be an option to select compiling with JULES_BUILD in debug mode instead of normal mode.
Then run your program and see what new error messages you get.
Patrick

Hi Patrick,

Thanks for the update.

I will try that too. I also found that all the variables in the dump files are missing, although I provided them. I wonder whether the problem is related to that. Do you know what could cause this?

Beiyao

(from David Livings)
Hi Beiyao

It looks as though the error is occurring on trying to access an array called ES. The only places where JULES does this are in the subroutines in src/science/surface/qsat_mod.F90, which calculate saturation mixing ratio from temperature and pressure. Looking at the source code, the only way I can see in which such an error could occur is if a temperature had the invalid value NaN (Not a Number). Unfortunately, the subroutines are called from several places, and it’s not possible to say exactly where things are going wrong without more information.

I’d try two things: 1) Check your input files for invalid values (particularly non-standard files and particularly temperatures). 2) Investigate compiler options to get more information on where the error is occurring (I can’t remember offhand what the relevant ifort options are, but there must be some).

Good luck
David
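
A minimal Python sketch of the first suggestion (checking inputs for invalid values). It only shows the scan itself; reading the values out of the NetCDF files is not shown and would need a NetCDF library. The function name `find_bad_values`, the sample data and the 150 to 350 K plausibility bounds are all assumptions made for illustration, not part of JULES.

```python
import math


def find_bad_values(values, lower=None, upper=None):
    """Return (index, value) pairs that are NaN, infinite, or outside [lower, upper]."""
    bad = []
    for index, value in enumerate(values):
        if not math.isfinite(value):
            # NaN and +/-inf never pass a range comparison reliably, so test them first.
            bad.append((index, value))
        elif lower is not None and value < lower:
            bad.append((index, value))
        elif upper is not None and value > upper:
            bad.append((index, value))
    return bad


# Hypothetical temperatures in kelvin, including a NaN, a zero and a fill-like value.
sample_temperatures = [271.3, float("nan"), 0.0, 288.9, 1.0e20]

# Illustrative bounds only; choose limits that suit the variable being checked.
bad = find_bad_values(sample_temperatures, lower=150.0, upper=350.0)
for index, value in bad:
    print(f"index {index}: suspicious value {value}")
```

Running the same scan over every temperature-like field in the driving, ancillary and initial-condition files would flag zeros and fill values as well as NaNs.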

Hi David,

Thanks for your advice.

I have checked the weather data and there are no missing values. I will compile the model in debug mode to see whether it produces more information.

Thanks,
Beiyao

Hi Beiyao:
After David wrote, I figured out a way to find the ES array in the JULES code:
grep -ir ' ES(' ~byxu/jules/src/*

/home/users/byxu/jules/src/science/surface/qsat_data_mod.F90:REAL(KIND=real_jlslsm) :: es(0:n+1)
/home/users/byxu/jules/src/science/surface/qsat_mod.F90:  qs(i)  = (one - atable) * es(itable) + atable * es(itable+1)
/home/users/byxu/jules/src/science/surface/qsat_mod.F90:  qs(i)  = (one - atable) * es(itable) + atable * es(itable+1)
/home/users/byxu/jules/src/science/surface/qsat_mod.F90:  qs(i)  = (one - atable) * es(itable) + atable * es(itable+1)
/home/users/byxu/jules/src/science/surface/qsat_mod.F90:  qs(i)  = (one - atable) * es(itable) + atable * es(itable+1)

Probably there is a better way to do this.
Patrick

The way I used was grep -iwR es from the src directory.
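
For anyone unfamiliar with the whole-word option, here is a small Python sketch of the same idea: a case-insensitive match on `es` as a complete word. The first two sample lines are taken from the grep output above; the third is a made-up comment line added only to show a non-match.

```python
import re

# Case-insensitive, whole-word match for the identifier "es" (the idea behind grep -iw).
WORD_ES = re.compile(r"\bes\b", re.IGNORECASE)

lines = [
    "REAL(KIND=real_jlslsm) :: es(0:n+1)",
    "qs(i)  = (one - atable) * es(itable) + atable * es(itable+1)",
    "! Calculates surface fluxes",  # hypothetical line: contains 'es' only inside words
]

matches = [line for line in lines if WORD_ES.search(line)]
for line in matches:
    print(line)
```

A plain substring search for "es" would also return the third line, which is why the whole-word match cuts down the noise so much.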

Cool, David!
I had never used grep -w before. I remember Rich Ellis talking about grep -i previously. grep -w is very useful!
PCM

Hi Patrick,

Thanks!

Now I have more information about the error from the debug build of JULES:

Caught signal 8 (Floating point exception: floating-point divide by zero)

/home/users/byxu/test/jules-7.4/preprocess/src/jules/src/science/surface/jules_gridinit_sf_explicit_jls.F90: [ L_jules_gridinit_sf_explicit_mod_mp_jules_gridinit_sf_explicit__197__par_region0_2_0() ]
      ...
      199 DO j = tdims%j_start,tdims%j_end
      200   DO i = tdims%i_start,tdims%i_end
      201     rhostar(i,j) = pstar(i,j) / ( r * tstar(i,j) )
==>   202     ! ... first approximation to surface air density from ideal gas equation
      203     ftl_1(i,j) = 0.0
      204     fqw_1(i,j) = 0.0
      205     q1_sd(i,j) = 0.0

However, I have checked the surface temperature, and it contains no zeros or missing values. I also swapped in different driving data (also in /home/users/pmcguire/grid_run) to see whether the driving data was the cause, but the error is still the same.

Beiyao
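
The flagged line is the ideal-gas density, rhostar = pstar / (r * tstar), so a divide by zero there means tstar was exactly zero at some point. A minimal Python sketch of that arithmetic follows. The gas constant value and the function name are assumptions for illustration, and this is not the JULES code itself.

```python
R_DRY_AIR = 287.05  # J kg-1 K-1; assumed value for the gas constant of dry air


def surface_air_density(pstar, tstar):
    """First approximation to surface air density from the ideal gas equation."""
    return pstar / (R_DRY_AIR * tstar)


# A physically sensible case: about 1.225 kg m-3 at 101325 Pa and 288.15 K.
print(surface_air_density(101325.0, 288.15))

# A surface temperature of exactly 0 K makes the denominator zero.
try:
    surface_air_density(101325.0, 0.0)
except ZeroDivisionError as err:
    print("tstar = 0 K:", err)
```

The log earlier in this thread shows init_ic setting 'tstar_tile' and 't_soil' to a constant 0.0000000E+00. Whether that is the source of the zero tstar here is not confirmed in the thread, but it may be worth checking alongside the driving data.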

After changing the land fraction to a new version, the error message changed to:

forrtl: error (75): floating point exception
Image              PC                Routine            Line        Source             
jules.exe          0000000001F022BB  Unknown               Unknown  Unknown
libpthread-2.17.s  00007F73CECD0630  Unknown               Unknown  Unknown
jules.exe          0000000001FBE34A  Unknown               Unknown  Unknown
jules.exe          00000000019C9A54  albsnow_ts_mod_mp         151  albsnow_ts_jls_mod.F90
libiomp5.so        00007F73CEFFF713  __kmp_invoke_micr     Unknown  Unknown
libiomp5.so        00007F73CEF8E6A3  __kmp_fork_call       Unknown  Unknown
libiomp5.so        00007F73CEF5563D  __kmpc_fork_call      Unknown  Unknown
jules.exe          00000000019C3A07  albsnow_ts_mod_mp         132  albsnow_ts_jls_mod.F90
jules.exe          000000000129207F  jules_land_albedo         783  jules_land_albedo_jls_mod.F90
jules.exe          0000000000D4D85F  surf_couple_radia         254  surf_couple_radiation_mod.F90
jules.exe          00000000004209DC  control_                  486  control.F90
jules.exe          000000000040EBC7  MAIN__                    199  jules.F90
jules.exe          000000000040E422  Unknown               Unknown  Unknown
libc-2.17.so       00007F73CE711555  __libc_start_main     Unknown  Unknown
jules.exe          000000000040E329  Unknown               Unknown  Unknown

It seems the error is related to the snow calculation.

Hi Beiyao:
You might need to put some print statements in your jules FORTRAN code, around the problem lines of code. Or maybe you can figure out how to use a debugger with the code?
I have had some problems in the past with very small amounts of frost forming on the ground, which was caused by not having the driving data calculated properly, or something like that. This might be related to the snow package, which you are looking at right now.
Patrick

Hi Patrick,

Thanks for all your patient replies.

After switching everything to simple settings, including the initial conditions (which might be why the model couldn’t run), the model now runs without errors. It may still need some adjustments to meet all my requirements, but it is a good sign.

Thanks again!
Beiyao

Hi David,

Thanks for your suggestions, and the debug mode helped me a lot! The model now runs without errors, but it still needs some adjustments.

Thanks!
Beiyao