Get SIF canopy layer code into JULES trunk

Hi Tristan:
There were/are a couple of months of missing WFDEI rainfall data in 2009 and 2011 in:
/gws/nopw/j04/jules/data/WFD-EI-Forcing/Rainf_WFDEI_GPCC_land/
So I have requested that those files be found. In the meantime, I am only running the canopy layer code from 2012-2016 instead of 2009-2016.
Patrick

Hi Tristan:
I have the JULES5.0 and JULES7.0 canopy-layer branches running interactively now on JASMIN, for the whole globe. They seem to run somewhat slowly. So far, for example, in 1 hour of spinning up, the JULES7.0 canopy-layer branch only gets through about 30 days of the two-year spinup. Maybe this is good enough for layered-canopy output. But then again, maybe not.

So, I am trying to run the JULES5.0 trunk version in addition to the JULES5.0 canopy-layer branch version. That way, I can see if the former is any faster than the latter.

The setup for this is similar to before, but I am now using the rose-cylc suite ~pmcguire/roses/u-al752currentworking11nl to compile against the JULES5.0 trunk instead.

And I am running the no-layer code with

cd /gws/nopw/j04/odanceo/pmcguire/jules_sif/global_v5nl
./load_modules_and_run.sh > run_v5_nl.log 2> run_v5f_nl.err &

I have stripped out the three layered-canopy variables from output.nml, so the profile now contains only these three variables:

output_type=3*'S'
profile_name='sif_vars',
var='gpp', 'lai', 'frac'
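
For context, the whole profile in output.nml now looks roughly like this (a sketch from memory; the surrounding items such as output_main_run, output_period, and nvars follow the standard JULES_OUTPUT_PROFILE namelist, and the exact period setting shown here is illustrative):

&JULES_OUTPUT_PROFILE
  profile_name    = 'sif_vars',
  output_main_run = .true.,
  output_period   = -1,            ! -1 = monthly in JULES; the period used for this run is a guess here
  nvars           = 3,
  var             = 'gpp', 'lai', 'frac',
  output_type     = 3*'S',
  ! (the layered variables, e.g. gpp_lyr, were removed from var for this trunk run)
/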

So far, in limited testing, the JULES5.0 trunk version seems several times faster (measured in spinup-days per wallclock-hour) than the canopy-layer branch version of JULES5.0. Maybe this is as expected?

The canopy-layer branch versions of JULES5.0 and JULES7.0 both seem to use about 1.5GB of RES memory, according to top, whereas the trunk version of JULES5.0 uses about 0.83GB of RES memory. Again, maybe this is as expected?
Patrick

Hi Tristan:
The global SIF canopy-layer code ran overnight on JASMIN, interactively, for both JULES5.0 and JULES7.0, and both are still only about halfway through the two-year spinup. The JULES5.0 trunk version of the code got through the full two years of spinup in that time.

I shouldn’t run code interactively for that long on JASMIN, so I have stopped these runs. As a first step, I will change the setup to run over a smaller geographical region, so that it runs faster. Later, as a second step, I can set it up to run in batch mode instead of interactively, and with multiple processors via domain decomposition.
Patrick

Hi Patrick,

Just a quick response

2012 onward is fine for me, so if it stays like that, then no problem.

I am surprised that the runs are slower though. There is no additional processing, right? As in, we are just outputting variables that are already being calculated by the model. Is this an I/O thing?

Thanks,
Tristan.

Hi Tristan:
I don’t think it is an I/O thing, since nothing is being written to disk until the end of the spinup year.
There are probably some extra loops of processing somewhere.
Patrick

Hi Tristan:
In order to speed things up, I just got the JULES5.0 canopy-layered version running over the region around the UK, instead of the whole globe:
lat_bounds= 45 65
lon_bounds= -10 5

To do this, I also had to:

  1. set land_only to .true.,
  2. spin up from idealized initial conditions instead of from a dump file,
  3. turn off l_phenol and l_top.

We can try to turn l_phenol and l_top back on later somehow. The dump file was a global one, and it was for 1980 anyway, rather than 2012.
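
For reference, the relevant namelist changes are roughly as follows (a sketch; these are the standard JULES namelist groups from model_grid.nml, initial_conditions.nml, jules_vegetation.nml, and jules_hydrology.nml, with just the values given above):

&JULES_MODEL_GRID        ! model_grid.nml
  land_only     = .true.,
  use_subgrid   = .true.,
  latlon_region = .true.,
  lat_bounds    = 45.0, 65.0,
  lon_bounds    = -10.0, 5.0,
/

&JULES_INITIAL           ! initial_conditions.nml: idealized constants instead of a dump file
  dump_file = .false.,
  ! plus the usual per-variable entries (var, use_file = .false., const_val = ...) for the idealized state
/

&JULES_VEGETATION        ! jules_vegetation.nml
  l_phenol = .false.,
/

&JULES_HYDROLOGY         ! jules_hydrology.nml
  l_top = .false.,
/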
Patrick

Hi Tristan:
The JULES5.0 canopy-layered branch does run somewhat faster through the 2 years of spinup when running over just the UK region. It slows down somewhat for the main run compared to the spinup, I guess because of the writing out of the canopy-layered variables. I will try running it with output written to scratch-pw instead of the odanceo nopw GWS; maybe that will help.

For the UK region, the JULES7.0 canopy-layered branch crashes with a run-time error during the main run, apparently because the number of layers is set to -1 instead of >1. I still need to figure this out.
Patrick

Hi Tristan:

With the smaller and faster spatial domain (UK region only), I ran the JULES5.0 layered-canopy code with output written to the odanceo nopw GWS, and it produced the 3 variables of canopy-layered output at a rate of about 7 wallclock hours per simulated year.

I did the same, but with output written to /work/scratch-pw2, and it produced the same 3 variables of canopy-layered output at a rate of about 0.38 wallclock hours per simulated year. This is quite a speed-up: roughly a factor of 18.

In the future, I will be running with output written to /work/scratch-pw2 instead.
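
For the record, the switch essentially amounts to pointing output_dir in output.nml at the scratch area, something like the following (the run_id and path shown here are illustrative):

&JULES_OUTPUT
  run_id     = 'uk_canlyr_v5',                             ! illustrative run id
  output_dir = '/work/scratch-pw2/pmcguire/jules_sif/uk',  ! illustrative path on the parallel-write scratch
/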

Patrick

Hi again Tristan:

With the smaller and faster spatial domain (UK region only), I can now look at the main-run output. The layered-canopy output (i.e., gpp_lyr) for the JULES5.0 branch version of the code is all zeros, so I will be working on figuring out why it doesn’t have the expected non-zero values.

Patrick

Hi again2, Tristan:
This is just to note that I fixed the problem with the JULES7.0 branch version of the canopy-layer code: it no longer crashes with a run-time error in the main run when the number of layers is set to -1 instead of >1. The new Fortran code has been checked in to MOSRS on the vn7.0_sif1 branch.
Patrick

Hi again3, Tristan:
I may have figured out why gpp_lyr is currently zero everywhere.
The namelists have can_rad_mod set to 4, but the Fortran code only computes gpp_lyr if can_rad_mod is 5 or 6. After setting can_rad_mod to 5, the run produces non-zero output for gpp_lyr, although it also might be somewhat slower.
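
Concretely, the change is just this one setting in jules_vegetation.nml:

&JULES_VEGETATION
  can_rad_mod = 5,   ! was 4; the branch only fills gpp_lyr when can_rad_mod is 5 or 6
/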
Patrick

My report of slower speeds with can_rad_mod=5 might have been premature. Also, the JULES7.0 canopy-layer branch now produces non-zero gpp_lyr output as well when can_rad_mod=5 instead of can_rad_mod=4.
Patrick

The JULES5.0 canopy-layer branch can run for the entire UK region, producing non-zero gpp_lyr output for the whole UK region with can_rad_mod=5.

However, when I try to run the JULES7.0 canopy-layer branch for the entire UK region with can_rad_mod=5, it crashes with a run-time error in physiol( ), sometimes after only one timestep. There are various things I can do to prevent the crash, or at least delay it, including:

  1. running over only a subset of the UK region;
  2. using can_rad_mod=6;
  3. starting the spinup (from idealized conditions) on September 1 instead of January 1 (see the sketch after this list).
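
For item 3, that just means shifting the spinup window in timesteps.nml, roughly like this (dates are illustrative; item names follow the standard JULES_SPINUP namelist):

&JULES_SPINUP
  spinup_start = '2012-09-01 00:00:00',   ! illustrative: start the two-year idealized spinup on September 1
  spinup_end   = '2014-09-01 00:00:00',   ! instead of January 1
/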

The can_rad_mod=4,5,6 options suggest using a non-constant diffuse fraction in the driving data. But the WFDEI driving dataset that we’re using doesn’t seem to provide diffuse-fraction data. I’m not sure whether this is what’s causing the problems.
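
As far as I can tell, when no diffuse-radiation driving variable is supplied, JULES just falls back to a constant diffuse fraction set in drive.nml, something like the following (value illustrative):

&JULES_DRIVE
  diff_frac_const = 0.4,   ! constant diffuse fraction used when the driving data has no diffuse-radiation variable (value illustrative)
/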
Patrick

Hi Patrick,

I may be wrong, but my understanding was that JULES should always be run on a parallel-write file system (for precisely the reason you have found). I do have some pw GWS space, specifically for this project, if that helps.

What configuration are you using? can_rad_mod==4 is basically not used in any current configuration, so we shouldn’t be using it.

We should, if I recall correctly, only be using can_rad_mod==6 and higher. In principle the layered output should be able to work with all can_rad_mods, but (assuming you’ve basically just re-implemented my code) I may not have done it for all versions. I cannot remember now.

I think we should decide to either (a) make it work only with a subset of can_rad_mods and throw an error if the user selects a different option, or (b) make it work for all versions. I suspect (b) might be easier.

However - if my memory is correct - can_rad_mod==1 (and maybe 2) only have a single layer anyway, and it’s not worth changing them retrospectively.

Hi Tristan:
Thanks!

I was using the configuration that you pointed me to (see above: Get SIF canopy layer code into JULES trunk - #13 by tquaife), which is located at /gws/nopw/j04/odanceo/tquaife/jules_sif/global and has can_rad_mod=4. Maybe this is (more or less) the same configuration I need if I want to use can_rad_mod=5 or 6, with just that one setting changed?

When running JULES with multiple processors, it is certainly not allowed to use a nopw GWS for output. But I understood that using a nopw GWS for output with JULES does work for single-processor runs. I didn’t realize, though, that it can be this much slower than a pw GWS or pw scratch for single-processor runs.

It looks like, for your JULES5.0 branch, you only implemented the layered output for can_rad_mod = 5 or 6. I will certainly try to figure out how to handle the other values of can_rad_mod <= 4, but it may not be the most urgent thing to do right now. I like both of your options (a) and (b). I am not currently aware of any can_rad_mod settings higher than 6.
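
For option (a), I imagine the guard would be something like the following rough sketch (the routine name and message are illustrative, and I'm assuming the usual JULES logging_mod interface):

USE logging_mod, ONLY: log_fatal

! Sketch of an option-(a) guard: refuse layered output with an unsupported can_rad_mod
IF ( can_rad_mod /= 5 .AND. can_rad_mod /= 6 ) THEN
  CALL log_fatal("init_output",                                          &
                 "Layered canopy output currently requires can_rad_mod = 5 or 6")
END IF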

Patrick

Hi Patrick,

In the conversation we had originally, we said you would use that configuration to get started, but that we would need to move to an official configuration as soon as possible. (In fact, if I recall correctly, I suggested we should have been using whatever the most recent GL is from the outset.)

I suggest that switching over to an official configuration is fairly high priority.

Cheers,
Tristan.

Hi Tristan:
There is a standard JULES suite for Global Land 7 (GL7) that I am somewhat familiar with: u-bb316. It uses can_rad_mod=4, so I guess it isn’t optimized for can_rad_mod=6, if any optimizations are needed.

I am not personally aware of how far GL8 or GL9 have gotten. But looking at the suites listed at https://code.metoffice.gov.uk/trac/jules/wiki/JulesConfigurations,
I see u-bb543 for GL7.2 and u-bu288 for GL9.0.
I looked up both of these suites, and they both also have can_rad_mod=4.

So, we can either stick with can_rad_mod=4, or we can upgrade one of the GL suites to can_rad_mod=5 or 6.
Patrick

I also do not know what the most recent “official” GL release is, but I can ask.

I suggest we find that out, update to it, then do two short runs - one with can_rad_mod=4 and one with can_rad_mod=5 - and test the difference. The test needn’t be over a large area or for a very long time.

I am somewhat surprised that they are still using 4. The difference is sunfleck penetration, and including that is very basic (it is in the Clark et al. 2011 paper, so it has been in the codebase for a long time).