Cfplot 1D land-only data

Hi CMS Helpdesk:
I am trying to figure out how to plot 1D land-only data better. The program ncview doesn’t seem to handle it. I have written Python code in the past to do this, but now I am trying to see if cfplot can handle it. I can’t immediately make cfplot work with 1D land-only data. Can you help?
I also tried to get cfview work to do this on JASMIN. But I get an error: “Not enough data to make a contour plot”
See below for my cfplot attempt.
Thanks,
Patrick
This is what I have tried so far on JASMIN:

module load jaspy
python
import cf
import cfplot
zz=cf.read("/gws/nopw/j04/jules/data/WFD-EI-Forcing_copy2//Rainf_WFDEI_GPCC_land/Rainf_WFDEI_GPCC_land_199001.nc")
zz
#this is the output:
#[<CF Field: long_name=Mean rainfall rate over the 
#previous 3 hours(ncdim%tstep(248), ncdim%land(67209)) kg/m2s>,
#<CF Field: ncvar%lat(ncdim%land(67209)) grid box centre degrees_north>,
#<CF Field: ncvar%lon(ncdim%land(67209)) grid box centre degrees_east>,
#<CF Field: long_name=time since start of month(ncdim%tstep(248)) seconds since 1990-01-01 00:00:00>,
#<CF Field: long_name=time steps since start of month(ncdim%tstep(248)) time steps since 1990-01-01 00:00:00>]
cfplot.con(zz)
#that didn't work, giving this error:
#"cfp.con - cannot contour a field list"
zzz=zz[0]
cfplot.con(zzz)
#that didn't work, giving this warning:
#"Warning: find_data_names error  -
#cannot find a coordinate for domainaxis0 in the data"

Hi Patrick,

Please copy the Rainf_WFDEI_GPCC_land_199001.nc file to your home directory as I cannot see the jules workspace. I’ll then have a look and see if I can get cf-plot to plot the data.

Cheers
Andy

Thanks, Andy:
I just copied it to my home directory:
~pmcguire/Rainf_WFDEI_GPCC_land/Rainf_WFDEI_GPCC_land_199001.nc
Patrick

Hi Patrick,

The data is saved as separate netCDF variables rather than a precipitation field with lon, lat, time dimensions. This is why cf-python has five fields rather than the one that we we normally get. We just want to plot it though so let’s extract the data and do that. First we have to correct, or reset, the units as they are ones that udunits doesn’t recognise and cf-python will throw an error and not read the data values.

f = cf.read(‘Rainf_WFDEI_GPCC_land_199001.nc’)
f
[<CF Field: long_name=Mean rainfall rate over the
previous 3 hours(ncdim%tstep(248), ncdim%land(67209)) kg/m2s>,
<CF Field: ncvar%lat(ncdim%land(67209)) grid box centre degrees_north>,
<CF Field: ncvar%lon(ncdim%land(67209)) grid box centre degrees_east>,
<CF Field: long_name=time since start of month(ncdim%tstep(248)) seconds since 1990-01-01 00:00:00>,
<CF Field: long_name=time steps since start of month(ncdim%tstep(248)) time steps since 1990-01-01 00:00:00>]

precip = f[0]
lats = f[1]
lons = f[2]

precip.override_units(None, inplace=True)
lats.override_units(None, inplace=True)
lons.override_units(None, inplace=True)

precip_data = precip.array
lats_data = lats.array
lons_data = lons.array

import numpy as np
np.shape(precip_data), np.shape(lons_data), np.shape(lats_data)
((248, 67209), (67209,), (67209,))

To make a contour plot we need to follow the method as used for nemo or station data at Unstructured grids — cf-plot 3.2.7 documentation

You’ll need to do this on sci3 or another server with another one with larger memory as a numpy tries to allocate a 16.8GB data array for manipulating the data.

cfp.con(f=precip_data[0,:], x=lons_data, y=lats_data, ugrid=True, ptype=1)

Or with more contour levels and the contour lines turned off:
cfp.con(f=precip_data[0,:], x=lons_data, y=lats_data, ugrid=True, ptype=1, nlevs=250, lines=False)

Hi Andy:
Thanks for figuring this out for cfplot. I am glad it works.
When I have mentioned similar things in the CMS meeting in the past, I was saying things like: it would be great if there was something like ncview (which works great with 2D data) that also worked for 1D-land-only data, just reading in the file and plotting it dynamically.

I was told by CMS people, including DavidH and maybe even yourself, that with cfplot, this is easy, and maybe even it could be made easy with cfview. I was told that I just needed to ask the 1D-land-only data provider to make sure the 1D-land-only data had the right format. But I think that this last ask, is a bit of a high bar.

Furthermore, your solution doesn’t look any easier than my prior coding to write Python code to do this without cfplot. I am surprised it needs so much memory, but thanks for figuring that out too.
Patrick

Hi Patrick,

The whole purpose of writing CF compliant data is that you can share your data with other groups and they can then easily compare data and do analysis quickly. Tools such as cf-python, cf-plot and cf-view make this process simple and reduce they amount of code need to do such operations often down to one line.

Writing data in netCDF format in non standard ways means that everyone who looks at this data has to do a lot more thinking and work which is a real waste of programmer and scientist time.

If the data were written in a CF compliant manner then in four lines of code (including two import lines) we can read the data and make a contour plot. cf-python does regridding in just one line, time and zonal means in just one line. cf-view would / will also be able to make a plot by just loading the data and clicking the plot button.

We need to show users and groups the really great tools that are out there once they have their data in CF compliant format. There are real benefits for everyone concerned in getting over the CF hurdle!

Here’s some code to make a simple CF longitude - latitude dataset using Python.

Writing a netCDF file

import netCDF4 as nc
import numpy as np

ds = nc.Dataset(‘data.nc’, ‘w’, format=‘NETCDF4_CLASSIC’)

lon = ds.createDimension(‘lon’, 40)
lat = ds.createDimension(‘lat’, 60)

lons = ds.createVariable(‘lon’, ‘float’, (‘lon’,))
lons.standard_name=‘longitude’
lons.units = ‘degrees_east’

lats = ds.createVariable(‘lat’, ‘float’, (‘lat’,))
lats.standard_name = ‘latitude’
lats.units = ‘degrees_north’

u = ds.createVariable(‘u’, ‘float’, (‘lat’, ‘lon’,))
u.standard_name = ‘eastward_wind’
u.units = ‘m/s’

lons[:] = np.arange(30.0, 70.0, 1.0)
lats[:] = np.arange(-30.0, 30.0, 1.0)

u[:, :] = np.random.uniform(-20, 20, size=(60, 40))

ds.close()

I will write a Python program to convert the data in Rainf_WFDEI_GPCC_land_199001.nc to a CF compliant dataset. This will be useful for me for testing in cf-view.

Hi Andy:
Thanks for the feedback and for the example NetCDF-writing program.

The example program is for 2D data (data that is stored for all grid cells in the region), whereas the example data file I gave you is for 1D-land-only data (data that is stored only for the land grid-cells). The latter is rather common in JULES land-surface modelling, since it saves a lot of disk space.

The example 1D-land-only data file that I gave you is just one example provided by one provider. I can provide you with other examples generated by other people or other code. The other examples might have cf-compliant units. I am more concerned about handling 1D-land-only data than with handling cf-non-compliant units. The latter is certainly very important, but maybe easier to fix.

It would be great if cfplot and cfview could just display the 1D-land-only data in a geographic map with 1 line of code or with 1 click. Ncview can display 2D data with 1 click, but it doesn’t handle 1D-land-only data.

Take your time with this. I will get back to you if needed by the end of next week.
Patrick

Hi Patrick,

After a good chat with David on Friday afternoon I think I’m starting to catch up with what the data is and what is happening here. If I look at the longitudes and latitudes and see what the unique values are then:
longitudes: 720 values from -179.75 to 179.75 in a uniform step of 0.5 degrees
latitudes: 280 values from -55.75 to 83.75 in a uniform step of 0.5 degrees

720 * 280 = 201600 values if the grid were a regular 2D one
The original data has a size of 67209 values of longitudes and latitudes

So would I be correct in saying that the data was originally on a regular longitude / latitude grid and only the land points packed into the netCDF file? Presumably this was done to save disk space?

If this is the case then to make a longitude-latitude plot with masked oceans we need to have the X and Y sizes of the original grid and the locations of the land points in that grid as a numpy array.

Hi Andy
Yes, you are correct in your analysis. So, can this file be loaded in cfplot with a single command? And can it be loaded easily in cfview? Or does the file need to be changed?
Patrick

Hi Patrick,

There’s not enough information in this file to make it work in any easy way with cf-python/ cf-plot. David and I had a look at this this afternoon and think the best way to approach this is to rewrite the data as CF compliant lon-lat-time field. After a bit of trying I came up with the attached code.

The resulting dataset is 27MB and which is a good space saving on the original 65MB one. It can be read in by cf-python and plotting using cf-plot / cf-view. Other common Python packages should be able to cope with this modified dataset too.


import numpy as np
import netCDF4 as nc


# Read in the data
f = nc.Dataset('Rainf_WFDEI_GPCC_land_199001.nc', 'r')


# Allocate the data arrays 
precip_data = f.variables['Rainf'][:]
lons_data = f.variables['lon'][:]
lats_data = f.variables['lat'][:]
time_data = f.variables['time'][:]
#timestp_data = f.variables['timestp'][:] - this is not used


# Create a data array
data = np.zeros([248, 280, 720])
data[:] = 2e20


# Create regularly spaced lons and lats
lons_regular = np.arange(720) * 0.5 - 179.75
lats_regular = np.arange(280) * 0.5 - 55.75
lons_regular_list = list(lons_regular)
lats_regular_list = list(lats_regular)



# Find the data position in the data array and allocate the data 
for i in np.arange(len(lons_data)):
    xpos = lons_regular_list.index(lons_data[i])
    ypos = lats_regular_list.index(lats_data[i])
    data[:, ypos, xpos] = precip_data[:, i]
    
    
    

# Writing the data
import netCDF4 as nc
import numpy as np

ds = nc.Dataset('data5.nc', 'w', format='NETCDF4')

lon = ds.createDimension('lon', 720)
lons = ds.createVariable('lon', 'float', ('lon',))
lons.standard_name='longitude'
lons.units = 'degrees_east'

lat = ds.createDimension('lat', 280)
lats = ds.createVariable('lat', 'float', ('lat',))
lats.standard_name = 'latitude'
lats.units = 'degrees_north'

time = ds.createDimension('time', None)
times = ds.createVariable('time', 'float', ('time',))
times.standard_name = 'time'
times.units = 'seconds since 1990-01-01 00:00:00'
times.calendar = 'standard'

precip = ds.createVariable('precip', 'float', ('time', 'lat', 'lon',), compression='zlib')
precip.standard_name = 'precipitation_amount'
precip.units = 'kg m-2'
precip.cell_methods = 'time: sum (previous 3 hours)'
precip.missing_value = 2e20


lons[:] = lons_regular
lats[:] = lats_regular
times[:] = time_data
precip[:] = data

ds.close()

Hi Andy:
Thank you.
DavidH said a while back that all I needed to do was modify the 1D-landonly NETCDF file in some particular way which I don’t remember now, and then most NETCDF viewers (besides ncview) would be able to read it in without writing code. I have code similar to what you have above already. Is there a way to put that code into some sort of preexisting cfplot or cfview wrapper function so that I don’t have to remember where the code is every time I want to do a quick view of some 1D-landonly data? I would like to be able to load it in something like ncview and just go about clicking through the different 1D-landonly variables and timesteps to do quick explorations of the data. There are all sorts of different providers of 1D-landonly data. It will be difficult to ask everyone or every code set that makes such 1D-landonly files to do things much differently than what they are doing already. Maybe 1D-landonly data specifications need to be added to the cf-compliance list of specifications somewhere, more generally? I think there is a lot more space savings when the data files contain multiple variables instead of just one variable (in this case, rainfall).
Patrick

Hi Patrick,

You’ll need to check with DavidH about what he meant when he chatted with you and also about getting 1D land points into CF. The data isn’t currently in any way self describing so won’t be easy to get into a CF enabled packages such as cf-python / cf-plot / cf-view.

A cf-plot / cf-view wrapper function for 1D land only data would need discussion. Maybe if you, me Bryan and David are at the CMS meeting this Friday then we could have a couple of minutes after the meeting to discuss this?

Space savings - I don’t think this is correct. Some back of the envelope calculations for multiple variables:

Data as a single vector of 67209 points:
The precipitation field is 248 time steps long and the longs and lats add an extra 2 to this so the data should be of size
250 * 67209 * 4 bytes / 1024 / 1024 MB = 64MB
This is the size we see on disk.

Lets have 100 variables in the file:
(248 * 100 + 2) * 67209 * 4 bytes / 1024 / 1024 = 6258MB

So the saving is very small at higher field numbers.

Data written out as 2D data as I described above will be of size 100 * 27 = 2700MB

So there is no advantage space wise of using 1D vectors of land points.

Hi AndyH
Your calculation of 64MB for a 248-timestep 1D-land-only data set with lat lon auxiliary layers looks correct:
(248+2) * 67209 * 4 bytes / 1024 / 1024 MB = 64MB (or 6360MB for 100 variables)

But unless there is compression, the estimate of 27MB for 2D land+ocean gridded data over 720*280=205600 grid points seems low. I get:
248 * (720 * 280) * 4 bytes / 1024 / 1024 MB = 191MB (or 19100MB for 100 variables).

So I am guessing that your NETCDF-formatted data is the size on disk of 2D data that has been compressed by some other way than ignoring ocean points. NETCDF does offer such compression. The trouble is that all these other data providers or code sets that produce 1D-land-only data have already done (or they currently do) this compression in another way (ignoring ocean points).

So I am just requesting a way to quickly read in and display such 1D-land-only data on a 2D grid, and maybe even step through the time-slices of such data, like in ncview that can do this for 2D gridded data. I would like this so people don’t need to remember where the code is for doing this transformation. I am happy to talk with you and DavidH and Bryan and others about this.
Patrick

Hi Andy:
I found another example for you for 1D-land-only data. I put it in this directory on JASMIN:

~pmcguire/porcelain_rdg__pmcguire__jules_global_monthly_mean__u-bb422GLTS/

Patrick

Hi Patrick,

Thanks - that’s very useful and good to have a comparison.

The two datasets look to be the same grid but are written out differently. Both grids go from -179.75 to 179.75 in longitude and -55.75 to 83.75 in latitude and have 67209 points.

Euro44_bvv_nancil_CTL-BCJ_jules-vn5.2pmed_u-bb422GLTS_monmean_1979.nc:

<CF Field: long_name=Drainage from bottom (nshyd) soil layer(time(12), ncdim%y(1), ncdim%x(67209)) kg m-2 s-1>

Time is a nicely defined CF coordinate. X, Y, T are recognised as such.
Data loops over latitudes.

Rainf_WFDEI_GPCC_land_199001.nc:

Field, longitudes, latitudes and time are separate netCDF variables. X, Y, T are not recognised as such.

[<CF Field: long_name=Mean rainfall rate over the
previous 3 hours(ncdim%tstep(248), ncdim%land(67209)) kg/m2s>,
<CF Field: ncvar%lat(ncdim%land(67209)) grid box centre degrees_north>,
<CF Field: ncvar%lon(ncdim%land(67209)) grid box centre degrees_east>,
<CF Field: long_name=time since start of month(ncdim%tstep(248)) seconds since 1990-01-01 00:00:00>,
<CF Field: long_name=time steps since start of month(ncdim%tstep(248)) time steps since 1990-01-01 00:00:00>]

Data loops over longitudes.

Hi Andy:
Here are some more 1D-landonly NETCDF files that are output by JULES. Does your cf-python code that changes the 1D-landonly files to cf-compliant 1D-landonly datasets work on these files?
/work/scratch-pw2/pmcguire/u-cx323/JULES-GL7.0.vn5.2_sif1.CRUNCEPv7SLURM.S2.Monthly.1869.nc

/work/scratch-pw2/pmcguire/u-cx323/JULES-GL7.0.vn5.2_sif1.CRUNCEPv7SLURM.S2.sif_vars.1869.nc

/work/scratch-pw2/pmcguire/u-cx323/JULES-GL7.0.vn5.2_sif1.CRUNCEPv7SLURM.S2.Annual.1869.nc

(There are some older files than made today in that directory; you don’t need to use those.)
Patrick

Hi Patrick,

Thanks! It’s really useful to have different datasets to generalise the code I’m working on. The plots are starting to look good in cf-plot and cf-view.

Cheers
Andy

Hi Patrick,
The following code looks okay to me when I do:
python land.py file.nc
for the following files:
Euro44_bvv_nancil_CTL-BCJ_jules-vn5.2pmed_u-bb422GLTS_monmean_1979.nc
JULES-GL7.0.vn5.2_sif1.CRUNCEPv7SLURM.S2.Annual.1869.nc
JULES-GL7.0.vn5.2_sif1.CRUNCEPv7SLURM.S2.Monthly.1869.nc
JULES-GL7.0.vn5.2_sif1.CRUNCEPv7SLURM.S2.sif_vars.1869.nc

The output uses around 30% less disk space than the original file. The output is CF compliant I believe as per 8.2. Compression by Gathering.

cf-python will read in the data and make it into a normal 2D or 3D dataset. There is an issue with slowness in the reading that David will fix in cf-python 3.15.2. Iris doesn’t understand this CF data but then we all know iris isn’t as rigorous in it’s interpretation of CF as cf-python is.

Happy to give you a demo on my PC and chat some more in person if needed.

import netCDF4 as nc
import numpy as np
import sys


# Initiate the pvars class
# This is used for storing netCDF variables
class pvars(object):
    '''The pvars object is used for storing plotting variables in plotvars'''
    def __init__(self, **kwargs):
        # Initialize a new Pvars instance
        for attr, value in kwargs.items():
            setattr(self, attr, value)

    def __str__(self):
        '''x.__str__() <==> str(x)'''
        a = None
        v = None
        out = ['%s = %s' % (a, repr(v))]
        for a, v in self.__dict__.items():
            return '\n'.join(out)

ncvars = pvars()


inputds = nc.Dataset(sys.argv[1], 'r')


# Allocate the longitude and latitude arrays
lons_data = inputds.variables['longitude'][:].squeeze()
lats_data = inputds.variables['latitude'][:].squeeze()


# Size of the data arrays
lons_unique = np.unique(lons_data)
lats_unique = np.unique(lats_data)
lonsize = np.size(lons_unique)
latsize = np.size(lats_unique)
nlandpoints = np.size(lons_data)


# Create sets of regularly spaced lons and lats
lons_regular = np.arange(lonsize) * (lons_unique[1] - lons_unique[0]) + np.min(lons_unique)
lats_regular = np.arange(latsize) * (lats_unique[1] - lats_unique[0]) + np.min(lats_unique)
lons_regular_list = list(lons_regular)
lats_regular_list = list(lats_regular)


# Create a land-sea mask array
lsmask = np.zeros([latsize, lonsize], dtype=int)

# Find the data position in the mask array and allocate the land points
for i in np.arange(len(lons_data)):
    xpos = lons_regular_list.index(lons_data[i])
    ypos = lats_regular_list.index(lats_data[i])
    lsmask[ypos, xpos] = 1

# Flatten the mask to a vector of points and 
# find the points where the mask is non zero
lsmask = lsmask.flatten()
landpts = lsmask.nonzero()



# Edit vars list to leave only the list of fields
vars = list(inputds.variables)
rmvars = ['time', 'time_bnds', 'time_bounds', 'longitude', 'latitude', 'time_bounds']
for rmvar in rmvars:
    try:
        vars.remove(rmvar)
    except:
        pass
    

# Calculate which dimensionsless dims need to be added
dimensionless_dims = []
for var in vars:
    vardims = inputds.variables[var].dimensions
    for vardim in vardims:
        dimensionless_dims.append(vardim)
        
dimensionless_dims = list(np.unique(dimensionless_dims))


rmdims = ['time', 'bnds', 'x', 'y']
for rmdim in rmdims:
    try:
        dimensionless_dims.remove(rmdim)
    except:
        pass


dimensionless_dims_dict = {}
for dimensionless_dim in dimensionless_dims:
    if dimensionless_dim not in dimensionless_dims_dict:
        for var in vars:
            mvars = inputds.variables[var].dimensions
            if dimensionless_dim in mvars:
                myind = mvars.index(dimensionless_dim)
                mysize = list(np.shape(inputds.variables[var]))[myind]
                dimensionless_dims_dict[dimensionless_dim] = mysize
            
    
    

time_data = inputds.variables['time'][:]
# Time must have just one dimension 
if np.ndim(time_data) > 1:
  time_data = time_data.squeeze()





# Writing the data
outputds = nc.Dataset('converted.nc', 'w', format='NETCDF4')

# Create the dimensions
lon = outputds.createDimension('lon', lonsize)
lat = outputds.createDimension('lat', latsize)
landpoint = outputds.createDimension('landpoint', nlandpoints)
time = outputds.createDimension('time', None)

# Create the dimensionless dims and variables and set the data
for ddim in dimensionless_dims_dict:
    setattr(ncvars, ddim, outputds.createDimension(ddim, dimensionless_dims_dict[ddim]))
    setattr(ncvars, ddim + 's', outputds.createVariable(ddim, 'i4', tuple([ddim])))
    getattr(ncvars, ddim + 's')[:] = np.arange(dimensionless_dims_dict[ddim])

# Create the variables
lons = outputds.createVariable('lon', 'f4', ('lon',))
lons.standard_name='longitude'
lons.units = 'degrees_east'

lats = outputds.createVariable('lat', 'f4', ('lat',))
lats.standard_name = 'latitude'
lats.units = 'degrees_north'

landpoint = outputds.createVariable('landpoint', 'i4', ('landpoint',))
landpoint.compress = "lat lon"


times = outputds.createVariable('time', 'f4', ('time',))
times.standard_name = inputds.variables['time'].standard_name
times.units = inputds.variables['time'].units
times.calendar = inputds.variables['time'].calendar

   
   

for i in np.arange(len(vars)):

    var = vars[i]
    
    print(var + ' ' + str(i + 1) + ' of ' + str(len(vars)))
    
    ndims = np.ndim(inputds.variables[var][:].squeeze())
    attrs = inputds.variables[var].ncattrs()

    # Form the _FillValue / fill_value dictionary as this must be set at variable creation time
    myfill ={}
    if '_FillValue' in attrs:
        myfill['fill_value'] = getattr(inputds.variables[var], '_FillValue')
        attrs.remove('_FillValue')

    # Create a tuple of the variable dimensions
    # The last two are 'y', 'x' which translates to 'landpoint'
    # Code assumes there are two or more dimensions
    vardims = list(inputds.variables[var].dimensions)[:-2]
    vardims.append('landpoint')
    vardims = tuple(vardims)

    # Create the variable
    setattr(ncvars, var, outputds.createVariable(var, 'f4', vardims, compression='zlib', **myfill))    

	
    for attr in attrs:
        setattr(getattr(ncvars, var), attr, getattr(inputds.variables[var], attr))
	    
	    
    # Add the data to the new variable
    # We remove the y dimension of the input field as that doesn't exist in the new data
    data = np.squeeze(inputds.variables[var][:], -2)
    getattr(ncvars, var)[:] = data


landpoint[:] = landpts
lons[:] = lons_regular
lats[:] = lats_regular
times[:] = time_data



# Close the converted netCDF file
outputds.close()


Hi Andy:
Excellent!
Do you have time for a demo/chat today?
Patrick