Grouped Collapses in cf-python

Dear CMS Helpdesk,

I’m attempting to use cf-python to evaluate monthly means for hourly climate diagnostics. The diagnostics are themselves monthly means, output for particular hours of the day (00:00, 03:00, 06:00, etc.) using the time domain TMONMNXY, where XY is the hour code. So for every year there are 96 time slices, 8 for each month.
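As a quick check of the counting above, here is a minimal sketch in plain Python. It assumes eight three-hourly slots per day starting at 00:00, which is only inferred from the "00:00, 03:00, 06:00, etc." listing. The names hours, months and slices_per_year are illustrative and not part of cf-python.

```python
from itertools import product

# Assumption for illustration: eight three-hourly slots per day, starting at 00:00.
hours = list(range(0, 24, 3))   # 0, 3, ..., 21
months = list(range(1, 13))     # 1 (January) to 12 (December)

# One (month, hour) pair for every time slice in a single year.
slices_per_year = list(product(months, hours))

print(len(hours))            # slots per month
print(len(slices_per_year))  # slices per year
```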

I am attempting to collapse 80 years' worth of these diagnostics into a single set of monthly means, where each time represents the average of that time slice over all 80 years.

I’ve tried using the following to perform this operation:

field.collapse("T: mean within years T: mean over years", within_years=cf.month(mon) & cf.hour(hrs[idx]) & cf.minute(mins[idx]))

This sits in a loop that iterates over integers for each month and then over integers for each combination of the hours and minutes that form the hourly time slices.

The code appears to work for the first iteration (January means at 00:00), where:

field.collapse("T: mean within years T: mean over years", within_years=cf.month(1) & cf.hour(0) & cf.minute(0))

but for the next iteration (selecting the January means at 03:00), it fails and produces the following error:

field.collapse("T: mean within years T: mean over years", within_years=cf.month(1) & cf.hour(1) & cf.minute(30))
ValueError: Can't collapse: No contiguous groups within years were identified

Can you explain what’s going wrong?
Also, if this is a question that should be posted elsewhere, please do let me know.

Regards,
Alfred

Hi Alfred,

Thanks for providing all the detail on this.

Firstly, the groups may not be contiguous in some cases (e.g. where required time data are missing or time bounds overlap), and you will then need to set group_contiguous=0 in your call to prevent the collapse failing for that reason. That may work immediately. If not, and if I understand your desired result correctly, the issue could be the format in which you are specifying your within_years groups. You have given these as a query which, in the failing case you describe, evaluates to:

>>> cf.month(1) & cf.hour(1) & cf.minute(30)
<CF Query: [[month(eq 1) & hour(eq 1)] & minute(eq 30)]>

but what we actually need for a monthly mean is a time duration of one month with a specific start time, e.g. your hour of 1 and minute of 30, like so: cf.M(1, hour=1, minute=30).

So, if you try this (in combination with the above parameter setting, to check), it may then work:

f.collapse("T: mean within years", within_years=cf.M(mon, hour=hrs[idx], minute=mins[idx]), group_contiguous=0)

Please let me know if either of those is helpful; if not, we can investigate further.

Thanks,

Sadie

Hi Sadie,

Thanks for the advice. I tried what you suggested but unfortunately got the following error for the time slices at January 03:00:

ValueError: Can't collapse: No groups within years spanning P1M (Y-M-01 01:30:00) were identified

Luckily I found that I can use indexing to subset the field for the correct time slices in each iteration of the loop:

for sample in range(nsamples):
    slice = np.arange(sample, field.shape[0], nsamples)
    mon_mns.append(field[slice, :].collapse("T: mean", coordinate="minimum"))

nsamples is the number of samples in one year (96) and slice is an array of all indices along the time axis that correspond to the same month and hour slice.
The subset field is then averaged with the field's collapse method and the result appended to a list (mon_mns).
Finally, the nsamples averaged fields in mon_mns are combined with cf.aggregate into a single construct to be written to netCDF.
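
For anyone wanting to check the index logic without a cf-python field, here is a minimal sketch of the same strided selection in plain Python on synthetic data. It assumes the time axis is ordered so that the same month and hour slot recurs exactly every nsamples steps, with no missing times. The synthetic values and the helper names strided_indices and climatological_means are made up for illustration and are not part of cf-python.

```python
nyears = 80      # years of data, as in the question
nsamples = 96    # time slices per year (12 months x 8 hourly slots)
ntimes = nyears * nsamples

# Synthetic 1-D series: position within the year plus a small offset per year.
values = [(t % nsamples) + 0.01 * (t // nsamples) for t in range(ntimes)]

def strided_indices(sample, ntimes, nsamples):
    """Indices of every time step sharing the same within-year position."""
    return list(range(sample, ntimes, nsamples))

def climatological_means(values, nsamples):
    """Average each within-year position over all years."""
    means = []
    for sample in range(nsamples):
        idx = strided_indices(sample, len(values), nsamples)
        means.append(sum(values[i] for i in idx) / len(idx))
    return means

means = climatological_means(values, nsamples)
print(strided_indices(1, ntimes, nsamples)[:3])  # first indices for the second slot
print(len(means))                                # one mean per within-year position
```

If any year has missing or extra time steps, the stride no longer lines up and the groups would silently mix different months or hours, so it is worth checking the time coordinates of one subset before trusting the result.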

This approach actually works out much faster than trying to use "T: mean within years T: mean over years".

Thanks again for your help!

Regards,
Alfred
