`API`¶

preprocessing¶

cmip6_preprocessing.preprocessing.broadcast_lonlat(ds, verbose=True)[source]¶: Some models (all those with a gr grid_label) have 1D lon and lat arrays. This function broadcasts them so that lon/lat are always 2D arrays.
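As a rough illustration of what broadcasting 1D lon/lat arrays to 2D means, here is a minimal pure-Python sketch. It is not the library's implementation (which operates on xarray datasets); the helper name `broadcast_lonlat_lists` and the (y, x) dimension order are assumptions made for this example.

```python
def broadcast_lonlat_lists(lon, lat):
    """Broadcast 1D lon (length nx) and 1D lat (length ny) to 2D (ny, nx) lists.

    Hypothetical helper for illustration; plain lists stand in for arrays.
    """
    lon2d = [list(lon) for _ in lat]        # every row repeats the full lon axis
    lat2d = [[y] * len(lon) for y in lat]   # every row holds one constant lat
    return lon2d, lat2d


lon2d, lat2d = broadcast_lonlat_lists([0.0, 120.0, 240.0], [-45.0, 45.0])
```

After broadcasting, `lon2d[j][i]` and `lat2d[j][i]` give the position of grid cell (j, i) without any knowledge of the original 1D axes.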

cmip6_preprocessing.preprocessing.cmip6_renaming_dict()[source]¶: A universal renaming dict. Keys correspond to the source id (model name), and values are dicts mapping a target name (key) to a list of variable names that should be renamed to that target.
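The nested structure described here (source id, then target name, then list of candidate names) can be sketched as follows. The entries below are hypothetical and do not reproduce the library's actual dictionary, and `rename_names` is an illustrative helper, not part of the package.

```python
# Hypothetical entries, for illustration only (not the library's real contents).
renaming_dict = {
    "MODEL-A": {"x": ["i", "ni"], "y": ["j", "nj"], "lon": ["longitude"]},
}


def rename_names(names, source_id, renaming_dict):
    """Map each name to its target if it appears in the model's candidate lists."""
    lookup = {
        candidate: target
        for target, candidates in renaming_dict.get(source_id, {}).items()
        for candidate in candidates
    }
    # Names without an entry are passed through unchanged.
    return [lookup.get(name, name) for name in names]


renamed = rename_names(["i", "nj", "longitude", "thetao"], "MODEL-A", renaming_dict)
```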

cmip6_preprocessing.preprocessing.combined_preprocessing(ds)[source]¶

cmip6_preprocessing.preprocessing.correct_coordinates(ds, verbose=False)[source]¶: converts wrongly assigned data_vars to coordinates

cmip6_preprocessing.preprocessing.correct_lon(ds)[source]¶: Wraps negative x and lon values around so that longitudes span 0-360. Longitude names are expected to have been corrected with rename_cmip6 beforehand.
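The wrapping itself can be illustrated with a modulo operation. This is a sketch of the idea only: the helper name `wrap_lon_0_360` is an assumption, and the library may treat edge values (such as exactly 360) differently.

```python
def wrap_lon_0_360(lon):
    """Wrap a longitude in degrees into the half-open interval [0, 360).

    Illustrative helper; Python's % returns a non-negative result for a
    positive divisor, so negative longitudes wrap around as intended.
    """
    return lon % 360.0


wrapped = [wrap_lon_0_360(value) for value in [-170.0, -0.5, 0.0, 359.0]]
```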

cmip6_preprocessing.preprocessing.correct_units(ds)[source]¶: Converts coordinates into SI units using pint-xarray

cmip6_preprocessing.preprocessing.fix_metadata(ds)[source]¶: Fix known issues (from errata) with the metadata.

cmip6_preprocessing.preprocessing.maybe_convert_bounds_to_vertex(ds)[source]¶: Converts renamed lon and lat bounds into vertices by copying the values into the corners. Assumes a rectangular cell.

cmip6_preprocessing.preprocessing.maybe_convert_vertex_to_bounds(ds)[source]¶: Converts lon and lat vertices to bounds by averaging the corner points onto the appropriate cell face center.
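The two conversions can be sketched for a single cell. This assumes the vertex order documented for sort_vertex_order (0: lower left, 1: upper left, 2: upper right, 3: lower right). The function names and the use of plain tuples are assumptions of this example, not the library's implementation.

```python
def bounds_to_vertices(lon_bounds, lat_bounds):
    """Copy (west, east) and (south, north) bounds into four corner values.

    Assumes a rectangular cell and the vertex order
    0: lower left, 1: upper left, 2: upper right, 3: lower right.
    """
    west, east = lon_bounds
    south, north = lat_bounds
    lon_vertices = [west, west, east, east]
    lat_vertices = [south, north, north, south]
    return lon_vertices, lat_vertices


def vertices_to_bounds(lon_vertices, lat_vertices):
    """Average the two corners of each cell face to recover the bounds."""
    lon_bounds = (
        (lon_vertices[0] + lon_vertices[1]) / 2,  # western face
        (lon_vertices[2] + lon_vertices[3]) / 2,  # eastern face
    )
    lat_bounds = (
        (lat_vertices[0] + lat_vertices[3]) / 2,  # southern face
        (lat_vertices[1] + lat_vertices[2]) / 2,  # northern face
    )
    return lon_bounds, lat_bounds


lon_v, lat_v = bounds_to_vertices((10.0, 12.0), (-1.0, 1.0))
lon_b, lat_b = vertices_to_bounds(lon_v, lat_v)
```

For a rectangular cell the round trip returns the original bounds; for a distorted cell the averaged bounds are only an approximation.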

cmip6_preprocessing.preprocessing.parse_lon_lat_bounds(ds)[source]¶: Both regular 2D bounds and vertex bounds are parsed as *_bounds. This function renames them to *_verticies if the vertex dimension is found. It also removes the time dimension from static bounds, as found e.g. in the SAM0-UNICON model.

cmip6_preprocessing.preprocessing.promote_empty_dims(ds)[source]¶: Convert empty dimensions to actual coordinates

cmip6_preprocessing.preprocessing.rename_cmip6(ds, rename_dict=None)[source]¶: Homogenizes CMIP6 datasets to common naming

cmip6_preprocessing.preprocessing.replace_x_y_nominal_lat_lon(ds)[source]¶: Approximate the dimensional values of x and y with mean lat and lon at the equator

cmip6_preprocessing.preprocessing.sort_vertex_order(ds)[source]¶: Sorts the vertex dimension in a coherent order: 0: lower left, 1: upper left, 2: upper right, 3: lower right

postprocessing¶

cmip6_preprocessing.postprocessing.combine_datasets(ds_dict, combine_func, combine_func_args=(), combine_func_kwargs={}, match_attrs=['source_id', 'grid_label', 'experiment_id', 'table_id', 'variant_label', 'variable_id'])[source]¶

General combination function to combine datasets within a dictionary according to their matching attributes. This function provides maximum flexibility, but can be somewhat complex to set up. The postprocessing module provides several convenience wrappers like merge_variables, concat_members, etc.

Parameters

ds_dict ([type]) – [description]
combine_func ([type]) – [description]
combine_func_args (tuple, optional) – [description], by default ()
combine_func_kwargs (dict, optional) – [description], by default {}
match_attrs ([type], optional) – [description], by default exact_attrs

Returns

[description]

Return type

[type]
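The idea of combining datasets by matching attributes can be sketched with plain dictionaries standing in for xarray datasets. Everything below (the helper `combine_by_attrs`, the `"attrs"`/`"data"` layout, and the way new keys are built) is an assumption made for illustration and does not reproduce the library's behavior in detail.

```python
from collections import defaultdict


def combine_by_attrs(ds_dict, combine_func, match_attrs):
    """Group datasets whose match_attrs agree, then combine each group.

    Datasets are plain dicts with an "attrs" entry (illustrative stand-ins).
    """
    groups = defaultdict(list)
    for ds in ds_dict.values():
        key = ".".join(str(ds["attrs"][attr]) for attr in match_attrs)
        groups[key].append(ds)
    return {key: combine_func(members) for key, members in groups.items()}


def concat_data(members):
    """Hypothetical combine function: chain the data of all group members."""
    data = [value for member in members for value in member["data"]]
    return {"attrs": members[0]["attrs"], "data": data}


ds_dict = {
    "a": {"attrs": {"source_id": "MODEL-A", "grid_label": "gn"}, "data": [1]},
    "b": {"attrs": {"source_id": "MODEL-A", "grid_label": "gn"}, "data": [2]},
    "c": {"attrs": {"source_id": "MODEL-B", "grid_label": "gr"}, "data": [3]},
}
combined = combine_by_attrs(ds_dict, concat_data, ["source_id", "grid_label"])
```

Passing the combining step as a function is what makes this pattern flexible: merging variables or concatenating members differ only in the function applied to each group.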

cmip6_preprocessing.postprocessing.concat_experiments(ds_dict, concat_kwargs={})[source]¶

Given a dictionary of datasets, this function merges all available ensemble members (given in separate datasets) into a single dataset for each combination of attributes, like source_id, grid_label, etc., but with concatenated members. CAUTION: If members do not have the same dimensions (e.g. longer run time for some members), this can result in poor dask performance (see: https://github.com/jbusecke/cmip6_preprocessing/issues/58).

Parameters

ds_dict (dict) – Dictionary of xarray datasets.
concat_kwargs (dict) – Optional arguments passed to xr.concat.

Returns

A new dict of xr.Datasets with all datasets from ds_dict, but with concatenated members and adjusted keys.

Return type

dict

cmip6_preprocessing.postprocessing.concat_members(ds_dict, concat_kwargs={})[source]¶

Parameters

ds_dict (dict) – Dictionary of xarray datasets.
concat_kwargs (dict) – Optional arguments passed to xr.concat.

Returns

A new dict of xr.Datasets with all datasets from ds_dict, but with concatenated members and adjusted keys.

Return type

dict

cmip6_preprocessing.postprocessing.interpolate_grid_label(ds_dict, target_grid_label='gn', method='bilinear', xesmf_kwargs={}, merge_kwargs={}, verbose=False)[source]¶

Combines different grid labels via interpolation with xesmf

Parameters

ds_dict (dict) – dictionary of input datasets
target_grid_label (str, optional) – preferred grid_label value. If at least one dataset has this grid_label, the others are interpolated to it. Datasets with this grid label are not modified, by default “gn”
method (str, optional) – interpolation method for xesmf, by default “bilinear”
xesmf_kwargs (dict, optional) – optional arguments for building xesmf regridder, by default {}
merge_kwargs (dict, optional) – optional arguments for the merging of interpolated datasets, by default {}
verbose (bool, optional) – print output while creating regridder, by default False

Returns

dictionary of combined datasets (usually will combine across different variable ids)

Return type

dict

cmip6_preprocessing.postprocessing.match_metrics(ds_dict, metric_dict, match_variables, match_attrs=['source_id', 'grid_label'], print_statistics=False, dim_length_conflict='error')[source]¶

Given two dictionaries of datasets, this function matches metrics from metric_dict to every dataset in ds_dict based on a comparison of the datasets' attributes.

Parameters

ds_dict (dict) – Dictionary of xarray datasets, that need matching metrics.
metric_dict (dict) – Dictionary of xarray datasets, which contain metrics as data_variables.
match_variables (list) – Data variables of datasets in metric_dict to parse.
match_attrs (list, optional) – Minimum dataset attributes that need to match, by default [“source_id”, “grid_label”]
print_statistics (bool, optional) – Option to print statistics about matching, by default False
dim_length_conflict (str) – Defines the behavior when parsing metrics with non-exact matches in dimension size. See parse_metric.

Returns

All datasets from ds_dict; if no match was possible, the input dataset is returned unchanged.

Return type

dict
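The attribute-based matching can be sketched with plain dictionaries in place of xarray datasets. The helper `attach_metrics`, the dict layout, and the choice to store matched metrics under a `"coords"` key are all assumptions of this example, not a description of the library's internals.

```python
def attach_metrics(ds_dict, metric_dict, match_variables,
                   match_attrs=("source_id", "grid_label")):
    """Attach matching metric variables to each dataset (plain-dict stand-ins)."""
    matched = {}
    for key, ds in ds_dict.items():
        # Build a new dict so that the input dataset is left untouched.
        new_ds = {"attrs": dict(ds["attrs"]), "coords": dict(ds.get("coords", {}))}
        for metric_ds in metric_dict.values():
            same_attrs = all(
                metric_ds["attrs"].get(attr) == ds["attrs"].get(attr)
                for attr in match_attrs
            )
            if not same_attrs:
                continue
            for var in match_variables:
                if var in metric_ds["data_vars"]:
                    new_ds["coords"][var] = metric_ds["data_vars"][var]
        matched[key] = new_ds
    return matched


ds_dict = {
    "a": {"attrs": {"source_id": "MODEL-A", "grid_label": "gn"}},
    "b": {"attrs": {"source_id": "MODEL-B", "grid_label": "gn"}},
}
metric_dict = {
    "m": {
        "attrs": {"source_id": "MODEL-A", "grid_label": "gn"},
        "data_vars": {"areacello": [[1.0, 2.0]]},
    },
}
result = attach_metrics(ds_dict, metric_dict, ["areacello"])
```

Dataset "a" receives the metric because both attributes agree, while "b" comes back without any metric attached.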

cmip6_preprocessing.postprocessing.merge_variables(ds_dict, merge_kwargs={})[source]¶

Given a dictionary of datasets, this function merges all available data variables (given in separate datasets) into a single dataset. CAUTION: This assumes that all variables are on the same staggered grid position. If you are working with data on the cell edges, this function will disregard that information. Use the grids module instead to get an accurate staggered grid representation.

Parameters

ds_dict (dict) – Dictionary of xarray datasets, that need matching metrics.
merge_kwargs (dict) – Optional arguments passed to xr.merge.

Returns

A new dict of xr.Datasets with all datasets from ds_dict, but with merged variables and adjusted keys.

Return type

dict

cmip6_preprocessing.postprocessing.pick_first_member(ddict)[source]¶

cmip6_preprocessing.postprocessing.requires_xesmf(func)[source]¶

grids¶

cmip6_preprocessing.grids.combine_staggered_grid(ds_base, other_ds=None, recalculate_metrics=False, grid_dict=None, **kwargs)[source]¶

Combine a reference dataset with a list of other datasets into a full xgcm-compatible staggered grid dataset.

Parameters

ds_base (xr.Dataset) – The reference (‘base’) dataset, assumed to be at the tracer position/cell center
other_ds (list, xr.Dataset, optional) – List of datasets representing different variables. Their grid position will be automatically detected relative to ds_base. Coordinates and attrs of these added datasets will be lost, by default None
recalculate_metrics (bool, optional) – Enables the reconstruction of grid metrics using simple spherical geometry, by default False. Check your results carefully when using reconstructed values; these might differ substantially if the grid geometry is complicated.
grid_dict (dict, optional) – Dictionary for staggered grid setup. See create_full_grid for details. If None (default), staggered grid info is loaded from the internal database.

Returns

Single xgcm-compatible dataset, containing all variables on their respective staggered grid position.

Return type

xr.Dataset

cmip6_preprocessing.grids.create_full_grid(base_ds, grid_dict=None)[source]¶

Generate a full xgcm-compatible dataset from a reference dataset base_ds. This dataset should represent a tracer field, i.e. be located at the cell center.

Parameters

base_ds (xr.Dataset) – The reference (‘base’) dataset, assumed to be at the tracer position/cell center
grid_dict (dict, optional) – Dictionary with info about the grid staggering. Must be encoded using the base_ds attrs (e.g. {'model_name': {'axis_shift': {'X': 'left', …}}}). If None (default), the info is loaded from the internal database for CMIP6 models.

Returns

xgcm compatible dataset

Return type

xr.Dataset

cmip6_preprocessing.grids.detect_shift(ds_base, ds, axis)[source]¶

Detects the shift of ds relative to ds_base on logical grid axes, using lon and lat positions.

Parameters

ds_base (xr.Dataset) – Reference (‘base’) dataset to compare to. Assumed that this is located at the ‘center’ coordinate.
ds (xr.Dataset) – Comparison dataset. The resulting shift will be computed as this dataset relative to ds_base
axis (str) – xgcm logical axis on which to detect the shift

Returns

Shift string output, in xgcm conventions.

Return type

str

cmip6_preprocessing.grids.distance(lon0, lat0, lon1, lat1)[source]¶

Calculate the distance in m between two points on a spherical globe

Parameters

lon0 (np.array) – Longitude of first point
lat0 (np.array) – Latitude of first point
lon1 (np.array) – Longitude of second point
lat1 (np.array) – Latitude of second point
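One common way to compute such a distance is the haversine formula, sketched below for scalar inputs. This is a swapped-in technique for illustration: the library may use a different approximation and a different Earth radius, and both `haversine_distance` and `EARTH_RADIUS_M` are names chosen for this example.

```python
import math

# Assumed mean Earth radius in meters; the library's constant may differ.
EARTH_RADIUS_M = 6_371_000.0


def haversine_distance(lon0, lat0, lon1, lat1):
    """Great-circle distance in meters between two points given in degrees."""
    phi0, phi1 = math.radians(lat0), math.radians(lat1)
    dphi = phi1 - phi0
    dlambda = math.radians(lon1 - lon0)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi0) * math.cos(phi1) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude along the equator, roughly 111 km.
one_degree_at_equator = haversine_distance(0.0, 0.0, 1.0, 0.0)
```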

cmip6_preprocessing.grids.distance_deg(lon0, lat0, lon1, lat1)[source]¶

Calculate the distance in degrees longitude and latitude between two points

Parameters

lon0 (np.array) – Longitude of first point
lat0 (np.array) – Latitude of first point
lon1 (np.array) – Longitude of second point
lat1 (np.array) – Latitude of second point
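A minimal sketch of a per-axis difference in degrees is given below. Wrapping the longitude difference across the 0/360 discontinuity is an assumption of this example, as is the helper name `delta_deg`; the library's handling may differ.

```python
def delta_deg(lon0, lat0, lon1, lat1):
    """Return (delta_lon, delta_lat) in degrees from point 0 to point 1.

    The longitude difference is wrapped into [-180, 180) so that points on
    either side of the 0/360 discontinuity are treated as neighbors.
    """
    delta_lon = (lon1 - lon0 + 180.0) % 360.0 - 180.0
    delta_lat = lat1 - lat0
    return delta_lon, delta_lat


eastward = delta_deg(350.0, 0.0, 10.0, 5.0)
westward = delta_deg(10.0, 0.0, 350.0, -5.0)
```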

cmip6_preprocessing.grids.recreate_metrics(ds, grid)[source]¶

Recreate a full set of horizontal distance metrics.

Calculates distances between points in lon/lat coordinates

The naming of the metrics is as follows:

[metric_axis]_t : metric centered at the tracer point
[metric_axis]_gx : metric at the cell face on the x-axis. For instance, dx_gx is the x distance centered on the eastern cell face if the shift is right
[metric_axis]_gy : as above, but along the y-axis
[metric_axis]_gxgy : metric located at the corner point. For example, dy_gxgy is the y distance on the south-west corner if both axes are shifted left

Parameters

ds (xr.Dataset) – Input dataset.
grid (xgcm.Grid) – xgcm Grid object matching ds

Returns

Dataset with added metrics as coordinates and dictionary that can be passed to xgcm.Grid to recognize new metrics

Return type

xr.Dataset, dict
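The naming scheme for the metrics can be expressed as a small helper. This is purely illustrative: `metric_name` and its flags are hypothetical and not part of the package.

```python
def metric_name(metric_axis, on_x_face=False, on_y_face=False):
    """Build a metric name following the _t / _gx / _gy / _gxgy scheme.

    metric_axis is e.g. "dx" or "dy"; the flags say whether the metric sits
    on a face shifted along x and/or y (both set means the corner point).
    """
    suffix = ("gx" if on_x_face else "") + ("gy" if on_y_face else "")
    return f"{metric_axis}_{suffix or 't'}"


names = [
    metric_name("dx"),
    metric_name("dx", on_x_face=True),
    metric_name("dy", on_y_face=True),
    metric_name("dy", on_x_face=True, on_y_face=True),
]
```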

regionmask¶

cmip6_preprocessing.regionmask.merged_mask(basins, ds, lon_name='lon', lat_name='lat', merge_dict=None, verbose=False)[source]¶

Combine geographical basins (from regionmask) to larger ocean basins.

Parameters

basins (regionmask.core.regions.Regions object) – Loaded basin data from regionmask, e.g. import regionmask;basins = regionmask.defined_regions.natural_earth.ocean_basins_50
ds (xr.Dataset) – Input dataset on which to construct the mask
lon_name (str, optional) – Name of the longitude coordinate in ds, defaults to lon
lat_name (str, optional) – Name of the latitude coordinate in ds, defaults to lat
merge_dict (dict, optional) – dictionary defining new aggregated regions (as keys) and the regions to be merged into each of them (as values, given as lists of names). Defaults to the large-scale ocean basins defined by cmip6_preprocessing.regionmask.default_merge_dict
verbose (bool, optional) – Prints more output, e.g. the regions in basins that were not used in the merging step. Defaults to False.

Returns

mask – The mask contains an ascending numeric value for each key (merged region) in merge_dict. When the default is used, the numeric values correspond to the following regions:

0: North Atlantic
1: South Atlantic
2: North Pacific
3: South Pacific
4: Maritime Continent
5: Indian Ocean
6: Arctic Ocean
7: Southern Ocean
8: Black Sea
9: Mediterranean Sea
10: Red Sea
11: Caspian Sea

Return type

xr.DataArray
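How a merge_dict turns into ascending numeric values can be sketched without regionmask. The region names below are hypothetical, and `merge_region_labels` works on a flat list of per-point region names rather than on a gridded mask, so it only illustrates the numbering logic.

```python
def merge_region_labels(point_regions, merge_dict):
    """Return, per point, the index of the merged region it belongs to.

    Indices follow the key order of merge_dict; points whose region is not
    listed in any merged region get None (an assumption of this sketch).
    """
    lookup = {
        region: index
        for index, regions in enumerate(merge_dict.values())
        for region in regions
    }
    return [lookup.get(region) for region in point_regions]


# Hypothetical sub-region names, for illustration only.
merge_dict = {
    "North Atlantic": ["Labrador Sea", "Gulf of Mexico"],
    "North Pacific": ["Bering Sea"],
}
mask = merge_region_labels(
    ["Bering Sea", "Labrador Sea", "Somewhere Else"], merge_dict
)
```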

utils¶

cmip6_preprocessing.utils.cmip6_dataset_id(ds, sep='.', id_attrs=['activity_id', 'institution_id', 'source_id', 'experiment_id', 'variant_label', 'table_id', 'grid_label', 'version', 'variable_id'])[source]¶

Creates a unique string id for e.g. saving files to disk from CMIP6 output

Parameters

ds (xr.Dataset) – Input dataset
sep (str, optional) – String/symbol to separate fields in the resulting string, by default “.”

Returns

Concatenated

Return type

str
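Building an id string from dataset attributes can be sketched with a plain attribute dictionary. The attribute values below are illustrative, `dataset_id` is a hypothetical helper, and raising a KeyError on missing attributes is a choice of this sketch; the library may handle missing attributes differently.

```python
ID_ATTRS = (
    "activity_id", "institution_id", "source_id", "experiment_id",
    "variant_label", "table_id", "grid_label", "version", "variable_id",
)


def dataset_id(attrs, sep=".", id_attrs=ID_ATTRS):
    """Join the selected attribute values into a single id string."""
    # attrs[name] raises KeyError if an attribute is missing (sketch behavior).
    return sep.join(str(attrs[name]) for name in id_attrs)


# Illustrative attribute values, not taken from a real dataset.
attrs = {
    "activity_id": "CMIP",
    "institution_id": "INST",
    "source_id": "MODEL-A",
    "experiment_id": "historical",
    "variant_label": "r1i1p1f1",
    "table_id": "Omon",
    "grid_label": "gn",
    "version": "v1",
    "variable_id": "thetao",
}
ds_id = dataset_id(attrs)
```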

cmip6_preprocessing.utils.google_cmip_col(catalog='main')[source]¶: A tiny utility function to point to the ‘official’ pangeo cmip6 cloud files.

cmip6_preprocessing.utils.model_id_match(match_list, id_tuple)[source]¶

Matches id_tuple to the list of tuples match_list, which can contain wildcards (match any entry) and lists (match any entry that is in the list).

Parameters

match_list (list) – list of tuples with id strings corresponding to e.g. source_id, grid_label…
id_tuple (tuple) – single tuple with id strings.
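The matching rules (exact entries, wildcards, and lists of allowed entries) can be sketched as follows. The wildcard symbol `"*"` and the helper name `id_match` are assumptions of this example rather than confirmed details of the library.

```python
def id_match(match_list, id_tuple, wildcard="*"):
    """Return True if id_tuple matches any tuple in match_list.

    Each pattern entry is either the wildcard (matches anything), a list or
    tuple of allowed values, or a single value that must match exactly.
    """
    def entry_matches(pattern, value):
        if isinstance(pattern, (list, tuple)):
            return value in pattern
        return pattern == wildcard or pattern == value

    return any(
        len(candidate) == len(id_tuple)
        and all(entry_matches(p, v) for p, v in zip(candidate, id_tuple))
        for candidate in match_list
    )


match_list = [("MODEL-A", "*"), ("MODEL-B", ["gn", "gr"])]
hits = [
    id_match(match_list, ("MODEL-A", "gr1")),
    id_match(match_list, ("MODEL-B", "gr")),
    id_match(match_list, ("MODEL-B", "gr1")),
]
```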
