API
preprocessing
- cmip6_preprocessing.preprocessing.broadcast_lonlat(ds, verbose=True)
Some models (all with the gr grid_label) have 1D lon/lat arrays. This function broadcasts those so that lon/lat are always 2D arrays.
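The broadcasting idea can be illustrated outside the library with plain NumPy. This is a minimal sketch, not the function's implementation: the helper name broadcast_lonlat_sketch and the (y, x) output layout are assumptions made for illustration.

```python
import numpy as np


def broadcast_lonlat_sketch(lon_1d, lat_1d):
    """Broadcast 1D lon (x) and lat (y) vectors to 2D arrays of shape (y, x).

    Hypothetical helper, not part of cmip6_preprocessing.
    """
    # np.meshgrid with the default 'xy' indexing repeats lon along rows
    # and lat along columns.
    lon_2d, lat_2d = np.meshgrid(lon_1d, lat_1d)
    return lon_2d, lat_2d


lon = np.array([0.0, 90.0, 180.0, 270.0])
lat = np.array([-45.0, 0.0, 45.0])
lon_2d, lat_2d = broadcast_lonlat_sketch(lon, lat)
print(lon_2d.shape, lat_2d.shape)
```

Every row of lon_2d repeats the 1D longitudes and every column of lat_2d repeats the 1D latitudes, so downstream code can treat regular and curvilinear grids alike.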
- cmip6_preprocessing.preprocessing.cmip6_renaming_dict()
A universal renaming dict. Keys correspond to the source id (model name), and values are a dict mapping each target name (key) to a list of variables that should be renamed to that target.
- cmip6_preprocessing.preprocessing.correct_coordinates(ds, verbose=False)
Converts wrongly assigned data_vars to coordinates.
- cmip6_preprocessing.preprocessing.correct_lon(ds)
Wraps negative x and lon values around so that longitudes span 0-360. Longitude names are expected to have been corrected with rename_cmip6 beforehand.
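The wrapping step can be sketched with NumPy as follows. The helper name wrap_lon_sketch is hypothetical, and the sketch works on bare arrays rather than on a dataset, so it only shows the arithmetic.

```python
import numpy as np


def wrap_lon_sketch(lon):
    """Shift negative longitudes by 360 degrees; leave the rest untouched.

    Hypothetical helper, not part of cmip6_preprocessing.
    """
    lon = np.asarray(lon, dtype=float)
    return np.where(lon < 0, lon + 360.0, lon)


wrapped = wrap_lon_sketch([-180.0, -0.5, 0.0, 179.0])
print(wrapped)
```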
- cmip6_preprocessing.preprocessing.correct_units(ds)
Converts coordinates into SI units using pint-xarray
- cmip6_preprocessing.preprocessing.fix_metadata(ds)
Fix known issues (from errata) with the metadata.
- cmip6_preprocessing.preprocessing.maybe_convert_bounds_to_vertex(ds)
Converts renamed lon and lat bounds into vertices by copying the values into the corners. Assumes a rectangular cell.
- cmip6_preprocessing.preprocessing.maybe_convert_vertex_to_bounds(ds)
Converts lon and lat vertices to bounds by averaging the corner points adjacent to the appropriate cell face center.
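The two conversions described above can be sketched on bare NumPy arrays. This is an illustration under stated assumptions, not the library code: the helper names are hypothetical, bounds are taken to have shape (y, x, 2), and the corner order (0=SW, 1=NW, 2=NE, 3=SE) is assumed for the example.

```python
import numpy as np


def bounds_to_vertices_sketch(lon_b, lat_b):
    """(y, x, 2) bounds -> (y, x, 4) vertices for rectangular cells.

    Assumed corner order: 0=SW, 1=NW, 2=NE, 3=SE (hypothetical).
    """
    west, east = lon_b[..., 0], lon_b[..., 1]
    south, north = lat_b[..., 0], lat_b[..., 1]
    # Copy each bound into the two corners that share it.
    lon_v = np.stack([west, west, east, east], axis=-1)
    lat_v = np.stack([south, north, north, south], axis=-1)
    return lon_v, lat_v


def vertices_to_bounds_sketch(lon_v, lat_v):
    """(y, x, 4) vertices -> (y, x, 2) bounds by averaging corner pairs."""
    # West face: mean of SW and NW; east face: mean of NE and SE.
    lon_b = np.stack(
        [lon_v[..., [0, 1]].mean(axis=-1), lon_v[..., [2, 3]].mean(axis=-1)],
        axis=-1,
    )
    # South face: mean of SW and SE; north face: mean of NW and NE.
    lat_b = np.stack(
        [lat_v[..., [0, 3]].mean(axis=-1), lat_v[..., [1, 2]].mean(axis=-1)],
        axis=-1,
    )
    return lon_b, lat_b


# A single rectangular cell spanning 10-12 degrees east and -1 to 1 degrees north.
lon_b = np.array([[[10.0, 12.0]]])
lat_b = np.array([[[-1.0, 1.0]]])
lon_v, lat_v = bounds_to_vertices_sketch(lon_b, lat_b)
lon_b_back, lat_b_back = vertices_to_bounds_sketch(lon_v, lat_v)
```

For rectangular cells the round trip recovers the original bounds exactly. On curvilinear grids the averaging step loses information, which is presumably why the docstrings state the rectangular-cell assumption.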
- cmip6_preprocessing.preprocessing.parse_lon_lat_bounds(ds)
Both regular 2D bounds and vertex bounds are parsed as *_bounds. This function renames them to *_verticies if the vertex dimension is found. It also removes the time dimension from static bounds, as found e.g. in the SAM0-UNICON model.
- cmip6_preprocessing.preprocessing.promote_empty_dims(ds)
Convert empty dimensions to actual coordinates
- cmip6_preprocessing.preprocessing.rename_cmip6(ds, rename_dict=None)
Homogenizes CMIP6 datasets to a common naming scheme.
postprocessing
- cmip6_preprocessing.postprocessing.combine_datasets(ds_dict, combine_func, combine_func_args=(), combine_func_kwargs={}, match_attrs=['source_id', 'grid_label', 'experiment_id', 'table_id', 'variant_label', 'variable_id'])
General function to combine datasets within a dictionary according to their matching attributes. This function provides maximum flexibility, but can be somewhat complex to set up. The postprocessing module provides several convenience wrappers such as merge_variables, concat_members, etc.
- Parameters
ds_dict ([type]) – [description]
combine_func ([type]) – [description]
combine_func_args (tuple, optional) – [description], by default ()
combine_func_kwargs (dict, optional) – [description], by default {}
match_attrs ([type], optional) – [description], by default exact_attrs
- Returns
[description]
- Return type
[type]
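The grouping logic described for combine_datasets can be sketched in pure Python. This is a rough illustration, not the library's implementation: the helper combine_by_attrs_sketch, the key format, the model names, and the use of SimpleNamespace objects as stand-ins for xarray datasets are all assumptions.

```python
from collections import defaultdict
from types import SimpleNamespace


def combine_by_attrs_sketch(ds_dict, combine_func, match_attrs):
    """Group datasets whose match_attrs agree and combine each group.

    Hypothetical helper, not part of cmip6_preprocessing.
    """
    groups = defaultdict(list)
    for ds in ds_dict.values():
        # Datasets sharing all matching attributes end up under the same key.
        key = ".".join(str(ds.attrs[name]) for name in match_attrs)
        groups[key].append(ds)
    return {key: combine_func(members) for key, members in groups.items()}


# Stand-ins for xarray datasets: only the .attrs mapping is needed here.
datasets = {
    "a": SimpleNamespace(
        attrs={"source_id": "ModelA", "grid_label": "gn", "variable_id": "thetao"}
    ),
    "b": SimpleNamespace(
        attrs={"source_id": "ModelA", "grid_label": "gn", "variable_id": "so"}
    ),
    "c": SimpleNamespace(
        attrs={"source_id": "ModelB", "grid_label": "gr", "variable_id": "thetao"}
    ),
}

combined = combine_by_attrs_sketch(
    datasets,
    # Toy combine function: list the variables found in each group.
    combine_func=lambda members: sorted(m.attrs["variable_id"] for m in members),
    match_attrs=["source_id", "grid_label"],
)
print(combined)
```

Leaving variable_id out of match_attrs is what lets datasets with different variables fall into one group, which is the general pattern a wrapper like merge_variables would build on.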
- cmip6_preprocessing.postprocessing.concat_experiments(ds_dict, concat_kwargs={})
Given a dictionary of datasets, this function merges all available ensemble members (given in separate datasets) into a single dataset for each combination of attributes, like source_id, grid_label, etc., but with concatenated members. CAUTION: If members do not have the same dimensions (e.g. longer run time for some members), this can result in poor dask performance (see: https://github.com/jbusecke/cmip6_preprocessing/issues/58)
- Parameters
ds_dict (dict) – Dictionary of xarray datasets.
concat_kwargs (dict) – Optional arguments passed to xr.concat.
- Returns
A new dict of xr.Datasets with all datasets from ds_dict, but with concatenated members and adjusted keys.
- Return type
dict
- cmip6_preprocessing.postprocessing.concat_members(ds_dict, concat_kwargs={})
Given a dictionary of datasets, this function merges all available ensemble members (given in separate datasets) into a single dataset for each combination of attributes, like source_id, grid_label, etc., but with concatenated members. CAUTION: If members do not have the same dimensions (e.g. longer run time for some members), this can result in poor dask performance (see: https://github.com/jbusecke/cmip6_preprocessing/issues/58)
- Parameters
ds_dict (dict) – Dictionary of xarray datasets.
concat_kwargs (dict) – Optional arguments passed to xr.concat.
- Returns
A new dict of xr.Datasets with all datasets from ds_dict, but with concatenated members and adjusted keys.
- Return type
dict
- cmip6_preprocessing.postprocessing.interpolate_grid_label(ds_dict, target_grid_label='gn', method='bilinear', xesmf_kwargs={}, merge_kwargs={}, verbose=False)
Combines different grid labels via interpolation with xesmf
- Parameters
ds_dict (dict) – Dictionary of input datasets.
target_grid_label (str, optional) – Preferred grid_label value. If at least one dataset has this grid_label, the others are interpolated to it. Datasets with this grid label are not modified, by default “gn”
method (str, optional) – interpolation method for xesmf, by default “bilinear”
xesmf_kwargs (dict, optional) – optional arguments for building xesmf regridder, by default {}
merge_kwargs (dict, optional) – optional arguments for the merging of interpolated datasets, by default {}
verbose (bool, optional) – print output while creating regridder, by default False
- Returns
dictionary of combined datasets (usually will combine across different variable ids)
- Return type
dict
- cmip6_preprocessing.postprocessing.match_metrics(ds_dict, metric_dict, match_variables, match_attrs=['source_id', 'grid_label'], print_statistics=False, dim_length_conflict='error')
Given two dictionaries of datasets, this function matches metrics from metric_dict to every dataset in ds_dict, based on a comparison of the datasets' attributes.
- Parameters
ds_dict (dict) – Dictionary of xarray datasets, that need matching metrics.
metric_dict (dict) – Dictionary of xarray datasets, which contain metrics as data_variables.
match_variables (list) – Data variables of datasets in metric_dict to parse.
match_attrs (list, optional) – Minimum dataset attributes that need to match, by default [“source_id”, “grid_label”]
print_statistics (bool, optional) – Option to print statistics about matching, by default False
dim_length_conflict (str) – Defines the behavior when parsing metrics with non-exact matches in dimension size. See parse_metric.
- Returns
All datasets from ds_dict. If no match was possible, the input dataset is returned unchanged.
- Return type
dict
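The attribute comparison behind match_metrics can be sketched with plain dictionaries. Everything here is an assumption for illustration: the helper match_metrics_sketch, the model names, and the modeling of datasets as dicts with "attrs" and "metrics" entries. The real function works on xarray datasets and also handles match_variables and dim_length_conflict, which this sketch ignores.

```python
def match_metrics_sketch(ds_dict, metric_dict, match_attrs=("source_id", "grid_label")):
    """Attach metrics whose attributes agree on every name in match_attrs.

    Hypothetical helper; datasets are plain dicts standing in for
    xarray datasets.
    """
    matched = {}
    for name, ds in ds_dict.items():
        new_metrics = dict(ds.get("metrics", {}))
        for metric_ds in metric_dict.values():
            same = all(
                metric_ds["attrs"].get(a) == ds["attrs"].get(a) for a in match_attrs
            )
            if same:
                new_metrics.update(metric_ds["metrics"])
        # Without any match, the dataset content is passed through unchanged.
        matched[name] = {"attrs": ds["attrs"], "metrics": new_metrics}
    return matched


ds_dict = {
    "ModelA.thetao": {"attrs": {"source_id": "ModelA", "grid_label": "gn"}},
    "ModelB.thetao": {"attrs": {"source_id": "ModelB", "grid_label": "gn"}},
}
metric_dict = {
    "ModelA.areacello": {
        "attrs": {"source_id": "ModelA", "grid_label": "gn"},
        "metrics": {"areacello": "cell area"},
    },
}
matched = match_metrics_sketch(ds_dict, metric_dict)
print(matched["ModelA.thetao"]["metrics"], matched["ModelB.thetao"]["metrics"])
```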
- cmip6_preprocessing.postprocessing.merge_variables(ds_dict, merge_kwargs={})
Given a dictionary of datasets, this function merges all available data variables (given in separate datasets) into a single dataset. CAUTION: This assumes that all variables are on the same staggered grid position. If you are working with data on the cell edges, this function will disregard that information. Use the grids module instead to get an accurate staggered grid representation.
- Parameters
ds_dict (dict) – Dictionary of xarray datasets, that need matching metrics.
merge_kwargs (dict) – Optional arguments passed to xr.merge.
- Returns
A new dict of xr.Datasets with all datasets from ds_dict, but with merged variables and adjusted keys.
- Return type
dict
grids
- cmip6_preprocessing.grids.combine_staggered_grid(ds_base, other_ds=None, recalculate_metrics=False, grid_dict=None, **kwargs)
Combine a reference dataset with a list of other datasets into a full xgcm-compatible staggered grid dataset.
- Parameters
ds_base (xr.Dataset) – The reference (‘base’) dataset, assumed to be at the tracer position/cell center
other_ds (list, xr.Dataset, optional) – List of datasets representing different variables. Their grid position will be automatically detected relative to ds_base. Coordinates and attrs of these added datasets will be lost, by default None
recalculate_metrics (bool, optional) – Enables the reconstruction of grid metrics using simple spherical geometry, by default False. CAUTION: Check your results carefully when using reconstructed values; these might differ substantially if the grid geometry is complicated.
grid_dict (dict, optional) – Dictionary for staggered grid setup. See create_full_grid for details. If None (default), will load staggered grid info from the internal database, by default None
- Returns
Single xgcm-compatible dataset, containing all variables on their respective staggered grid position.
- Return type
xr.Dataset
- cmip6_preprocessing.grids.create_full_grid(base_ds, grid_dict=None)
Generate a full xgcm-compatible dataset from a reference dataset base_ds. This dataset should represent a tracer field, i.e. be located at the cell center.
- Parameters
base_ds (xr.Dataset) – The reference (‘base’) dataset, assumed to be at the tracer position/cell center
grid_dict (dict, optional) – Dictionary with info about the grid staggering. Must be encoded using the base_ds attrs (e.g. {'model_name':{'axis_shift':{'X':'left',…}}}). If None (default), will load from the internal database for CMIP6 models, by default None
- Returns
xgcm compatible dataset
- Return type
xr.Dataset
- cmip6_preprocessing.grids.detect_shift(ds_base, ds, axis)
Detects the shift of ds relative to ds_base on logical grid axes, using lon and lat positions.
- Parameters
ds_base (xr.Dataset) – Reference (‘base’) dataset to compare to. Assumed that this is located at the ‘center’ coordinate.
ds (xr.Dataset) – Comparison dataset. The resulting shift will be computed as this dataset relative to ds_base
axis (str) – xgcm logical axis on which to detect the shift
- Returns
Shift string output, in xgcm conventions.
- Return type
str
- cmip6_preprocessing.grids.distance(lon0, lat0, lon1, lat1)
Calculate the distance in m between two points on a spherical globe
- Parameters
lon0 (np.array) – Longitude of first point
lat0 (np.array) – Latitude of first point
lon1 (np.array) – Longitude of second point
lat1 (np.array) – Latitude of second point
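One common way to compute such a distance is the haversine formula, sketched below. This is not necessarily the formula the library uses: the helper name haversine_distance_sketch and the Earth radius value are assumptions.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # mean Earth radius in meters (assumed value)


def haversine_distance_sketch(lon0, lat0, lon1, lat1, radius=EARTH_RADIUS_M):
    """Great-circle distance in meters between points given in degrees.

    Hypothetical helper, not part of cmip6_preprocessing.
    """
    lon0, lat0, lon1, lat1 = (
        np.deg2rad(np.asarray(v, dtype=float)) for v in (lon0, lat0, lon1, lat1)
    )
    a = (
        np.sin((lat1 - lat0) / 2) ** 2
        + np.cos(lat0) * np.cos(lat1) * np.sin((lon1 - lon0) / 2) ** 2
    )
    return 2 * radius * np.arcsin(np.sqrt(a))


# A quarter of the equator: 90 degrees of longitude at latitude 0.
quarter = haversine_distance_sketch(0.0, 0.0, 90.0, 0.0)
same_point = haversine_distance_sketch(30.0, 45.0, 30.0, 45.0)
print(quarter, same_point)
```

The inputs broadcast like any NumPy arrays, so the same call works for scalars and for full 2D lon/lat fields.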
- cmip6_preprocessing.grids.distance_deg(lon0, lat0, lon1, lat1)
Calculate the distance in degrees longitude and latitude between two points
- Parameters
lon0 (np.array) – Longitude of first point
lat0 (np.array) – Latitude of first point
lon1 (np.array) – Longitude of second point
lat1 (np.array) – Latitude of second point
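A degree-based difference needs care near the longitude discontinuity. The sketch below shows one possible approach and is not the library's implementation: the helper name distance_deg_sketch and the choice to wrap the longitude difference into [-180, 180) are assumptions.

```python
import numpy as np


def distance_deg_sketch(lon0, lat0, lon1, lat1):
    """Return (delta_lon, delta_lat) in degrees.

    The longitude difference is wrapped into [-180, 180) so that points on
    either side of the 0/360 seam are treated as neighbors.
    Hypothetical helper, not part of cmip6_preprocessing.
    """
    delta_lon = np.asarray(lon1, dtype=float) - np.asarray(lon0, dtype=float)
    delta_lat = np.asarray(lat1, dtype=float) - np.asarray(lat0, dtype=float)
    delta_lon = (delta_lon + 180.0) % 360.0 - 180.0
    return delta_lon, delta_lat


# Crossing the seam: from 359E to 1E is a 2 degree step, not -358 degrees.
dlon_seam, dlat_seam = distance_deg_sketch(359.0, 10.0, 1.0, 12.0)
dlon_plain, dlat_plain = distance_deg_sketch(10.0, 0.0, 20.0, 0.0)
print(dlon_seam, dlat_seam, dlon_plain, dlat_plain)
```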
- cmip6_preprocessing.grids.recreate_metrics(ds, grid)
Recreate a full set of horizontal distance metrics.
Calculates distances between points in lon/lat coordinates.
The naming of the metrics is as follows:
[metric_axis]_t : metric centered at the tracer point
[metric_axis]_gx : metric at the cell face on the x-axis. For instance, dx_gx is the x distance centered on the eastern cell face if the shift is right.
[metric_axis]_gy : as above, but along the y-axis
[metric_axis]_gxgy : metric located at the corner point. For example, dy_gxgy is the y distance on the south-west corner if both axes are shifted left.
- Parameters
ds (xr.Dataset) – Input dataset.
grid (xgcm.Grid) – xgcm Grid object matching ds
- Returns
Dataset with added metrics as coordinates and dictionary that can be passed to xgcm.Grid to recognize new metrics
- Return type
xr.Dataset, dict
regionmask
- cmip6_preprocessing.regionmask.merged_mask(basins, ds, lon_name='lon', lat_name='lat', merge_dict=None, verbose=False)
Combine geographical basins (from regionmask) to larger ocean basins.
- Parameters
basins (regionmask.core.regions.Regions object) – Loaded basin data from regionmask, e.g. import regionmask;basins = regionmask.defined_regions.natural_earth.ocean_basins_50
ds (xr.Dataset) – Input dataset on which to construct the mask
lon_name (str, optional) – Name of the longitude coordinate in ds, defaults to lon
lat_name (str, optional) – Name of the latitude coordinate in ds, defaults to lat
merge_dict (dict, optional) – Dictionary defining new aggregated regions (as keys) and the regions to be merged into each of them (as values, given as a list of names). Defaults to the large-scale ocean basins defined by cmip6_preprocessing.regionmask.default_merge_dict
verbose (bool, optional) – Prints more output, e.g. the regions in basins that were not used in the merging step. Defaults to False.
- Returns
mask – The mask contains an ascending numeric value for each key (merged region) in merge_dict. When the default is used, the numeric values correspond to the following regions:
0: North Atlantic
1: South Atlantic
2: North Pacific
3: South Pacific
4: Maritime Continent
5: Indian Ocean
6: Arctic Ocean
7: Southern Ocean
8: Black Sea
9: Mediterranean Sea
10: Red Sea
11: Caspian Sea
- Return type
xr.DataArray
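The merging step can be sketched on a bare integer mask. This is an illustration only: the helper merge_mask_sketch, the region names, and the small merge_dict are hypothetical and much simpler than the default dictionary, so the resulting numbers do not correspond to the default numbering listed above.

```python
import numpy as np


def merge_mask_sketch(region_mask, region_names, merge_dict):
    """Relabel a mask of region indices according to merge_dict.

    Each key of merge_dict gets an ascending number (in insertion order);
    cells belonging to none of the listed regions become NaN.
    Hypothetical helper, not part of cmip6_preprocessing.
    """
    merged = np.full(region_mask.shape, np.nan)
    for new_value, members in enumerate(merge_dict.values()):
        member_ids = [i for i, name in enumerate(region_names) if name in members]
        merged[np.isin(region_mask, member_ids)] = new_value
    return merged


# Hypothetical region names; the index in this list is the value in region_mask.
region_names = ["North Atlantic Ocean", "Labrador Sea", "South Pacific Ocean"]
merge_dict = {
    "North Atlantic": ["North Atlantic Ocean", "Labrador Sea"],
    "South Pacific": ["South Pacific Ocean"],
}
region_mask = np.array([[0, 1], [2, -1]])  # -1 marks a cell outside all regions
merged = merge_mask_sketch(region_mask, region_names, merge_dict)
print(merged)
```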
utils
- cmip6_preprocessing.utils.cmip6_dataset_id(ds, sep='.', id_attrs=['activity_id', 'institution_id', 'source_id', 'experiment_id', 'variant_label', 'table_id', 'grid_label', 'version', 'variable_id'])
Creates a unique string id for e.g. saving files to disk from CMIP6 output
- Parameters
ds (xr.Dataset) – Input dataset
sep (str, optional) – String/symbol to separate fields in the resulting string, by default “.”
- Returns
Concatenated
- Return type
str
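Building such an id amounts to joining selected attributes with the separator. The sketch below is an assumption-laden illustration: the helper dataset_id_sketch takes a bare attribute dict instead of a dataset, uses a shortened id_attrs default, and substitutes the string "none" for missing attributes, which may differ from what the library does.

```python
def dataset_id_sketch(attrs, sep=".", id_attrs=("source_id", "experiment_id", "variable_id")):
    """Join the selected attributes into a single id string.

    Hypothetical helper; missing attributes are replaced by "none" here,
    which is an assumption and not confirmed library behavior.
    """
    return sep.join(str(attrs.get(name, "none")) for name in id_attrs)


attrs = {"source_id": "ModelA", "experiment_id": "historical", "variable_id": "thetao"}
full_id = dataset_id_sketch(attrs)
partial_id = dataset_id_sketch({"source_id": "ModelA"}, sep="_")
print(full_id, partial_id)
```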
- cmip6_preprocessing.utils.google_cmip_col(catalog='main')
A tiny utility function to point to the ‘official’ pangeo cmip6 cloud files.
- cmip6_preprocessing.utils.model_id_match(match_list, id_tuple)
Matches id_tuple against the list of tuples match_list, which can contain wildcards (match any entry) and lists (match any entry that is in the list).
- Parameters
match_list (list) – list of tuples with id strings corresponding to e.g. source_id, grid_label…
id_tuple (tuple) – single tuple with id strings.
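The matching rules described above can be sketched in pure Python. The helper id_match_sketch, the model names, and the choice of "*" as the wildcard symbol are assumptions for illustration; the library's own behavior may differ in details.

```python
def id_match_sketch(match_list, id_tuple, wildcard="*"):
    """Return True if id_tuple matches any pattern tuple in match_list.

    A pattern entry can be the wildcard (matches anything), a list
    (matches any of its members), or a plain string (exact match).
    Hypothetical helper, not part of cmip6_preprocessing.
    """

    def entry_matches(pattern, value):
        if isinstance(pattern, (list, tuple)):
            return value in pattern
        return pattern == wildcard or pattern == value

    return any(
        len(pattern) == len(id_tuple)
        and all(entry_matches(p, v) for p, v in zip(pattern, id_tuple))
        for pattern in match_list
    )


patterns = [("ModelA", ["gn", "gr"], "*")]
hit = id_match_sketch(patterns, ("ModelA", "gr", "thetao"))
miss = id_match_sketch(patterns, ("ModelB", "gr", "thetao"))
print(hit, miss)
```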
