API

preprocessing

xmip.preprocessing.broadcast_lonlat(ds, verbose=True)[source]

Some models (all with gr grid_labels) have 1D lon/lat arrays. This function broadcasts them so that lon/lat are always 2D arrays.
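The broadcasting idea can be sketched with plain nested lists. This is only an illustration, not xmip's implementation: the helper name broadcast_lonlat_sketch and the (y, x) ordering of the 2D result are assumptions.

```python
def broadcast_lonlat_sketch(lon_1d, lat_1d):
    """Expand 1D lon (x) and lat (y) lists into 2D nested lists of shape (y, x)."""
    # Every row repeats the full list of longitudes.
    lon_2d = [list(lon_1d) for _ in lat_1d]
    # Every row repeats a single latitude value.
    lat_2d = [[lat] * len(lon_1d) for lat in lat_1d]
    return lon_2d, lat_2d


lon_2d, lat_2d = broadcast_lonlat_sketch([0.0, 120.0, 240.0], [-45.0, 45.0])
print(lon_2d)
print(lat_2d)
```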

xmip.preprocessing.cmip6_renaming_dict()[source]

A universal renaming dict. Keys correspond to the source id (model name) and values are a dict mapping each target name (key) to a list of variables that should be renamed to that target.
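To make the nested structure concrete, here is a small sketch. The model name, the variable names and the helper build_rename_map are all made up for illustration; the real dictionary contents may differ.

```python
# Hypothetical example in the shape described above; the model name and
# variable names are invented for illustration only.
example_renaming = {
    "SOME-MODEL": {
        "x": ["i", "nlon"],
        "y": ["j", "nlat"],
        "lon": ["longitude"],
    }
}


def build_rename_map(variable_names, target_dict):
    """Map each variable listed in `target_dict` to its target name."""
    rename_map = {}
    for target, candidates in target_dict.items():
        for name in candidates:
            if name in variable_names:
                rename_map[name] = target
    return rename_map


rename_map = build_rename_map(
    ["i", "j", "longitude", "thetao"], example_renaming["SOME-MODEL"]
)
print(rename_map)
```

Variables that do not appear in any candidate list (thetao here) are left out of the mapping and would keep their name.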

xmip.preprocessing.combined_preprocessing(ds)[source]

xmip.preprocessing.correct_coordinates(ds, verbose=False)[source]

Converts wrongly assigned data_vars to coordinates.

xmip.preprocessing.correct_lon(ds)[source]

Wraps negative x and lon values around so that longitudes span 0-360. Longitude names are expected to have been corrected with rename_cmip6 beforehand.
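A minimal sketch of the wrapping, using a modulo operation on scalar values. The helper name wrap_to_0_360 is hypothetical, and the modulo approach (which also maps 360 to 0) is an assumption about one way to do this, not a description of xmip's code.

```python
def wrap_to_0_360(lon):
    """Wrap a longitude in degrees into the interval [0, 360)."""
    return lon % 360


wrapped = [wrap_to_0_360(v) for v in [-180.0, -10.0, 0.0, 190.0]]
print(wrapped)
```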

xmip.preprocessing.correct_units(ds)[source]

Converts coordinates into SI units using pint-xarray

xmip.preprocessing.fix_metadata(ds)[source]

Fix known issues (from errata) with the metadata.

xmip.preprocessing.maybe_convert_bounds_to_vertex(ds)[source]

Converts renamed lon and lat bounds into vertices by copying the values into the corners. Assumes a rectangular cell.

xmip.preprocessing.maybe_convert_vertex_to_bounds(ds)[source]

Converts lon and lat vertices to bounds by averaging the corner points on each cell face to obtain the face-center value.
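The two conversions can be sketched for a single rectangular cell as follows. The function names are hypothetical and the code works on plain tuples and lists rather than datasets; the corner order (lower left, upper left, upper right, lower right) follows the one listed for sort_vertex_order.

```python
def bounds_to_vertices(lon_bounds, lat_bounds):
    """Copy (west, east) and (south, north) bounds into four corner values.

    Corner order: 0 lower left, 1 upper left, 2 upper right, 3 lower right.
    """
    west, east = lon_bounds
    south, north = lat_bounds
    lon_vertices = [west, west, east, east]
    lat_vertices = [south, north, north, south]
    return lon_vertices, lat_vertices


def vertices_to_bounds(lon_vertices, lat_vertices):
    """Average the two corners on each cell face to recover the bounds."""
    west = (lon_vertices[0] + lon_vertices[1]) / 2
    east = (lon_vertices[2] + lon_vertices[3]) / 2
    south = (lat_vertices[0] + lat_vertices[3]) / 2
    north = (lat_vertices[1] + lat_vertices[2]) / 2
    return (west, east), (south, north)


lon_v, lat_v = bounds_to_vertices((10.0, 11.0), (-1.0, 1.0))
round_trip = vertices_to_bounds(lon_v, lat_v)
print(lon_v, lat_v)
print(round_trip)
```

For a rectangular cell the round trip returns the original bounds; for a distorted cell the averaging only approximates the face centers.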

xmip.preprocessing.parse_lon_lat_bounds(ds)[source]

Both regular 2D bounds and vertex bounds are parsed as *_bounds. This function renames them to *_verticies if the vertex dimension is found. It also removes the time dimension from static bounds, as found in e.g. the SAM0-UNICON model.

xmip.preprocessing.promote_empty_dims(ds)[source]

Convert empty dimensions to actual coordinates

xmip.preprocessing.rename_cmip6(ds, rename_dict=None)[source]

Homogenizes CMIP6 datasets to a common naming convention.

xmip.preprocessing.replace_x_y_nominal_lat_lon(ds)[source]

Approximate the dimensional values of x and y with mean lat and lon at the equator

xmip.preprocessing.sort_vertex_order(ds)[source]

Sorts the vertex dimension into a coherent order: 0: lower left, 1: upper left, 2: upper right, 3: lower right.
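The target order can be illustrated for one cell given as four (lon, lat) corners. This sketch is not the library's algorithm: sort_vertices is a hypothetical helper that classifies each corner relative to the cell center, and it assumes a roughly rectangular cell that does not straddle the longitude wrap-around.

```python
def sort_vertices(corners):
    """Order four (lon, lat) corners as lower left, upper left, upper right, lower right."""
    if len(corners) != 4:
        raise ValueError("expected exactly four corners")
    center_lon = sum(lon for lon, _ in corners) / 4
    center_lat = sum(lat for _, lat in corners) / 4

    # Keys are (is_east, is_north); values are the target vertex index.
    order = {(False, False): 0, (False, True): 1, (True, True): 2, (True, False): 3}

    def position(corner):
        lon, lat = corner
        return order[(lon > center_lon, lat > center_lat)]

    return sorted(corners, key=position)


sorted_corners = sort_vertices([(11.0, 1.0), (10.0, -1.0), (11.0, -1.0), (10.0, 1.0)])
print(sorted_corners)
```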

postprocessing

xmip.postprocessing.combine_datasets(ds_dict, combine_func, combine_func_args=(), combine_func_kwargs={}, match_attrs=['source_id', 'grid_label', 'experiment_id', 'table_id', 'variant_label', 'variable_id'])[source]

General combination function to combine datasets within a dictionary according to their matching attributes. This function provides maximum flexibility, but can be somewhat complex to set up. The postprocessing module provides several convenience wrappers like merge_variables, concat_members, etc.

Parameters
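  • ds_dict (dict) – Dictionary of xarray datasets.

  • combine_func (callable) – Function used to combine the datasets that share matching attributes.

  • combine_func_args (tuple, optional) – Positional arguments passed to combine_func, by default ()

  • combine_func_kwargs (dict, optional) – Keyword arguments passed to combine_func, by default {}

  • match_attrs (list, optional) – Dataset attributes that need to match for datasets to be combined, by default the list shown in the signature above

Returns

Dictionary of combined datasets.

Return type

dict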
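The grouping logic described above can be sketched without xarray. Everything here is an assumption made for illustration: plain dicts with an "attrs" entry stand in for datasets, combine_by_attrs and merge_data are hypothetical helpers, and the way output keys are built (attribute values joined with ".") is not taken from the library.

```python
from collections import defaultdict


def combine_by_attrs(ds_dict, combine_func, match_attrs, sep="."):
    """Group entries whose `match_attrs` agree and combine each group."""
    groups = defaultdict(list)
    for ds in ds_dict.values():
        key = sep.join(str(ds["attrs"][attr]) for attr in match_attrs)
        groups[key].append(ds)
    return {key: combine_func(members) for key, members in groups.items()}


def merge_data(members):
    """Toy combine function: merge the "data" dicts of all group members."""
    merged = {}
    for member in members:
        merged.update(member["data"])
    return {"attrs": members[0]["attrs"], "data": merged}


ds_dict = {
    "a": {"attrs": {"source_id": "MODEL-A", "grid_label": "gn"}, "data": {"thetao": 1}},
    "b": {"attrs": {"source_id": "MODEL-A", "grid_label": "gn"}, "data": {"so": 2}},
    "c": {"attrs": {"source_id": "MODEL-B", "grid_label": "gr"}, "data": {"thetao": 3}},
}
combined = combine_by_attrs(ds_dict, merge_data, ["source_id", "grid_label"])
print(sorted(combined))
print(combined["MODEL-A.gn"]["data"])
```

Entries "a" and "b" share both attributes and end up in one combined entry, while "c" stays on its own.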

xmip.postprocessing.concat_experiments(ds_dict, concat_kwargs={})[source]

Given a dictionary of datasets, this function merges all available ensemble members (given in separate datasets) into a single dataset for each combination of attributes, like source_id, grid_label, etc., but with concatenated members. CAUTION: If members do not have the same dimensions (e.g. longer run time for some members), this can result in poor dask performance (see: https://github.com/jbusecke/xmip/issues/58).

Parameters
  • ds_dict (dict) – Dictionary of xarray datasets.

  • concat_kwargs (dict) – Optional arguments passed to xr.concat.

Returns

A new dict of xr.Datasets with all datasets from ds_dict, but with concatenated members and adjusted keys.

Return type

dict

xmip.postprocessing.concat_members(ds_dict, concat_kwargs={})[source]

Given a dictionary of datasets, this function merges all available ensemble members (given in separate datasets) into a single dataset for each combination of attributes, like source_id, grid_label, etc., but with concatenated members. CAUTION: If members do not have the same dimensions (e.g. longer run time for some members), this can result in poor dask performance (see: https://github.com/jbusecke/xmip/issues/58).

Parameters
  • ds_dict (dict) – Dictionary of xarray datasets.

  • concat_kwargs (dict) – Optional arguments passed to xr.concat.

Returns

A new dict of xr.Datasets with all datasets from ds_dict, but with concatenated members and adjusted keys.

Return type

dict

xmip.postprocessing.interpolate_grid_label(ds_dict, target_grid_label='gn', method='bilinear', xesmf_kwargs={}, merge_kwargs={}, verbose=False)[source]

Combines different grid labels via interpolation with xesmf

Parameters
  • ds_dict (dict) – Dictionary of input datasets

  • target_grid_label (str, optional) – Preferred grid_label value. If at least one dataset has this grid_label, the others are interpolated to it. Datasets with this grid label are not modified, by default “gn”

  • method (str, optional) – interpolation method for xesmf, by default “bilinear”

  • xesmf_kwargs (dict, optional) – optional arguments for building xesmf regridder, by default {}

  • merge_kwargs (dict, optional) – optional arguments for the merging of interpolated datasets, by default {}

  • verbose (bool, optional) – print output while creating regridder, by default False

Returns

dictionary of combined datasets (usually will combine across different variable ids)

Return type

dict

xmip.postprocessing.match_metrics(ds_dict, metric_dict, match_variables, match_attrs=['source_id', 'grid_label'], print_statistics=False, dim_length_conflict='error')[source]

Given two dictionaries of datasets, this function matches metrics from metric_dict to every dataset in ds_dict based on a comparison of the datasets' attributes.

Parameters
  • ds_dict (dict) – Dictionary of xarray datasets, that need matching metrics.

  • metric_dict (dict) – Dictionary of xarray datasets, which contain metrics as data_variables.

  • match_variables (list) – Data variables of datasets in metric_dict to parse.

  • match_attrs (list, optional) – Minimum dataset attributes that need to match, by default [“source_id”, “grid_label”]. Pass “exact” to only allow exact matches using all required attributes.

  • print_statistics (bool, optional) – Option to print statistics about matching, by default False

  • dim_length_conflict (str) – Defines the behavior when parsing metrics with non-exact matches in dimension size. See parse_metric.

Returns

All datasets from ds_dict. If a match was not possible, the input dataset is returned unchanged.

Return type

dict
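The attribute comparison can be sketched with plain dicts standing in for datasets. This is a simplified illustration under stated assumptions: match_by_attrs is a hypothetical helper, it takes the first metric entry whose match_attrs agree, and it ignores dimension-size checks (the dim_length_conflict behavior) entirely.

```python
def match_by_attrs(ds_dict, metric_dict, match_variables,
                   match_attrs=("source_id", "grid_label")):
    """Attach metrics from the first entry in `metric_dict` with matching attrs."""
    matched = {}
    for name, ds in ds_dict.items():
        wanted = [ds["attrs"].get(attr) for attr in match_attrs]
        result = ds  # returned unchanged if nothing matches
        for metric in metric_dict.values():
            found = [metric["attrs"].get(attr) for attr in match_attrs]
            if found == wanted:
                picked = {v: metric["data"][v] for v in match_variables
                          if v in metric["data"]}
                # Build a new dict so the input dataset is not modified.
                result = {**ds, "metrics": picked}
                break
        matched[name] = result
    return matched


ds_dict = {
    "a": {"attrs": {"source_id": "MODEL-A", "grid_label": "gn"}, "data": {"thetao": 1}},
    "b": {"attrs": {"source_id": "MODEL-B", "grid_label": "gn"}, "data": {"thetao": 2}},
}
metric_dict = {
    "m": {"attrs": {"source_id": "MODEL-A", "grid_label": "gn"},
          "data": {"areacello": 10, "thkcello": 5}},
}
matched = match_by_attrs(ds_dict, metric_dict, ["areacello"])
print(matched["a"]["metrics"])
print("metrics" in matched["b"])
```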

xmip.postprocessing.merge_variables(ds_dict, merge_kwargs={})[source]

Given a dictionary of datasets, this function merges all available data variables (given in separate datasets) into a single dataset. CAUTION: This assumes that all variables are on the same staggered grid position. If you are working with data on the cell edges, this function will disregard that information. Use the grids module instead to get an accurate staggered grid representation.

Parameters
  • ds_dict (dict) – Dictionary of xarray datasets whose variables should be merged.

  • merge_kwargs (dict) – Optional arguments passed to xr.merge.

Returns

A new dict of xr.Datasets with all datasets from ds_dict, but with merged variables and adjusted keys.

Return type

dict

xmip.postprocessing.pick_first_member(ddict)[source]

xmip.postprocessing.requires_xesmf(func)[source]

grids

xmip.grids.combine_staggered_grid(ds_base, other_ds=None, recalculate_metrics=False, grid_dict=None, **kwargs)[source]

Combine a reference dataset with a list of other datasets into a full xgcm-compatible staggered grid dataset.

Parameters
  • ds_base (xr.Dataset) – The reference (‘base’) dataset, assumed to be at the tracer position/cell center

  • other_ds (list,xr.Dataset, optional) – List of datasets representing different variables. Their grid position will be automatically detected relative to ds_base. Coordinates and attrs of these added datasets will be lost, by default None

  • recalculate_metrics (bool, optional) –

    Enables the reconstruction of grid metrics using simple spherical geometry, by default False

    !!! Check your results carefully when using reconstructed values, as these might differ substantially if the grid geometry is complicated.

  • grid_dict (dict, optional) – Dictionary for staggered grid setup. See create_full_grid for details. If None (default), will load staggered grid info from the internal database, by default None

Returns

Single xgcm-compatible dataset, containing all variables on their respective staggered grid position.

Return type

xr.Dataset

xmip.grids.create_full_grid(base_ds, grid_dict=None)[source]

Generate a full xgcm-compatible dataset from a reference dataset base_ds. This dataset should represent a tracer field, i.e. values at the cell center.

Parameters
  • base_ds (xr.Dataset) – The reference (‘base’) dataset, assumed to be at the tracer position/cell center

  • grid_dict (dict, optional) – Dictionary with info about the grid staggering. Must be encoded using the base_ds attrs (e.g. {'model_name':{'axis_shift':{'X':'left',…}}}). If None (default), will load from the internal database for CMIP6 models, by default None

Returns

xgcm compatible dataset

Return type

xr.Dataset

xmip.grids.detect_shift(ds_base, ds, axis)[source]

Detects the shift of ds relative to ds_base on logical grid axes, using lon and lat positions.

Parameters
  • ds_base (xr.Dataset) – Reference (‘base’) dataset to compare to. Assumed that this is located at the ‘center’ coordinate.

  • ds (xr.Dataset) – Comparison dataset. The resulting shift will be computed as this dataset relative to ds_base

  • axis (str) – xgcm logical axis on which to detect the shift

Returns

Shift string output, in xgcm conventions.

Return type

str
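As a rough illustration of the idea, the sketch below compares coordinate positions along one axis and reports the result with the xgcm position names 'left', 'right' and 'center'. The helper detect_shift_1d is hypothetical, works on plain lists, and ignores longitude wrap-around; the actual detection in xmip may work differently.

```python
def detect_shift_1d(base_positions, positions):
    """Return 'left', 'right' or 'center' for `positions` relative to `base_positions`.

    Both inputs are equal-length lists of coordinates (e.g. longitudes)
    along one axis. Wrap-around is not handled.
    """
    if len(base_positions) != len(positions):
        raise ValueError("inputs must have the same length")
    offsets = [p - b for p, b in zip(positions, base_positions)]
    mean_offset = sum(offsets) / len(offsets)
    if mean_offset > 0:
        return "right"
    if mean_offset < 0:
        return "left"
    return "center"


base = [0.5, 1.5, 2.5]
shift_right = detect_shift_1d(base, [1.0, 2.0, 3.0])
shift_left = detect_shift_1d(base, [0.0, 1.0, 2.0])
shift_center = detect_shift_1d(base, [0.5, 1.5, 2.5])
print(shift_right, shift_left, shift_center)
```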

xmip.grids.distance(lon0, lat0, lon1, lat1)[source]

Calculate the distance in meters between two points on a spherical globe.

Parameters
  • lon0 (np.array) – Longitude of first point

  • lat0 (np.array) – Latitude of first point

  • lon1 (np.array) – Longitude of second point

  • lat1 (np.array) – Latitude of second point
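One common way to compute such a distance is the haversine formula, sketched here for scalar inputs in degrees. The function name, the choice of the haversine formula and the Earth radius of 6.371e6 m are assumptions for this example; the library function takes arrays and may use a different formula or radius.

```python
import math

EARTH_RADIUS_M = 6.371e6  # assumed mean Earth radius in meters


def great_circle_distance(lon0, lat0, lon1, lat1, radius=EARTH_RADIUS_M):
    """Haversine distance in meters between two points given in degrees."""
    phi0, phi1 = math.radians(lat0), math.radians(lat1)
    dphi = phi1 - phi0
    dlam = math.radians(lon1 - lon0)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi0) * math.cos(phi1) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# A quarter of the equator: 90 degrees of longitude at latitude 0.
quarter_equator = great_circle_distance(0.0, 0.0, 90.0, 0.0)
print(round(quarter_equator / 1000), "km")
```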

xmip.grids.distance_deg(lon0, lat0, lon1, lat1)[source]

Calculate the distance in degrees longitude and latitude between two points

Parameters
  • lon0 (np.array) – Longitude of first point

  • lat0 (np.array) – Latitude of first point

  • lon1 (np.array) – Longitude of second point

  • lat1 (np.array) – Latitude of second point
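A minimal sketch of a per-axis degree offset for scalar inputs follows. The helper degree_offsets is hypothetical, and wrapping the longitude difference into [-180, 180) is an assumption about how to handle the 0/360 seam; the library function may treat it differently.

```python
def degree_offsets(lon0, lat0, lon1, lat1):
    """Return (delta_lon, delta_lat) in degrees from point 0 to point 1.

    delta_lon is wrapped into [-180, 180) so that crossing the 0/360 seam
    gives a small offset rather than one close to 360.
    """
    delta_lon = (lon1 - lon0 + 180.0) % 360.0 - 180.0
    delta_lat = lat1 - lat0
    return delta_lon, delta_lat


across_seam = degree_offsets(359.0, 10.0, 1.0, 12.0)
westward = degree_offsets(10.0, 0.0, 5.0, -2.0)
print(across_seam)
print(westward)
```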

xmip.grids.recreate_metrics(ds, grid)[source]

Recreate a full set of horizontal distance metrics.

Calculates distances between points in lon/lat coordinates

The naming of the metrics is as follows:

  • [metric_axis]_t : metric centered at the tracer point

  • [metric_axis]_gx : metric at the cell face on the x-axis. For instance, dx_gx is the x distance centered on the eastern cell face if the shift is right

  • [metric_axis]_gy : as above, but along the y-axis

  • [metric_axis]_gxgy : the metric located at the corner point. For example, dy_gxgy is the y distance on the south-west corner if both axes are shifted left

Parameters
  • ds (xr.Dataset) – Input dataset.

  • grid (xgcm.Grid) – xgcm Grid object matching ds

Returns

Dataset with added metrics as coordinates and dictionary that can be passed to xgcm.Grid to recognize new metrics

Return type

xr.Dataset, dict
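The naming convention can be expressed as a tiny helper. The function metric_name and its arguments are hypothetical and only build the name strings described above; they do not compute any metrics.

```python
def metric_name(metric_axis, on_x_face=False, on_y_face=False):
    """Build a metric name such as 'dx_t', 'dx_gx', 'dy_gy' or 'dy_gxgy'."""
    suffix = ""
    if on_x_face:
        suffix += "gx"
    if on_y_face:
        suffix += "gy"
    # Neither face shifted means the metric sits at the tracer point.
    return f"{metric_axis}_{suffix or 't'}"


names = [
    metric_name("dx"),
    metric_name("dx", on_x_face=True),
    metric_name("dy", on_y_face=True),
    metric_name("dy", on_x_face=True, on_y_face=True),
]
print(names)
```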

regionmask

xmip.regionmask.merged_mask(basins, ds, lon_name='lon', lat_name='lat', merge_dict=None, verbose=False)[source]

Combine geographical basins (from regionmask) to larger ocean basins.

Parameters
  • basins (regionmask.core.regions.Regions object) – Loaded basin data from regionmask, e.g. import regionmask;basins = regionmask.defined_regions.natural_earth.ocean_basins_50

  • ds (xr.Dataset) – Input dataset on which to construct the mask

  • lon_name (str, optional) – Name of the longitude coordinate in ds, defaults to lon

  • lat_name (str, optional) – Name of the latitude coordinate in ds, defaults to lat

  • merge_dict (dict, optional) – Dictionary defining new aggregated regions (as keys) and the regions to be merged into each of them as values (list of names). Defaults to the large-scale ocean basins defined by xmip.regionmask.default_merge_dict

  • verbose (bool, optional) – Prints more output, e.g. the regions in basins that were not used in the merging step. Defaults to False.

Returns

mask – The mask contains an ascending numeric value for each key (merged region) in merge_dict. When the default is used, the numeric values correspond to the following regions:

  • 0: North Atlantic

  • 1: South Atlantic

  • 2: North Pacific

  • 3: South Pacific

  • 4: Maritime Continent

  • 5: Indian Ocean

  • 6: Arctic Ocean

  • 7: Southern Ocean

  • 8: Black Sea

  • 9: Mediterranean Sea

  • 10: Red Sea

  • 11: Caspian Sea

Return type

xr.DataArray
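To turn the numeric mask values back into names, a lookup built from the default region order listed above can help. This sketch uses a plain list of mask values; the helper count_cells_per_region and the use of None for cells outside every region are assumptions for illustration.

```python
from collections import Counter

# Region names in the default order listed above (index = mask value).
DEFAULT_REGION_NAMES = [
    "North Atlantic", "South Atlantic", "North Pacific", "South Pacific",
    "Maritime Continent", "Indian Ocean", "Arctic Ocean", "Southern Ocean",
    "Black Sea", "Mediterranean Sea", "Red Sea", "Caspian Sea",
]
region_lookup = dict(enumerate(DEFAULT_REGION_NAMES))


def count_cells_per_region(mask_values):
    """Count mask cells per region name, skipping cells marked as None."""
    return Counter(region_lookup[v] for v in mask_values if v is not None)


counts = count_cells_per_region([0, 0, 2, None, 7, 2, 11])
print(region_lookup[5])
print(dict(counts))
```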

utils

xmip.utils.cmip6_dataset_id(ds, sep='.', id_attrs=['activity_id', 'institution_id', 'source_id', 'experiment_id', 'variant_label', 'table_id', 'grid_label', 'version', 'variable_id'])[source]

Creates a unique string id for e.g. saving files to disk from CMIP6 output

Parameters
  • ds (xr.Dataset) – Input dataset

  • sep (str, optional) – String/symbol to separate fields in the resulting string, by default “.”

Returns

Concatenated string of the id_attrs values, separated by sep

Return type

str
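The id construction amounts to joining attribute values with the separator, as in this sketch on a plain attribute dict. The helper dataset_id_sketch is hypothetical, the shortened default id_attrs is chosen only to keep the example small, and the "none" placeholder for missing attributes is an assumption.

```python
def dataset_id_sketch(attrs, sep=".",
                      id_attrs=("source_id", "experiment_id", "variable_id")):
    """Join selected attribute values into one id string."""
    # Assumption: attributes that are missing become the placeholder "none".
    return sep.join(str(attrs.get(name, "none")) for name in id_attrs)


attrs = {"source_id": "MODEL-A", "experiment_id": "historical", "variable_id": "thetao"}
default_id = dataset_id_sketch(attrs)
custom_id = dataset_id_sketch(attrs, sep="_", id_attrs=("source_id", "grid_label"))
print(default_id)
print(custom_id)
```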

xmip.utils.google_cmip_col(catalog='main')[source]

A tiny utility function to point to the ‘official’ pangeo cmip6 cloud files.

xmip.utils.model_id_match(match_list, id_tuple)[source]

Matches id_tuple to the list of tuples match_list, which can contain wildcards (match any entry) and lists (match any entry that is in the list).

Parameters
  • match_list (list) – list of tuples with id strings corresponding to e.g. source_id, grid_label

  • id_tuple (tuple) – single tuple with id strings.
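The matching rules can be sketched as follows. Several details are assumptions, since they are not stated above: the wildcard symbol "*", the boolean return value, and the helper names entry_matches and id_matches.

```python
WILDCARD = "*"  # assumed wildcard symbol


def entry_matches(pattern, value):
    """Check one pattern entry: wildcard, list of allowed values, or exact string."""
    if isinstance(pattern, (list, tuple)):
        return value in pattern
    if pattern == WILDCARD:
        return True
    return pattern == value


def id_matches(match_list, id_tuple):
    """True if any tuple in `match_list` matches `id_tuple` entry by entry."""
    return any(
        len(candidate) == len(id_tuple)
        and all(entry_matches(p, v) for p, v in zip(candidate, id_tuple))
        for candidate in match_list
    )


match_list = [("MODEL-A", "*"), (["MODEL-B", "MODEL-C"], "gn")]
hit_wildcard = id_matches(match_list, ("MODEL-A", "gr"))
hit_list = id_matches(match_list, ("MODEL-C", "gn"))
miss = id_matches(match_list, ("MODEL-C", "gr"))
print(hit_wildcard, hit_list, miss)
```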