Re-gridding PACE data

Re-gridding PACE data#

Author: Thiago Nobrega (University of Sao Paulo)

This tutorial will show you how to regrid Level-3 data from different satellites images to match the grid from PACE OCI data. I show you to visualize the regridded data from the instruments ABI-GOES, VIIRS, MODIS and OCLI.

Setup and access the data:#

We begin by importing the packages used in this notebook.

import earthaccess
import xarray as xr
from xarray.backends.api import open_datatree
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import numpy as np
from scipy.interpolate import griddata
import pandas as pd  # Import pandas

Then define the timespan and access some example datas that are available on the Earthaccess

tspan = ("2024-07-01", "2024-07-03")

results_PACE = earthaccess.search_data(
    short_name="PACE_OCI_L3M_CHL_NRT",
    temporal=tspan,
)

results_GOES = earthaccess.search_data(
    short_name="ABI_G16-STAR-L3C-v2.70",
    temporal=tspan,
)

results_VIIRS = earthaccess.search_data(
    short_name="VIIRSJ1_L3m_CHL_NRT",
    temporal=tspan,
)

results_MODIS = earthaccess.search_data(
    short_name="MODISA_L3m_CHL_NRT",
    temporal=tspan,
)

results_OLCI = earthaccess.search_data(
    short_name="OLCIS3B_L3m_ERR_CHL_NRT",
    temporal=tspan,
)

paths_PACE = earthaccess.open(results_PACE)
paths_GOES = earthaccess.open(results_GOES)

The MODIS, VIIRS and OLCI data it is not avaliable in the cloud so we can use the following codes to access example datas for the purpose of the notebook

dataset_MODIS = xr.open_dataset(results_MODIS[0]['umm']['RelatedUrls'][1]['URL'])
dataset_VIIRS = xr.open_dataset(results_VIIRS[0]['umm']['RelatedUrls'][1]['URL'])
dataset_OLCI = xr.open_dataset(results_OLCI[0]['umm']['RelatedUrls'][1]['URL'])

dataset_PACE = xr.open_dataset(paths_PACE[0])
dataset_GOES = xr.open_dataset(paths_GOES[0])

Functions: Filter and regrid#

The ABI-GOES data and many others are configured with three coordinates, time, lat and lon, to be able to regrid the whole dataset it’s necessary to filter all the data that has datetime as type and the data that has no coordinates too, the following function filter the dataset for the regrid

def filter_dataset_v3(ds):

    #Find the coordinates that are datetime type
    datetime_coords = [coord for coord in ds.coords if xr.core.common.is_np_datetime_like(ds[coord].dtype)]
    # Filter out coordinates that are of datetime type
    filtered_coords = ds.isel(time=0).drop_vars(str(datetime_coords[0]))
    # Filter out variables that are empty or of datetime type or has no coordinates associated with it
    filtered_vars = {
        var: ds[var]
        for var in ds.data_vars
        if ds[var].size > 0 
        and not xr.core.common.is_np_datetime_like(ds[var].dtype)
        and set(ds[var].dims).intersection(ds.coords)
    }
    
    # Create a new dataset with the filtered variables
    filtered_ds = xr.Dataset(filtered_vars)
    return filtered_ds

The function bellow regrid a dataset based on the grid of a reference dataset, there is a flag if the target dataset is already finer than the reference dataset. The regrid is made coarsening the dataset with finer resolution to match the resolution of the target dataset, this example is set with averaging the pixels within the block, but is possible to do the same by taking the max value, the minimum, the sum, etc. The example uses the trim method over the boundaries, so it discards the extra data that can’t form a full block, but it is possible to uses other methods like pad(pads the data to complete blocks), exact(ensures all blocks are complete, raises an error if not) and expand(expands the last block with NaNs to include all data).

def regrid_to_match_v8(reference_ds, target_ds):
    """
    Regrid the target xarray dataset to match the dimensions of the reference xarray dataset using coarsening.

    Parameters:
    reference_ds (xarray.Dataset): The reference dataset with the desired dimensions.
    target_ds (xarray.Dataset): The dataset to be regridded.

    Returns:
    xarray.Dataset: The regridded dataset.
    """
    
    # Identify the dimension names for lat and lon in both datasets
    def get_lat_lon_names(ds):
        for dim in ds.dims:
            if 'lat' in dim.lower():
                lat_dim = dim
            elif 'lon' in dim.lower():
                lon_dim = dim
        return lat_dim, lon_dim
    
    ref_lat_dim, ref_lon_dim = get_lat_lon_names(reference_ds)
    tgt_lat_dim, tgt_lon_dim = get_lat_lon_names(target_ds)
    
    # Get the shapes of the reference and target datasets
    ref_lat_len = reference_ds.sizes[ref_lat_dim]
    ref_lon_len = reference_ds.sizes[ref_lon_dim]
    tgt_lat_len = target_ds.sizes[tgt_lat_dim]
    tgt_lon_len = target_ds.sizes[tgt_lon_dim]
    
    # Calculate coarsening factors
    factor_lat = tgt_lat_len // ref_lat_len
    factor_lon = tgt_lon_len // ref_lon_len

    # Ensure the factors are greater than 1
    if factor_lat < 1 or factor_lon < 1:
        raise ValueError("The target dataset is already finer than the reference dataset.")

    # Coarsen the target dataset
    coarsened_ds = target_ds.coarsen({tgt_lat_dim: factor_lat, tgt_lon_dim: factor_lon}, boundary='trim').mean()

    # Interpolate to match exactly the reference dataset dimensions if necessary
    if coarsened_ds.sizes[tgt_lat_dim] != ref_lat_len or coarsened_ds.sizes[tgt_lon_dim] != ref_lon_len:
        coarsened_ds = coarsened_ds.interp(**{ref_lat_dim: reference_ds[ref_lat_dim], ref_lon_dim: reference_ds[ref_lon_dim]})

    return coarsened_ds

Here are some examples of how to use the filter and the regrid functions

dataset_goes_filtered = filter_dataset_v3(dataset_GOES)
new_goes_grid = regrid_to_match_v8(dataset_PACE,dataset_goes_filtered)

new_viirs_grid = regrid_to_match_v8(dataset_PACE,dataset_VIIRS)
new_modis_grid = regrid_to_match_v8(dataset_PACE,dataset_MODIS)
new_olci_grid = regrid_to_match_v8(dataset_PACE,dataset_OLCI)

Example regrids and plots#

Now we see some examples, first the plots of chlorophyll concentrations and sea surface temperature before the regrid for all the examples datas

datasets = {
    "MODIS": dataset_MODIS,
    "PACE": dataset_PACE,
    "GOES": dataset_GOES,
    "VIIRS": dataset_VIIRS,
    "OLCI": dataset_OLCI
}

# Define the region for slicing
lat_slice = slice(44, 20)
lon_slice = slice(-81, -50)

# Create a subplot scheme
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(15, 20), constrained_layout=True)
axes = axes.flatten()

# Iterate over datasets and plot
for i, (name, dataset) in enumerate(datasets.items()):
    if i >= len(axes):  # If there are more datasets than subplot cells, break
        break

    if name == "GOES":
        # Select the sea_surface_temperature variable and the defined region
        sst_box = dataset["sea_surface_temperature"].sel(lat=lat_slice, lon=lon_slice)

        # Get the 'data_created' attribute from the dataset
        data_created = dataset.attrs.get("date_created", "Unknown date")
        
        # Plot on the corresponding subplot axis with a different colormap
        sst_box.plot(ax=axes[i], cmap="coolwarm")
        axes[i].set_title(f"{name} sea_surface_temperature\nData Created: {data_created}")
    else:
        # Select the chlor_a variable and the defined region
        chlor_a_box = dataset["chlor_a"].sel(lat=lat_slice, lon=lon_slice)
        
        # Apply log10 transformation
        chla = np.log10(chlor_a_box)
        
        # Update the attributes
        units = dataset["chlor_a"].attrs.get("units", "unknown")
        chla.attrs.update({"units": f'lg({units})'})

        # Get the 'data_created' attribute from the dataset
        data_created = dataset.attrs.get("date_created", "Unknown date")

        # Plot on the corresponding subplot axis
        chla.plot(ax=axes[i], cmap="GnBu_r")
        #axes[i].set_title(f"{name} chlor_a")
        # Set the title to include the 'data_created' attribute
        axes[i].set_title(f"{name} chlor_a\nData Created: {data_created}")
# Remove any unused subplot axes
for j in range(i + 1, len(axes)):
    fig.delaxes(axes[j])

plt.show()

../_images/03b8f625814eae0a8790cdd61b4b2c2fafe4f64854a86a011ed485994aad7f2d.png

And then the same images after the regrid

datasets_2 = {
    "MODIS": new_modis_grid,
    "PACE": dataset_PACE,
    "GOES": new_goes_grid,
    "VIIRS": new_viirs_grid,
    "OLCI": new_olci_grid
}

# Define the region for slicing
lat_slice = slice(44, 20)
lon_slice = slice(-81, -50)

# Create a subplot scheme
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(15, 20), constrained_layout=True)
axes = axes.flatten()

# Iterate over datasets and plot
for i, (name, datasets_2) in enumerate(datasets_2.items()):
    if i >= len(axes):  # If there are more datasets than subplot cells, break
        break

    if name == "GOES":
        # Select the sea_surface_temperature variable and the defined region
        sst_box = datasets_2["sea_surface_temperature"].sel(lat=lat_slice, lon=lon_slice)
        
        # Plot on the corresponding subplot axis with a different colormap
        sst_box.plot(ax=axes[i], cmap="coolwarm")
        axes[i].set_title(f"{name} sea_surface_temperature")
    else:
        # Select the chlor_a variable and the defined region
        chlor_a_box = datasets_2["chlor_a"].sel(lat=lat_slice, lon=lon_slice)
        
        # Apply log10 transformation
        chla = np.log10(chlor_a_box)
        
        # Update the attributes
        units = datasets_2["chlor_a"].attrs.get("units", "unknown")
        chla.attrs.update({"units": f'lg({units})'})

        # Plot on the corresponding subplot axis
        chla.plot(ax=axes[i], cmap="GnBu_r")
        axes[i].set_title(f"{name} chlor_a")

# Remove any unused subplot axes
for j in range(i + 1, len(axes)):
    fig.delaxes(axes[j])

plt.show()

../_images/c44f0fca6537713ff107143f7bfb78c1d9f40354a5fdc20c665c026b6b25a4be.png

Now we can compare the measurements of chlorophyll concentrations of VIIRS, MODIS and OCLI with the measurement made by the PACE OCI instrument

# Define the region for slicing
lat_slice = slice(44, 20)
lon_slice = slice(-81, -50)
# Select the chlor_a variable and the defined region for all datasets
datasets = {
    "MODIS": new_modis_grid["chlor_a"].sel(lat=lat_slice, lon=lon_slice),
    "PACE": dataset_PACE["chlor_a"].sel(lat=lat_slice, lon=lon_slice),
    "VIIRS": new_viirs_grid["chlor_a"].sel(lat=lat_slice, lon=lon_slice),
    "OLCI": new_olci_grid["chlor_a"].sel(lat=lat_slice, lon=lon_slice)
}

# Ensure all datasets are on the same grid (regrid if necessary)
# For simplicity, assuming they are already on the same grid

# Compute the difference with respect to PACE
differences = {name: datasets[name] - datasets["PACE"] for name in datasets if name != "PACE"}

# Create a subplot scheme
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(20, 15), constrained_layout=True)
axes = axes.flatten()

# Plot the differences
for i, (name, diff) in enumerate(differences.items()):
    diff.plot(ax=axes[i], cmap="RdYlBu", center=0)
    axes[i].set_title(f"{name} - PACE chlor_a difference")

# Remove any unused subplot axes
for j in range(i + 1, len(axes)):
    fig.delaxes(axes[j])

plt.show()

../_images/50ddbc65ed43772714b3208b74b303d84d347a1ab2ae6651217f270211ecc0b0.png

Re-gridding PACE data

Contents

Re-gridding PACE data#

Setup and access the data:#

Functions: Filter and regrid#

Example regrids and plots#