Applying Gaussian Pigment (GPig) Phytoplankton Community Composition Algorithm to OCI data

Applying Gaussian Pigment (GPig) Phytoplankton Community Composition Algorithm to OCI data#

Authors: Anna Windle (NASA, SSAI), Max Danenhower (Bowdoin College), Ali Chase (University of Washington)

Last updated: August 20, 2025

Summary#

This notebook applies the inversion algorithm described in Chase et al., 2017 to estimate phytoplankton pigment concentrations from PACE OCI Rrs data. This algorithm, called Gaussian Pigment (GPig), is currently being implemented in OBPG’s OCSSW software. This work was originally developed in MatLab by Ali Chase and subsequently translated to Python by Max Danenhower, Charles Stern, and Ali Chase. This tutorial demonstrates how to apply the Python GPig algorithm to Level-2 (L2) and Level-3 Mapped (L3M) PACE OCI data.

Learning Objectives#

At the end of this notebook you will know:

  • How to use a packaged Python project

  • How to run the GPig Algorithm on PACE OCI L2 and L3 data

Contents#

  1. Setup

  2. Run GPig on L2 Data

  3. Run GPig on L3M Data

1. Setup#

The GPig Python code has been packaged to allow it to be easily installed, imported, and reused. We will use pip install to get the current packaged code from Github. You may need to restart the kernel after the installation before continuing.

%pip install git+https://github.com/max-danenhower/pace-rrs-inversions-pigments
/tmp/uv/venv/bin/python: No module named pip
Note: you may need to restart the kernel to use updated packages.

Next, we’ll import all of the packages used in this notebook.

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import earthaccess
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from gpig import L2_utils, L3_utils

crs = ccrs.PlateCarree()

Set (and persist to your home directory on the host, if needed) your Earthdata Login credentials.

auth = earthaccess.login(persist=True)

back to top

2. Run GPig on L2 Data#

Let’s first download the data using the L2_utils.load() function. Let’s take a look at the docstring:

?L2_utils.load_data

Running L2_utils.load_data() downloads several data files from NASA EarthData:

  • Rrs: PACE OCI Level-2 AOP data products

  • Sea surface salinity: JPL SMAP-SSS V5.0 CAP, 8-day running mean, level 3 mapped, sea surface salinity (SSS) product from the NASA Soil Moisture Active Passive (SMAP) observatory

  • Sea surface temperature: Group for High Resolution Sea Surface Temperature (GHRSST) Level 4 sea surface temperature

Let’s run L2_utils.load_data() to download data collected on May 5, 2025 corresponding to a bounding box off the coast of Washington, U.S.

rrs, sss, sst = L2_utils.load_data(("2025-05-01", "2025-05-01"), (-127, 40, -126, 41))
datatree = xr.open_datatree(rrs[0])
dataset = xr.merge(datatree.to_dict().values())
l2_dataset = dataset.set_coords(("longitude", "latitude"))

You should see 3 new folders in your working directory called L2_rrs_data, sal_data, and temp_data. Let’s take a quick look at Rrs at 500 nm:

fig, ax = plt.subplots(figsize=(8, 6), subplot_kw={"projection": crs})

data = l2_dataset["Rrs"].sel({"wavelength_3d": 500})
data.plot(
    x="longitude",
    y="latitude",
    vmin=0,
    vmax=0.008,
    ax=ax,
    cbar_kwargs={"label": "Rrs ($sr^{-1}$)"},
)

ax.set_extent([-135, -115, 35, 55], crs=crs)
ax.coastlines(resolution="10m")
ax.add_feature(cfeature.BORDERS, linestyle=":")
ax.gridlines(
    draw_labels=["left", "bottom"],
    linewidth=0.5,
    color="gray",
    alpha=0.5,
    linestyle="--",
)
plt.show()
../../_images/8a34175640141f084b1a1f0da8a6bac9c594d0a62878393f31ef246f9df4a1aa.png

Now, we can use L2_utils.estimate_inv_pigments to calculate phytoplankton pigment concentrations for this data. Let’s take a look at the docstring for this function:

?L2_utils.estimate_inv_pigments

You can see that this function accepts a bounding box (bbox) as a parameter. The default is None which means it will run the algorithm on every single pixel in the L2 file, which can take a long time. We will supply the bbox parameter with the following coordinates: 45 N, 44 S, -125 E, -126 W.

Let’s first see what this bounding box covers:

bbox = (-126, 47, -125, 48)

lon_min, lat_min, lon_max, lat_max = bbox
rect_lon = [lon_min, lon_max, lon_max, lon_min, lon_min]
rect_lat = [lat_min, lat_min, lat_max, lat_max, lat_min]
ax.plot(rect_lon, rect_lat, color="red", linewidth=2)

fig
../../_images/0b6c1dc287ed1f6545242adcfe3628bd216ddac23b289657efcadc49df167688.png

Let’s run it. This can take some time depending on bounding box size.

l2_pigments = L2_utils.estimate_inv_pigments(rrs[0], sss, sst, bbox)
L2_rrs_data/PACE_OCI.20250501T201332.L2.OC_AOP.V3_0.nc
-126 47 -125 48
Progress: 6042/6042

The inversion provides four pigment variables.

l2_pigments
<xarray.Dataset> Size: 242kB
Dimensions:    (number_of_lines: 106, pixels_per_line: 57)
Coordinates:
    longitude  (number_of_lines, pixels_per_line) float32 24kB -125.9 ... -125.1
    latitude   (number_of_lines, pixels_per_line) float32 24kB 46.81 ... 48.19
Dimensions without coordinates: number_of_lines, pixels_per_line
Data variables:
    chla       (number_of_lines, pixels_per_line) float64 48kB nan nan ... nan
    chlb       (number_of_lines, pixels_per_line) float64 48kB nan nan ... nan
    chlc       (number_of_lines, pixels_per_line) float64 48kB nan nan ... nan
    ppc        (number_of_lines, pixels_per_line) float64 48kB nan nan ... nan

Let’s plot the phytoplankton pigment concentrations:

fig, axs = plt.subplots(2, 2, figsize=(10, 8), subplot_kw={"projection": crs})

variables = ["chla", "chlb", "chlc", "ppc"]
titles = ["Chlorophyll-a", "Chlorophyll-b", "Chlorophyll-c", "PPC"]
for ax, var, title in zip(axs.flat, variables, titles):
    data = l2_pigments[var]
    lon = l2_pigments["longitude"]
    lat = l2_pigments["latitude"]

    data_log = np.log10(data.where(data > 0))

    im = ax.pcolormesh(lon, lat, data_log, cmap="viridis", shading="auto")

    ax.gridlines(
        draw_labels=["left", "bottom"],
        linewidth=0.5,
        color="gray",
        alpha=0.5,
        linestyle="--",
    )
    ax.set_title(title)
    ax.set_xlim(bbox[0], bbox[2])
    ax.set_ylim(bbox[1], bbox[3])

    cbar = fig.colorbar(im, ax=ax, orientation="vertical")
    cbar.set_label(f"$log_{{10}}({var})$")

plt.tight_layout()
plt.show()
../../_images/0ea9e74b5166a6affc298a3a443565791d22b62ca5660a6bfed465ef6aaab98e.png

3. Run GPig on L3M Data#

We can also run GPig on L3M data. Let’s look at the L3_utils.load_data() docstring:

?L3_utils.load_data

We’ll use this to download 4km L3M Rrs data, global SSS, and global SST data for June 12, 2024:

rrs, sss, sst = L3_utils.load_data(("2024-06-12", "2024-06-12"), "4km")
l3_dataset = xr.open_dataset(rrs[0])

Let’s quickly look at L3M Rrs at 500 nm:

fig, ax = plt.subplots(figsize=(10, 6), subplot_kw={"projection": crs})

data = l3_dataset["Rrs"].sel({"wavelength": 500})
data.plot.imshow(
    x="lon",
    y="lat",
    vmin=0,
    vmax=0.008,
    ax=ax,
    cbar_kwargs={"label": r"Rrs ($sr^{-1}$)", "fraction": 0.046},
)

ax.coastlines(resolution="10m")
ax.gridlines(
    draw_labels=["left", "bottom"],
    linewidth=0.5,
    color="gray",
    alpha=0.5,
    linestyle="--",
)

plt.show()
../../_images/8b5a4d08b3ced768b9e9516277d56fda25ca7c27d1a0b0a7ec2ad48db9e9e02f.png

Let’s look at what L3_utils.estimate_inv_pigments requires as input:

?L3_utils.estimate_inv_pigments

We’ll provide the Rrs, SSS, and SST file and input a bounding box corresponding to the coast of Washington, U.S.

Let’s take a look at what data this bounding box covers:

bbox = (-127, 40, -126, 41)

lon_min, lat_min, lon_max, lat_max = bbox
rect_lon = [lon_min, lon_max, lon_max, lon_min, lon_min]
rect_lat = [lat_min, lat_min, lat_max, lat_max, lat_min]
ax.plot(rect_lon, rect_lat, color="red", linewidth=2)

ax.set_extent([-130, -115, 32, 50], crs=crs)

fig
../../_images/1e81fb207c79a26103423d2866cd6bfcf61ba772775b2447ecf10def72f86ffb.png
l3_pigments = L3_utils.estimate_inv_pigments(rrs, sss, sst, bbox)
num pixels:  576
Progress: 576/576

The inversion provides four pigment variables.

l3_pigments
<xarray.Dataset> Size: 19kB
Dimensions:  (lat: 24, lon: 24)
Coordinates:
  * lat      (lat) float32 96B 40.98 40.94 40.9 40.85 ... 40.15 40.1 40.06 40.02
  * lon      (lon) float32 96B -127.0 -126.9 -126.9 ... -126.1 -126.1 -126.0
Data variables:
    chla     (lat, lon) float64 5kB nan nan nan nan nan ... nan nan nan nan nan
    chlb     (lat, lon) float64 5kB nan nan nan nan nan ... nan nan nan nan nan
    chlc     (lat, lon) float64 5kB nan nan nan nan nan ... nan nan nan nan nan
    ppc      (lat, lon) float64 5kB nan nan nan nan nan ... nan nan nan nan nan

Let’s plot all phytoplankton pigment concentrations:

fig, axs = plt.subplots(2, 2, figsize=(10, 8), subplot_kw={"projection": crs})

variables = ["chla", "chlb", "chlc", "ppc"]
titles = ["Chlorophyll-a", "Chlorophyll-b", "Chlorophyll-c", "PPC"]
for ax, var, title in zip(axs.flat, variables, titles):

    data = l3_pigments[var]
    data_log = np.log10(data.where(data > 0))

    im = data_log.plot.imshow(
        ax=ax,
        cmap="viridis",
        add_colorbar=False,
    )
    
    ax.coastlines(resolution="10m")
    ax.set_title(title)
    ax.gridlines(
        draw_labels=["left", "bottom"],
        linewidth=0.5,
        color="gray",
        alpha=0.5,
        linestyle="--",
    )

    fig.colorbar(im, ax=ax, orientation="vertical", label=f"$log_{{10}}({var})$")

plt.tight_layout()
plt.show()
../../_images/46f3f51988f20627ccc3cb2b09708b51fbc0d5444791cb6304e8be7ebe358a96.png

back to top