Reprojecting and Formatting PACE OCI Data

Authors: Skye Caplan (NASA, SSAI)
Last updated: July 28, 2025

An Earthdata Login account is required to access data from the NASA Earthdata system, including NASA PACE data.

Executing this notebook requires an instance with up to 8GB of memory.

Summary

This notebook will use rioxarray to reproject PACE OCI data from the instrument swath into a projected coordinate system and save the file as a GeoTIFF (.tif), a common data format used in GIS applications.

Learning Objectives

At the end of this notebook you will know how to:

Open PACE OCI surface reflectance and vegetation index products
Reproject those data into defined coordinate reference systems
Export those reprojected data as GeoTIFFs

1. Setup

Begin by importing all of the packages used in this notebook. Please ensure your environment has the most recent versions of rioxarray (>=0.19.0) and rasterio (>=1.4.3), as the functionality allowing us to correctly convert PACE Level-2 (L2) files to GeoTIFF is relatively new.

from pathlib import Path

import cartopy
import cartopy.crs as ccrs
import cf_xarray  # noqa: F401
import earthaccess
import matplotlib.pyplot as plt
import numpy as np
import rasterio
import rioxarray as rio
import xarray as xr
from rasterio.enums import Resampling
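
Before going further, it may be worth confirming that the installed packages meet the minimum versions mentioned above. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `meets_minimum` are hypothetical, and the comparison looks only at the leading numeric parts of each version string (so a pre-release such as `1.4.3rc1` is treated like `1.4.3`).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.4.3' (or '1.4.3.post1') into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.19' and '0.19.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimum versions quoted in the setup text.
MINIMUMS = {"rioxarray": "0.19.0", "rasterio": "1.4.3"}

for name, minimum in MINIMUMS.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: not installed (need >= {minimum})")
        continue
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    print(f"{name} {installed}: {status} (need >= {minimum})")
```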

The goal of this tutorial is to reproject Level-2 (L2) PACE OCI data and convert it between formats, but L2 PACE OCI data comes in many forms. We’ll cover two examples here: one with 3-dimensional surface reflectance (SFREFL) data and one with 2-dimensional vegetation index (VI) data, to illustrate how each kind of dataset needs to be handled.

The following cells use earthaccess to set and persist your Earthdata login credentials, then search for and download the relevant datasets for a scene covering eastern North America.

auth = earthaccess.login(persist=True)

results = earthaccess.search_data(
    short_name=["PACE_OCI_L2_SFREFL", "PACE_OCI_L2_LANDVI"],
    granule_name="*20240701T175112*",
)
for item in results:
    display(item)

Data: PACE_OCI.20240701T175112.L2.LANDVI.V3_0.nc

Size: 51.86 MB

Cloud Hosted: True

Data: PACE_OCI.20240701T175112.L2.SFREFL.V3_0.nc

Size: 750.27 MB

Cloud Hosted: True

paths = earthaccess.download(results, local_path="data")
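
Later cells refer to the two files as `sr_path` and `vi_path`. How those names are assigned is not shown here, so the following is only one possible sketch: a hypothetical helper, `pick_path`, that selects a file by the product tag in its name. The `example_paths` list is a stand-in built from the granule names printed above; in the notebook you would pass the `paths` list returned by `earthaccess.download` instead.

```python
from pathlib import Path


def pick_path(paths, product):
    """Return the first path whose file name contains the product tag."""
    for p in paths:
        if product in Path(p).name:
            return Path(p)
    raise FileNotFoundError(f"No file containing {product!r} in {list(paths)}")


# Stand-in for the list returned by earthaccess.download (assumption).
example_paths = [
    "data/PACE_OCI.20240701T175112.L2.LANDVI.V3_0.nc",
    "data/PACE_OCI.20240701T175112.L2.SFREFL.V3_0.nc",
]

sr_path = pick_path(example_paths, "SFREFL")
vi_path = pick_path(example_paths, "LANDVI")
print(sr_path.name)
print(vi_path.name)
```

Matching on the file name rather than on list position avoids relying on the order in which the search results happen to be returned.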

3. Exporting to GeoTIFF

Now that we have two georeferenced, projected datasets of surface reflectances and VIs, we can export them to the format of our choice. Here, we export to a Cloud Optimized GeoTIFF, or COG, using the “COG” driver with applicable profile options.

COGs are a subset of GeoTIFFs that have been optimized to work in cloud environments. If you are not working in the cloud, you don’t have to worry, as COGs are backwards compatible with GeoTIFFs. That is, any software that can be used to analyze a GeoTIFF can also be used with COGs. For more information on the COG format, please see the cogeo website. There is also a very useful rasterio plugin called rio-cogeo for creating and validating COGs; see its documentation for details.

We create our files by building a profile from the destination datasets (sr_dst or vi_dst) and using the rio.to_raster() method. Each of the profile options is necessary for the format conversion, but can be changed to user preference as needed. For example, if you prefer a different nodata value, substitute the values you’d like to change in the dictionaries below.

sr_dst_name = Path(sr_path).with_suffix(".tif")
profile = {
    "driver": "COG",
    "width": sr_dst.shape[2],
    "height": sr_dst.shape[1],
    "count": sr_dst.shape[0],
    "crs": sr_dst.rio.crs,
    "dtype": sr_dst.dtype,
    "transform": sr_dst.rio.transform(),
    "compress": "lzw",
    "nodata": np.nan,
    "interleave": "BAND",
    "tiled": "YES",
    "blockxsize": "512",
    "blockysize": "512",
}
sr_dst.rio.to_raster(sr_dst_name, **profile)

vi_dst_name = Path(vi_path).with_suffix(".tif")
profile = {
    "driver": "COG",
    "width": vi_dst["cire"].shape[1],
    "height": vi_dst["cire"].shape[0],
    "count": 11,
    "crs": vi_dst.rio.crs,
    "dtype": vi_dst["cire"].dtype,
    "transform": vi_dst.rio.transform(),
    "compress": "lzw",
    "nodata": np.nan,
    "interleave": "BAND",
    "tiled": "YES",
    "blockxsize": "512",
    "blockysize": "512",
}
vi_dst.rio.to_raster(vi_dst_name, **profile)
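
The two profiles above differ only in shape, band count, and dtype, so the shared options could be collected in one place. The following is a minimal sketch of that idea; `build_cog_profile` is a hypothetical helper, not part of rioxarray or rasterio, and it simply reproduces the options used above. The placeholder CRS, dtype, and transform values exist only so the example runs on its own.

```python
import math


def build_cog_profile(width, height, count, crs, dtype, transform, nodata=math.nan):
    """Assemble the keyword arguments used with rio.to_raster() in this section."""
    return {
        "driver": "COG",
        "width": width,
        "height": height,
        "count": count,
        "crs": crs,
        "dtype": dtype,
        "transform": transform,
        "compress": "lzw",
        "nodata": nodata,
        "interleave": "BAND",
        "tiled": "YES",
        "blockxsize": "512",
        "blockysize": "512",
    }


# Placeholder inputs (assumptions). In the notebook these would come from
# the reprojected data, e.g. sr_dst.shape, sr_dst.rio.crs, sr_dst.dtype and
# sr_dst.rio.transform().
profile = build_cog_profile(
    width=200, height=100, count=3, crs="EPSG:4326", dtype="float32", transform=None
)
print(profile["driver"], profile["count"], profile["compress"])
```

To use a different nodata value, pass it through the `nodata` argument instead of editing each dictionary by hand.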

The files should now be converted and ready for analysis in QGIS and other software. To make a quick true colour image in your program of choice, set R = 655 nm (band 60), G = 555 nm (band 42), and B = 470 nm (band 25).
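
If you want to display other wavelengths, you need the band number closest to each target wavelength. The sketch below shows one way to look that up with NumPy; `nearest_band` is a hypothetical helper, and `example_wavelengths` is an illustrative 5 nm grid, not the real OCI band centres, so the band numbers it produces will not match those quoted above. In practice you would pass the wavelength values stored with your file.

```python
import numpy as np


def nearest_band(wavelengths, target_nm):
    """Return the 1-based band number whose wavelength is closest to target_nm."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    # GeoTIFF bands are numbered from 1, so add 1 to the zero-based index.
    return int(np.abs(wavelengths - target_nm).argmin()) + 1


# Illustrative wavelength grid only (assumption): 400, 405, ..., 700 nm.
example_wavelengths = np.arange(400, 701, 5)

targets = {"R": 655, "G": 555, "B": 470}
rgb_bands = {name: nearest_band(example_wavelengths, nm) for name, nm in targets.items()}
print(rgb_bands)
```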

To do the format conversion and reprojection in one step, please see the function below:

def nc_to_gtiff(fpath):
    """
    Convert a PACE SFREFL or LANDVI NetCDF file to GeoTIFF format
    Masks LANDVI dataset for clouds automatically
    Args:
        fpath - Path to NetCDF file to convert
    """
    dt = xr.open_datatree(fpath)
    fpath_s = str(fpath)

    if "SFREFL" in fpath_s:
        src = dt["geophysical_data"]["rhos"].transpose("wavelength_3d", ...)
    elif "LANDVI" in fpath_s:
        src = dt["geophysical_data"].to_dataset()
        if src["l2_flags"].cf.is_flag_variable:
            cloud_mask = ~(src["l2_flags"].cf == "CLDICE")
            src = src.where(cloud_mask)
    else:
        print(
            "File is neither the SFREFL nor LANDVI PACE suite, you'll have to adapt these methods yourself!"
        )
        return

    src.coords["longitude"] = dt["navigation_data"]["longitude"]
    src.coords["latitude"] = dt["navigation_data"]["latitude"]
    src = src.rio.set_spatial_dims("pixels_per_line", "number_of_lines")
    src = src.rio.write_crs("epsg:4326")

    dst = src.rio.reproject(
        dst_crs=src.rio.crs,
        src_geoloc_array=(src.coords["longitude"], src.coords["latitude"]),
        nodata=np.nan,
        resampling=Resampling.nearest,
    )

    if "SFREFL" in fpath_s:
        width, height, count = dst.shape[2], dst.shape[1], dst.shape[0]
        dtype = dst.dtype
    elif "LANDVI" in fpath_s:
        width, height, count = dst["cire"].shape[1], dst["cire"].shape[0], 11
        dtype = dst["cire"].dtype

    dst_name = Path(fpath).with_suffix(".tif")
    profile = {
        "driver": "COG",
        "width": width,
        "height": height,
        "count": count,
        "crs": dst.rio.crs,
        "dtype": dtype,
        "transform": dst.rio.transform(),
        "compress": "lzw",
        "nodata": np.nan,
        "interleave": "BAND",
        "tiled": "YES",
        "blockxsize": "512",
        "blockysize": "512",
    }

    dst.rio.to_raster(dst_name, **profile)

nc_to_gtiff(vi_path)
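
The cloud-masking step inside `nc_to_gtiff` relies on cf_xarray to decode the `l2_flags` bit field. To show what that comparison does underneath, here is a minimal NumPy sketch of the same idea. The flag names and bit values below are illustrative assumptions; for a real file, read them from the `flag_meanings` and `flag_masks` attributes of `l2_flags`.

```python
import numpy as np

# Hypothetical flag table for illustration; in a real file, take these from
# the "flag_meanings" and "flag_masks" attributes of l2_flags.
flag_meanings = "LAND CLDICE".split()
flag_masks = [2, 512]
flag_bits = dict(zip(flag_meanings, flag_masks))

# Tiny synthetic scene: each pixel can have several flag bits set at once.
l2_flags = np.array([[0, 2], [512, 514]], dtype=np.int32)
values = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=np.float32)

# A pixel is cloudy when the CLDICE bit is set, regardless of other bits.
is_cloud = (l2_flags & flag_bits["CLDICE"]) != 0

# Keep clear pixels and replace cloudy ones with NaN, which mirrors what
# src.where(cloud_mask) does in the function.
masked = np.where(~is_cloud, values, np.nan)
print(masked)
```

Testing the bit with a bitwise AND, rather than comparing the whole integer to a single value, is what lets a pixel flagged as both land and cloud still be caught by the mask.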

4. Converting Level-3 Data to GeoTIFF

Level-3 Mapped (L3M) data is already mapped to a Plate Carrée projection. In other words, unless you want the data in another projection, you don’t need to reproject as we did for the L2 data above. To convert these files from NetCDF to GeoTIFF, all you need to do is transpose the datasets as necessary and assign a CRS.

First, let’s download a Level-3 Global Mapped vegetation index (LANDVI) file.
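
No reprojection is needed because a Plate Carrée grid is regular in longitude and latitude, so a pixel's position follows directly from an origin and a fixed pixel size. The sketch below derives those numbers from 1-D pixel-centre coordinates; `grid_geotransform` is a hypothetical helper, and the synthetic 0.1 degree global grid is an assumption chosen for illustration.

```python
import numpy as np


def grid_geotransform(lon, lat):
    """Derive (west, xres, north, yres) from 1-D pixel-centre coordinates."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    xres = (lon[-1] - lon[0]) / (lon.size - 1)
    yres = (lat[-1] - lat[0]) / (lat.size - 1)  # negative when north comes first
    # Coordinates mark pixel centres, so step back half a pixel to reach the edge.
    west = lon[0] - xres / 2
    north = lat[0] - yres / 2
    return west, xres, north, yres


# Synthetic 0.1 degree global grid (assumption), listed north to south.
lon = -180 + 0.1 * (np.arange(3600) + 0.5)
lat = 90 - 0.1 * (np.arange(1800) + 0.5)

west, xres, north, yres = grid_geotransform(lon, lat)
print(round(west, 6), round(xres, 6), round(north, 6), round(yres, 6))
```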

results = earthaccess.search_data(
    short_name="PACE_OCI_L3M_LANDVI",
    granule_name="PACE_OCI.20240601_20240630.L3m.MO.LANDVI.V3_0.0p1deg.nc",
)
paths = earthaccess.download(results, local_path="data")

if "SFREFL" in str(paths[0]):
    ds = xr.open_dataset(paths[0]).rhos.transpose("wavelength", ...)
elif "LANDVI" in str(paths[0]):
    ds = xr.open_dataset(paths[0]).drop_vars("palette")

ds = ds.rio.write_crs("epsg:4326")
ds.rio.to_raster(Path(paths[0]).with_suffix(".tif"), driver="COG")

You have completed the notebook on reprojecting PACE OCI data and converting it to GeoTIFF. We suggest looking at the notebook on “Machine Learning with Satellite Data” to explore some more advanced analysis methods.
