Using Kerchunk to make NOAA’s National Water Model Dataset more accessible

Posted on: November 2, 2023 at 11:00 PM

Variation in Predicted Streamflow near Philadelphia

My blog post recapping our work around NOAA’s National Water Model (NWM) is finally out!

In the post, we discuss our experience using Kerchunk to speed up access to the short-range streamflow predictions in NOAA’s National Water Model Predictions Dataset, achieving a 4x speedup while using one-sixteenth of the memory.

Read the full article on the Element 84 blog here.

This is the second part of the work that I showcased last year at ESIP 2022 in Pittsburgh, when we were benchmarking techniques to better access the NWM’s Retrospective Dataset.

The essential value proposition of predictive data is different from that of retrospective data: recency matters a lot, so an expensive and time-consuming ingest step isn’t really an option. People need quick access to the latest predictions so they can make decisions regarding flood risk, river health, and much more.

The recommendation here is to use Kerchunk, which builds a JSON index that allows for Zarr-like access to NetCDF data via Xarray. We composed a test scenario and benchmarked it using several techniques, all open source here on GitHub, and show why Kerchunk is the best solution.
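To give a feel for why an index avoids the ingest step, here is a minimal, standard-library-only sketch of the idea. It is not Kerchunk itself and not the code from our benchmarks: the file, the `streamflow` variable name, and every helper function are hypothetical, and the index is a simplified take on the reference layout (a `refs` mapping from chunk keys to `[path, offset, length]`), with no compression or Zarr metadata.

```python
import json
import os
import struct
import tempfile


def build_demo_file(path, chunks):
    """Write chunks back to back after a fake header.

    Returns the (offset, length) of each chunk inside the file.
    """
    locations = []
    with open(path, "wb") as f:
        f.write(b"FAKE-NETCDF-HEADER")  # stand-in for real file metadata
        for values in chunks:
            payload = struct.pack(f"<{len(values)}d", *values)
            locations.append((f.tell(), len(payload)))
            f.write(payload)
    return locations


def build_reference_index(path, variable, locations):
    """Map each chunk key to [path, offset, length] (simplified, assumed layout)."""
    refs = {
        f"{variable}/{i}": [path, offset, length]
        for i, (offset, length) in enumerate(locations)
    }
    return {"version": 1, "refs": refs}


def read_chunk(index, key):
    """Fetch one chunk by seeking to its byte range; the rest of the file is never read."""
    path, offset, length = index["refs"][key]
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read(length)
    return list(struct.unpack(f"<{length // 8}d", raw))


def demo():
    chunks = [[1.5, 2.5, 3.5], [10.0, 20.0]]
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "forecast.bin")
        locations = build_demo_file(data_path, chunks)
        index = build_reference_index(data_path, "streamflow", locations)
        # Round-trip through JSON to show the index is plain, shareable text.
        index = json.loads(json.dumps(index))
        return read_chunk(index, "streamflow/1")


if __name__ == "__main__":
    print(demo())
```

The original file is never rewritten or copied. The only new artifact is a small JSON document, which is what makes this approach attractive for forecasts that are replaced every few hours.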

Optimizing access to public climate datasets is a key step in democratizing evidence- and model-based conversations around adaptation and mitigation. We need to get the best data to the best minds as efficiently as possible, so that we can all address the growing challenges around water and climate together.

This has been a fun and challenging project for the last two years that has taken me out of my comfort zone of user-facing web app development. Learning to think in terms of data structures, Jupyter notebooks, and cloud resource utilization will be incredibly beneficial in all my future work. And of course, the real treasure as always was the friends I made along the way.

Have you worked in this space and run into similar challenges? Please connect on Twitter or LinkedIn! I love making new connections and having conversations around climate, water, and software.