HydroGEM

HydroGEM is a self-supervised foundation model for continental-scale streamflow quality control. It produces per-timestep anomaly probabilities and deploy-safe suggested reconstructions for discharge and stage time series. Both outputs are intended for human-in-the-loop review.

Status and citation

We are preparing the journal manuscript for submission to Environmental Modelling and Software (EMS).
Until the journal version is available, please cite the preprint:

HydroGEM preprint (arXiv): https://arxiv.org/abs/2512.14106

What is in this repository

This Hugging Face repository provides minimal inference artifacts for reproducibility and evaluation.

Inference model

  • hydrogem_inference.pt: HydroGEM inference checkpoint

Notebooks

  • HydroGEM_Inference_ECCC_Tutorial.ipynb: ECCC zero-shot inference tutorial
  • HydroGEM_USGS Real Data_With SyntheticAnomalies_Benchmark_Tutorial_*.ipynb: USGS synthetic benchmark inference tutorial

Mini data and metadata

  • test_synthetic_mini.pkl: small test set used by the notebooks
  • site_inventory_mini.json: small site metadata file used by the notebooks

Interactive result viewers

  • HydroGEM_ECCC_ZeroShot_Results.html: interactive ECCC results viewer
  • USGS_*_Results.html: interactive USGS results viewers, including synthetic anomaly detection and showcase pages
  • HydroGEM Synthetic Anomaly Dashboard.html: dashboard for visualizing the synthetic injected anomalies, with single-segment examples, geographic context, duration diversity, and multiple plots per example

Documentation

  • USGS_synthetic_test_set_documentation.pdf: describes how the USGS synthetic benchmark test set was constructed and evaluated
  • ECCC_sites_and data labelling.pdf: documents the ECCC sites and the weak-label construction used for the zero-shot evaluation in the manuscript

Training code and full scale data pipelines are not included.

Synthetic anomaly dashboard

The file HydroGEM Synthetic Anomaly Dashboard.html is an interactive viewer created to validate and communicate the synthetic injected anomaly test set. It includes true single-segment examples for each anomaly type and equation form, paired clean and corrupted signals, and compact diagnostics such as time series overlays, rating impact plots, normalized residuals, and pattern-specific panels. For more examples than fit in a single paper figure, open the HTML dashboard.

Quickstart

Choose one notebook and run all cells:

  • ECCC: HydroGEM_Inference_ECCC_Tutorial.ipynb
  • USGS: HydroGEM_USGS Real Data_With SyntheticAnomalies_Benchmark_Tutorial_*.ipynb

Each notebook loads hydrogem_inference.pt and runs inference on the provided mini dataset.
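
Before running a notebook, it can help to confirm that the artifacts it loads are present locally. The sketch below is a minimal pre-flight check and is not part of the repository. It assumes the files were downloaded into a single directory under the names listed above. The helper names missing_artifacts and find_notebooks are hypothetical.

```python
from pathlib import Path

# File names taken from the repository listing in this README.
REQUIRED_FILES = (
    "hydrogem_inference.pt",
    "test_synthetic_mini.pkl",
    "site_inventory_mini.json",
)


def missing_artifacts(repo_dir):
    """Return the required file names that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def find_notebooks(repo_dir):
    """Return the names of HydroGEM tutorial notebooks in repo_dir, sorted."""
    # Assumption: all tutorial notebooks start with "HydroGEM_".
    return sorted(p.name for p in Path(repo_dir).glob("HydroGEM_*.ipynb"))


if __name__ == "__main__":
    print("Missing files:", missing_artifacts("."))
    print("Notebooks found:", find_notebooks("."))
```

An empty "Missing files" list suggests the download is complete. Whether the notebooks expect these files in the working directory is an assumption, so check the path cells in the notebook itself.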

Inputs and outputs

Within the notebooks, inputs are paired discharge and stage time series in physical units. They are transformed internally into the model's normalized space, as described in the paper.

Outputs include:

  • anomaly probability per timestep
  • binary detection mask using the notebook threshold
  • suggested reconstruction for discharge and stage in the same units shown in the notebook
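
To illustrate how the first two outputs relate, the sketch below thresholds per-timestep probabilities into a binary mask and groups consecutive flagged timesteps into segments for review. It is an illustration only, not the notebook code. The threshold of 0.5 and the function names are assumptions, and the notebooks define the threshold actually used.

```python
def detection_mask(probabilities, threshold=0.5):
    """Flag each timestep whose anomaly probability meets the threshold.

    The default of 0.5 is a placeholder, not the notebook threshold.
    """
    return [p >= threshold for p in probabilities]


def flagged_segments(mask):
    """Group consecutive flagged timesteps into (start, end) pairs, end exclusive."""
    segments = []
    start = None
    for i, flagged in enumerate(mask):
        if flagged and start is None:
            start = i  # a new flagged run begins here
        elif not flagged and start is not None:
            segments.append((start, i))  # the run ended at the previous timestep
            start = None
    if start is not None:
        segments.append((start, len(mask)))  # run extends to the end of the series
    return segments


# Made-up probabilities for illustration only.
probs = [0.02, 0.10, 0.91, 0.88, 0.30, 0.05, 0.76]
mask = detection_mask(probs)
print(flagged_segments(mask))  # [(2, 4), (6, 7)]
```

Grouping flags into segments gives a reviewer a short list of intervals to inspect rather than a timestep-by-timestep mask.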

All suggested reconstructions require expert review before any operational use.
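
One way to enforce this in a downstream workflow is to accept a reconstruction only where a reviewer has explicitly approved it. The sketch below is a hypothetical illustration of that gating and is not part of the repository. The function name, the (start, end) segment format, and the approved set are all assumptions.

```python
def apply_reviewed_reconstruction(observed, reconstructed, segments, approved):
    """Replace observed values with reconstructed ones only in approved segments.

    segments: list of (start, end) index pairs, end exclusive.
    approved: set of positions in `segments` that a reviewer accepted.
    """
    if len(observed) != len(reconstructed):
        raise ValueError("observed and reconstructed must have the same length")
    corrected = list(observed)  # copy, so the raw record is never modified
    for idx, (start, end) in enumerate(segments):
        if idx in approved:
            corrected[start:end] = reconstructed[start:end]
    return corrected


# Made-up values for illustration only.
observed = [1.0, 1.1, 9.0, 9.2, 1.3]
reconstructed = [1.0, 1.1, 1.2, 1.2, 1.3]
segments = [(2, 4)]

print(apply_reviewed_reconstruction(observed, reconstructed, segments, approved=set()))
# [1.0, 1.1, 9.0, 9.2, 1.3]
print(apply_reviewed_reconstruction(observed, reconstructed, segments, approved={0}))
# [1.0, 1.1, 1.2, 1.2, 1.3]
```

With an empty approved set the observed record passes through unchanged, so the default behavior is to apply nothing without review.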

License

This repository is released under CC BY-NC 4.0 for research and non-commercial use.
For deployment, integration, redistribution, or other licensing requests, please contact: [email protected]

Contact

Ijaz Ul Haq
PhD in Computer Science, University of Vermont
Senior Research Analyst, Water Resources Institute, University of Vermont
Email: [email protected]
