HydroGEM
HydroGEM is a self-supervised foundation model for continental-scale streamflow quality control. It produces per-timestep anomaly probabilities and deploy-safe suggested reconstructions for discharge and stage time series, intended for human-in-the-loop review.
Status and citation
We are preparing the journal manuscript for submission to Environmental Modelling and Software (EMS).
Until the journal version is available, please cite the preprint:
HydroGEM preprint (arXiv): https://arxiv.org/abs/2512.14106
What is in this repository
This Hugging Face repository provides minimal inference artifacts for reproducibility and evaluation.
Inference model
- hydrogem_inference.pt: HydroGEM inference checkpoint
Notebooks
- HydroGEM_Inference_ECCC_Tutorial.ipynb: ECCC zero-shot inference tutorial
- HydroGEM_USGS Real Data_With SyntheticAnomalies_Benchmark_Tutorial_*.ipynb: USGS synthetic benchmark inference tutorial
Mini data and metadata
- test_synthetic_mini.pkl: small test set used by the notebooks
- site_inventory_mini.json: small site metadata used by the notebooks
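The internal layout of the mini files is not documented here. A quick way to see what the notebooks consume is to load both files and summarize their top level.

The sketch below uses only the Python standard library. The helper names `describe` and `inspect_mini_files` are hypothetical. If the pickle stores pandas or NumPy objects, those libraries must be installed for unpickling to succeed.

```python
import json
import pickle
from pathlib import Path
from typing import Any


def describe(obj: Any, max_items: int = 5) -> str:
    """Return a one-line summary of an object's type and top-level shape."""
    if isinstance(obj, dict):
        keys = list(obj)[:max_items]
        return f"dict with {len(obj)} keys, first keys: {keys}"
    if isinstance(obj, (list, tuple)):
        return f"{type(obj).__name__} of length {len(obj)}"
    return type(obj).__name__


def inspect_mini_files(pkl_path: Path, json_path: Path) -> dict[str, str]:
    """Load the mini test set and site inventory and summarize their top level."""
    # Only unpickle files from a source you trust: pickle can run code on load.
    with open(pkl_path, "rb") as f:
        test_set = pickle.load(f)
    with open(json_path, "r", encoding="utf-8") as f:
        inventory = json.load(f)
    return {"test_set": describe(test_set), "site_inventory": describe(inventory)}
```

From the repository folder, call `inspect_mini_files(Path("test_synthetic_mini.pkl"), Path("site_inventory_mini.json"))` and print the result before opening the notebooks.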
Interactive result viewers
- HydroGEM_ECCC_ZeroShot_Results.html: interactive ECCC results viewer
- USGS_*_Results.html: interactive USGS results viewers, including synthetic anomaly detection and showcase pages
- HydroGEM Synthetic Anomaly Dashboard.html: visualization dashboard for the synthetic injected anomalies, with single-segment examples, geographic context, duration diversity, and multiple plots per example
Documentation
- USGS_synthetic_test_set_documentation.pdf: describes how the USGS synthetic benchmark test set was constructed and evaluated
- ECCC_sites_and data labelling.pdf: documents the ECCC sites and the weak-label construction used for the manuscript's zero-shot evaluation
Training code and full-scale data pipelines are not included.
Synthetic anomaly dashboard
The file HydroGEM Synthetic Anomaly Dashboard.html is an interactive viewer built to validate and communicate the synthetic injected-anomaly test set. For each anomaly type and equation form, it includes true single-segment examples and paired clean and corrupted signals. It also includes compact diagnostics such as time series overlays, rating impact plots, normalized residuals, and pattern-specific panels. The paper shows these examples in a single figure; open the HTML dashboard to browse more of them.
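To illustrate what paired clean/corrupted signals and normalized residuals mean, the sketch below injects one hypothetical anomaly type: a constant additive offset over a single segment. It then scales the residual by the median absolute deviation (MAD) of the clean series.

Both the offset form and the MAD scaling are assumptions made for illustration. The benchmark's actual anomaly types and equation forms are described in USGS_synthetic_test_set_documentation.pdf.

```python
from statistics import median


def inject_offset(clean: list[float], start: int, length: int, offset: float) -> list[float]:
    """Return a copy of `clean` with a constant offset added over [start, start + length)."""
    corrupted = list(clean)
    for i in range(start, min(start + length, len(clean))):
        corrupted[i] += offset
    return corrupted


def normalized_residuals(clean: list[float], corrupted: list[float]) -> list[float]:
    """Residuals (corrupted - clean) divided by the MAD of the clean series.

    MAD scaling is an illustrative assumption, not necessarily the dashboard's normalization.
    """
    med = median(clean)
    mad = median(abs(x - med) for x in clean) or 1.0  # fall back to 1.0 for a flat series
    return [(c - x) / mad for x, c in zip(clean, corrupted)]
```

Outside the injected segment the residuals are zero. Inside it they equal the offset divided by the MAD, which makes segments comparable across sites with very different flow magnitudes.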
Quickstart
Choose one notebook and run all cells:
- ECCC: HydroGEM_Inference_ECCC_Tutorial.ipynb
- USGS: HydroGEM_USGS Real Data_With SyntheticAnomalies_Benchmark_Tutorial_*.ipynb
Each notebook loads hydrogem_inference.pt and runs inference on the provided mini dataset.
Inputs and outputs
Within the notebooks, inputs are paired discharge and stage time series in physical units. They are transformed internally into the model's normalized space, as described in the paper.
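The exact normalization is defined in the paper and is not reproduced here. As a hypothetical stand-in, the sketch below fits a per-series z-score scaler.

It shows the round trip that matters in practice: inputs are mapped into a normalized space, and model outputs such as reconstructions are mapped back to physical units with the inverse transform. The class name `ZScore` is illustrative only.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class ZScore:
    """Per-series z-score scaler (a hypothetical stand-in for the paper's normalization)."""

    mu: float
    sigma: float

    @classmethod
    def fit(cls, values: list[float]) -> "ZScore":
        sigma = pstdev(values)
        # Guard against a constant series, which would otherwise divide by zero.
        return cls(mu=mean(values), sigma=sigma if sigma > 0 else 1.0)

    def transform(self, values: list[float]) -> list[float]:
        """Physical units -> normalized space."""
        return [(v - self.mu) / self.sigma for v in values]

    def inverse(self, values: list[float]) -> list[float]:
        """Normalized space -> physical units (e.g. for suggested reconstructions)."""
        return [v * self.sigma + self.mu for v in values]
```

Discharge and stage would each get their own scaler in this sketch, since the two series have different units and ranges.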
Outputs include:
- anomaly probability per timestep
- binary detection mask, obtained by applying the notebook's threshold to the anomaly probabilities
- suggested reconstruction for discharge and stage in the same units shown in the notebook
All suggested reconstructions require expert review before any operational use.
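Going from per-timestep probabilities to something a reviewer can act on can be sketched as two steps: threshold the probabilities, then group consecutive flagged timesteps into segments.

This is a minimal sketch under stated assumptions. The `>=` comparison, the function names, and the example threshold of 0.5 are hypothetical; the notebooks define their own threshold.

```python
from typing import Optional


def detection_mask(probs: list[float], threshold: float) -> list[bool]:
    """Flag timesteps whose anomaly probability meets or exceeds the threshold."""
    return [p >= threshold for p in probs]


def flagged_segments(mask: list[bool]) -> list[tuple[int, int]]:
    """Group consecutive flagged timesteps into half-open (start, end) index ranges."""
    segments: list[tuple[int, int]] = []
    start: Optional[int] = None
    for i, flagged in enumerate(mask):
        if flagged and start is None:
            start = i  # a new flagged run begins
        elif not flagged and start is not None:
            segments.append((start, i))  # the run ended at the previous timestep
            start = None
    if start is not None:
        segments.append((start, len(mask)))  # run extends to the end of the series
    return segments
```

For example, probabilities `[0.1, 0.7, 0.8, 0.2, 0.9]` with a threshold of 0.5 give the segments `(1, 3)` and `(4, 5)`. Each segment can then be reviewed against the original and suggested series.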
License
This repository is released under CC BY-NC 4.0 for research and non-commercial use.
For deployment, integration, redistribution, or other licensing requests, please contact: [email protected]
Contact
Ijaz Ul Haq
PhD in Computer Science, University of Vermont
Senior Research Analyst, Water Resources Institute, University of Vermont
Email: [email protected]