# ☁️ FDR4VGT-CLOUD

Official model release accompanying the paper:

> **A multisensor deep learning framework for robust cloud segmentation in SPOT-VGT and Proba-V**
> Julio Contreras, Cesar Aybar, Luis Gómez-Chova
> IEEE Geoscience and Remote Sensing Letters, 2026.
This model is the operational cloud masking algorithm selected for the ESA FDR4VGT archive reprocessing, delivering consistent cloud detection across the full SPOT-VGT (VGT1 1998–2003, VGT2 2002–2014) and Proba-V (2013–2020) record — a single sensor-agnostic model for the three missions.
## ✨ Overview

- Architecture: Hybrid DeepLabV3+ (MobileNetV2 backbone) + pixel-wise MLP (PW-DL3+)
- Input: 4 Top-of-Atmosphere reflectance bands (Blue, Red, NIR, SWIR), sensor-agnostic
- Supported sensors: SPOT-VGT1, SPOT-VGT2, Proba-V
- Input shape: `[B, 4, 512, 512]`
- Parameters: 12.65M (57.29 MB)
- Training: Weak-to-strong supervision, with large-scale pre-training on 3,647 weakly labeled scenes followed by fine-tuning on 109 hand-annotated hard-example scenes.
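For intuition, the hybrid architecture can be sketched as a convolutional branch (capturing spatial context, standing in for the full DeepLabV3+ encoder-decoder) fused with a pixel-wise MLP branch that classifies each pixel's spectrum independently via 1×1 convolutions. The sketch below is illustrative only: layer widths are tiny placeholders, not the released 12.65M-parameter model.

```python
import torch
import torch.nn as nn

class PWDL3PlusSketch(nn.Module):
    """Illustrative stand-in for the PW-DL3+ hybrid: a small CNN branch
    (spatial context) summed with a pixel-wise MLP branch (per-pixel
    spectrum), producing a single-channel cloud probability map."""
    def __init__(self, in_ch: int = 4):
        super().__init__()
        # Stand-in for the DeepLabV3+ encoder-decoder (3x3 convs see context)
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
        )
        # Pixel-wise MLP: 1x1 convs act on each pixel's 4 bands independently
        self.pw = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fuse the two branches, then map logits to [0, 1] probabilities
        return torch.sigmoid(self.cnn(x) + self.pw(x))

x = torch.rand(1, 4, 512, 512)      # [B, 4, H, W] TOA reflectance
prob = PWDL3PlusSketch()(x)         # [1, 1, 512, 512] cloud probability
```

The pixel-wise branch is what keeps the model sensor-agnostic at the spectral level, while the convolutional branch supplies the spatial context a per-pixel classifier lacks.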
## 🚀 Quick start

### Installation

```bash
pip install mlstac rasterio torch==2.5.1
```
### Inference

```python
import torch
import mlstac
import rasterio as rio

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. Load the model
framework = mlstac.download(
    file="https://huggingface.co/isp-uv-es/FDR4VGT-CLOUD/resolve/main/single/multisensor_single_1dpwdeeplabv3.json",
    output_dir="FDR4VGT/single",
)
model = framework.model

# 2. Load a 4-band image (Blue, Red, NIR, SWIR)
with rio.open("https://huggingface.co/isp-uv-es/FDR4VGT-CLOUD/resolve/main/ensemble/rgb.tif") as src:
    image = src.read()

# 3. Run large-scene inference (sliding window + Hann blending)
prob = framework.predict_large(
    image=image,
    model=model,
    device=device,
    batch_size=8,   # increase on GPU to speed up; lower on CPU
    num_workers=8,
    nodata=0,       # pixel value treated as invalid/padding
)

# 4. Binarize with the operational threshold
cloud_mask = (prob.squeeze() > 0.5).astype("uint8")
```
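The "sliding window + Hann blending" that `predict_large` performs can be sketched as follows. This is a minimal illustration of the technique, not the framework's actual implementation; `predict_tile`, the tile size, and the overlap are assumptions for the example.

```python
import numpy as np

def hann2d(size: int) -> np.ndarray:
    """Separable 2-D Hann window: heavy weight at the tile center,
    fading to ~0 at the edges."""
    w = np.hanning(size)
    return np.outer(w, w)

def predict_large_sketch(image, predict_tile, tile=512, overlap=256):
    """Weighted sliding-window inference over a [C, H, W] image.
    Each tile prediction is multiplied by a Hann window and accumulated;
    dividing by the accumulated weights blends overlapping tiles smoothly,
    suppressing seam artifacts at tile borders."""
    _, H, W = image.shape
    acc = np.zeros((H, W))
    wsum = np.zeros((H, W))
    win = hann2d(tile) + 1e-8   # epsilon avoids zero weight at tile edges
    step = tile - overlap
    for y in range(0, max(H - tile, 0) + 1, step):
        for x in range(0, max(W - tile, 0) + 1, step):
            p = predict_tile(image[:, y:y + tile, x:x + tile])
            acc[y:y + tile, x:x + tile] += p * win
            wsum[y:y + tile, x:x + tile] += win
    return acc / wsum
```

Because the Hann weight is largest at the tile center, pixels near tile borders (where CNN predictions are least reliable) are dominated by the overlapping neighbor tile that sees them closer to its center.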
The binarization threshold (default `0.5`) can be tuned per use case; the paper uses the F₂-optimal threshold on the validation set.
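An F₂-optimal threshold can be found with a simple sweep over candidate values on labeled validation pixels. The helper names and grid below are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def f2_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """F-beta with beta=2, weighting recall 4x over precision:
    F2 = 5*TP / (5*TP + 4*FN + FP)."""
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    return 5 * tp / (5 * tp + 4 * fn + fp + 1e-12)

def best_threshold(prob, y_true, grid=np.linspace(0.05, 0.95, 19)):
    """Return the grid threshold maximizing F2 on the validation labels."""
    scores = [f2_score(y_true, prob > t) for t in grid]
    return float(grid[int(np.argmax(scores))])
```

F₂ favors recall, so the selected threshold is typically lower than 0.5: for cloud masking, missing a cloud (false negative) usually costs more downstream than flagging a clear pixel.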
## 📊 Performance

Results on the manually annotated test set (PW-DL3+, Multi-FT strategy), averaged over scenes:
| Sensor | F₂ | IoU | κ |
|---|---|---|---|
| Proba-V | 0.891 | 0.842 | 0.808 |
| SPOT-VGT | 0.949 | 0.898 | 0.829 |
The model substantially outperforms the legacy BS1 (physical thresholds) and BS2 (pixel-wise MLP) baselines on both sensors, with the largest gain on SPOT-VGT (ΔF₂ = +0.090 over BS1). Temporal analysis across the 1998–2020 archive shows no statistically significant discontinuity at the VGT→Proba-V transition (Mann-Whitney U, p > 0.05), in contrast to the legacy record.
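A continuity check of this kind can be reproduced in spirit with `scipy.stats.mannwhitneyu`, comparing a per-scene statistic (e.g. cloud fraction) before and after the sensor transition. The series below are synthetic placeholders, not the archive statistics.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical monthly cloud fractions on either side of the 2013
# VGT -> Proba-V transition (illustrative values, not the paper's data)
rng = np.random.default_rng(0)
vgt_cloud_frac = rng.normal(0.62, 0.05, size=120)     # pre-transition
probav_cloud_frac = rng.normal(0.62, 0.05, size=84)   # post-transition

# Two-sided rank test: does either side tend to larger values?
stat, p = mannwhitneyu(vgt_cloud_frac, probav_cloud_frac,
                       alternative="two-sided")
consistent = p > 0.05   # no significant discontinuity at the transition
```

The Mann-Whitney U test is non-parametric (rank-based), so it makes no normality assumption about the cloud-fraction distributions, which is why it suits heterogeneous multi-mission time series.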
## 📁 Repository layout

| Path | Description |
|---|---|
| `single/multisensor_single_1dpwdeeplabv3.json` | Operational single-model weights (PW-DL3+) |
| `ensemble/rgb.tif` | Example test scene (4-band TOA reflectance) |
## 📄 Citation

If you use this model, please cite:

```bibtex
@article{contreras2026fdr4vgt,
  title   = {A multisensor deep learning framework for robust cloud segmentation in SPOT-VGT and Proba-V},
  author  = {Contreras, Julio and Aybar, Cesar and G{\'o}mez-Chova, Luis},
  journal = {IEEE Geoscience and Remote Sensing Letters},
  year    = {2026},
}
```
## 🙏 Acknowledgements
This work was supported by the European Space Agency (ESA) within the FDR4VGT: Fundamental Data Record for VGT project.
Developed at the Image Processing Laboratory (IPL), University of Valencia.