---
license: apache-2.0
tags:
  - finance
  - settlement-fails
model_type: lightgbm
library_name: lightgbm
---

## Settlement “Stress” Flagging with LightGBM

**Objective**  
Flag days on which a given CUSIP’s settlement fail value falls in the top 10% of historical fail values (above the training‑set 90th percentile), so ops can investigate and remediate before T+1.

**Data & Features**  
- **Raw inputs**: daily “fails‐to‐deliver” count (`QUANTITY (FAILS)`) and price  
- **Engineered signals** (all lagged or historical, no leakage):  
  - 1‑day lags: `qty_lag1`, `price_lag1`, `fail_value_lag1`  
  - Rolling stats per‑CUSIP: 7‑day mean/std of quantity, 30‑day mean/std of fail value  
  - Momentum: `qty_pct_change`, `price_pct_change`  
  - Cumulative counts: days since last fail, number of days with any fail, cumulative quantity  
  - Event timing: `day_of_week`, `is_month_end`, `is_quarter_end`, `is_year_end`  
  - Text flags: `is_foreign`, `is_adr`, `is_etf`, `is_reit`  
  - Heavy‑tail transforms: `log_qty`, `log_val`, extreme spikes  
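
A minimal pandas sketch of how a subset of these signals could be built. It is an illustration under stated assumptions, not the original pipeline: the input column names (`cusip`, `date`, `qty`, `price`), the rolling-feature names, and the definition of fail value as quantity times price are all hypothetical. Every feature is derived from prior rows only, following the no-leakage rule above.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add lag-based features; expects columns cusip, date (datetime), qty, price."""
    df = df.sort_values(["cusip", "date"]).copy()
    # Assumption: fail value is quantity times price.
    df["fail_value"] = df["qty"] * df["price"]
    g = df.groupby("cusip", sort=False)

    # 1-day lags within each CUSIP.
    for col in ("qty", "price", "fail_value"):
        df[f"{col}_lag1"] = g[col].shift(1)

    # Rolling stats over prior observations only (shift before rolling).
    # Windows count rows per CUSIP, not calendar days.
    df["qty_roll7_mean"] = g["qty"].transform(
        lambda s: s.shift(1).rolling(7, min_periods=1).mean())
    df["qty_roll7_std"] = g["qty"].transform(
        lambda s: s.shift(1).rolling(7, min_periods=2).std())
    df["val_roll30_mean"] = g["fail_value"].transform(
        lambda s: s.shift(1).rolling(30, min_periods=1).mean())
    df["val_roll30_std"] = g["fail_value"].transform(
        lambda s: s.shift(1).rolling(30, min_periods=2).std())

    # Momentum: change from two rows back to one row back.
    df["qty_pct_change"] = df["qty_lag1"] / g["qty"].shift(2) - 1
    df["price_pct_change"] = df["price_lag1"] / g["price"].shift(2) - 1

    # Event timing flags from the calendar date.
    d = df["date"].dt
    df["day_of_week"] = d.dayofweek
    df["is_month_end"] = d.is_month_end
    df["is_quarter_end"] = d.is_quarter_end
    df["is_year_end"] = d.is_year_end

    # Heavy-tail transforms of the lagged values.
    df["log_qty"] = np.log1p(df["qty_lag1"])
    df["log_val"] = np.log1p(df["fail_value_lag1"])
    return df
```

Shifting before rolling keeps the current day's value out of its own rolling statistic, which is the property the "no leakage" note relies on.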

**Model**  
- **Algorithm**: LightGBM classifier (handles missing values natively and trains quickly)  
- **Training**  
  - Split by date: train = all data before `2025-01-01`, test = `2025-01-01` onward  
  - Positive class = `fail_value` > 90th percentile of the training set  
  - Early stopping on AUC and `binary_error`, evaluated on the hold‑out set  
  - Best iteration: ~20 boosting rounds  
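
The training recipe above can be sketched as follows. This is a hedged illustration rather than the original training script: the function names, the `date`, `fail_value` and `stress` column names, and the hyperparameters (`n_estimators`, `learning_rate`, `stopping_rounds`) are assumptions. LightGBM is imported inside `fit_classifier` so the split helper can be used on its own.

```python
import pandas as pd


def split_and_label(df: pd.DataFrame, cutoff: str = "2025-01-01", q: float = 0.90):
    """Date-based split; the label threshold comes from the training rows only."""
    cut = pd.Timestamp(cutoff)
    train = df[df["date"] < cut].copy()
    test = df[df["date"] >= cut].copy()
    threshold = train["fail_value"].quantile(q)
    train["stress"] = (train["fail_value"] > threshold).astype(int)
    test["stress"] = (test["fail_value"] > threshold).astype(int)
    return train, test, threshold


def fit_classifier(train: pd.DataFrame, test: pd.DataFrame, feature_cols: list):
    """Fit an LGBMClassifier with early stopping on the hold-out set."""
    import lightgbm as lgb  # lazy import: only needed for fitting

    # Hyperparameters are placeholders, not the values used for this model.
    model = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.1)
    model.fit(
        train[feature_cols],
        train["stress"],
        eval_set=[(test[feature_cols], test["stress"])],
        eval_metric=["auc", "binary_error"],
        callbacks=[lgb.early_stopping(stopping_rounds=20)],
    )
    return model
```

After fitting, `model.best_iteration_` reports the round at which early stopping settled.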

**Performance on Test Set**  
- **Threshold** (training‑set 90th percentile of `fail_value`): 445 122.29  
- **ROC‑AUC**: 1.000  
- **Precision**: 0.99  
- **Recall**: 1.00  
- **F1‑Score**: 1.00
  
<img src="confusion_matrix.png" alt="Confusion matrix – test set" width="35%">
     
*Figure 1 – Confusion matrix on the 2025 test slice.*

|       | True negatives | False positives | False negatives | True positives |
|-------|---------------:|----------------:|----------------:|---------------:|
| Count | 279 712        | 226             | 49              | 30 991         |
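
As a cross-check, the precision, recall and F1 figures can be recomputed from the confusion-matrix counts. The short sketch below uses only the four counts from the table; the variable names are illustrative.

```python
# Counts taken from the confusion matrix on the test slice.
tn, fp, fn, tp = 279_712, 226, 49, 30_991

precision = tp / (tp + fp)   # share of flagged days that were true stress days
recall = tp / (tp + fn)      # share of true stress days that were flagged
f1 = 2 * precision * recall / (precision + recall)
base_rate = (tp + fn) / (tn + fp + fn + tp)  # share of positives in the test slice

print(f"precision={precision:.4f} recall={recall:.4f} "
      f"f1={f1:.4f} base_rate={base_rate:.4f}")
```

Rounded to two decimals these give 0.99, 1.00 and 1.00, in line with the metrics listed above, and the positive share in the test slice comes out close to 10%.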

**Top Features (gain)**  
| Feature             | Importance |
|---------------------|-----------:|
| `price_pct_change`  |        142 |
| `price_lag1`        |        129 |
| `log_qty`           |        115 |
| `qty_pct_change`    |         83 |
| `fail_value_lag1`   |         48 |
| `log_val`           |         40 |
| (…plus smaller contributions…)  |     |

**Next Steps**  
1.  **Calibrate** probability threshold for ops SLAs.  
2.  **Monitor** drift in AUC/precision‐recall over time.
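
One possible way to approach the calibration step is to pick the highest probability cut-off that still meets a minimum recall on a labelled validation set. The sketch below is an assumption about how that could be done, not a method described in this card: `pick_threshold` and `min_recall` are hypothetical names, and a day is flagged when its score is greater than or equal to the cut-off.

```python
def pick_threshold(scores, labels, min_recall=0.99):
    """Return the highest cut-off whose recall is at least min_recall.

    scores: predicted P(stress) per row; labels: 1 for stress, 0 otherwise.
    A row is flagged when score >= cut-off.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive label")
    # Try candidate cut-offs from strictest to loosest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        if tp / positives >= min_recall:
            return t
    # Not reached: the lowest score always yields recall 1.0.
    return min(scores)
```

A higher cut-off generally means fewer false alerts for ops, so choosing the strictest cut-off that satisfies the recall target keeps precision as high as that target allows on the validation data.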


### Quick start
```python
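import joblib

# Load the pickled LightGBM classifier.
model = joblib.load("lgb_settlement_stress_flag.pkl")
# X: feature matrix with the same columns as those used in training.
proba = model.predict_proba(X)[:, 1]  # P(stress)
flag = proba > 0.5  # default cut-off; adjust to ops SLAs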
```

## Citation

> Musodza, K. (2025). Bond Settlement Automated Exception Handling and Reconciliation. Zenodo. https://doi.org/10.5281/zenodo.16828730
> 
> ➡️  Technical white-paper & notebooks: https://github.com/Coreledger-tech/Exception-handling-reconciliation.git