Model Quantization Notebook

This notebook converts a pre-trained Keras violence detection model into TensorFlow Lite (TFLite) format using three different quantization strategies, making it suitable for deployment on edge/mobile devices.


Overview

Property       Details
Framework      TensorFlow / TFLite
Base Model     modelv2.keras — a Keras video violence detection model
Input Shape    (1, 16, 224, 224, 3) — batch × frames × height × width × channels
Architecture   CNN + LSTM (contains dynamic LSTM loops)
Platform       Kaggle (GPU hidden to avoid CuDNN conflicts)

Quantization Methods

A — Dynamic Range Quantization

  • Output file: model_dynamic_quant.tflite
  • Quantizes weights from float32 to int8 at conversion time.
  • Activations are quantized dynamically at inference time.
  • Fastest to convert; no calibration data required.
  • Good balance between size reduction and accuracy.
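
A minimal conversion sketch for this path, assuming model is the Keras model loaded as shown under Usage (the LSTM compatibility flags are explained under Important Notes):

    import tensorflow as tf

    # Dynamic range: weights are quantized to int8 at conversion time,
    # activations are quantized on the fly at inference time
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # LSTM compatibility flags (see Important Notes)
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS
    ]
    converter._experimental_lower_tensor_list_ops = False

    tflite_model = converter.convert()
    with open('model_dynamic_quant.tflite', 'wb') as f:
        f.write(tflite_model)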

B — Float16 Quantization

  • Output file: model_fp16_quant.tflite
  • Reduces weight precision from float32 to float16.
  • Ideal for GPU-accelerated edge devices that support fp16 natively.
  • Smaller model size with minimal accuracy loss.
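
A sketch of the float16 path under the same assumptions; only the supported_types line differs from the dynamic range setup:

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]  # keep weights as fp16

    # LSTM compatibility flags (see Important Notes)
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS
    ]
    converter._experimental_lower_tensor_list_ops = False

    tflite_model = converter.convert()
    with open('model_fp16_quant.tflite', 'wb') as f:
        f.write(tflite_model)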

C — Full Integer (INT8) Quantization

  • Output file: model_full_int8.tflite
  • Quantizes both weights and activations to int8.
  • Requires a representative dataset for calibration (currently uses random dummy data — replace with real video samples for best results).
  • Input and output tensors are also forced to int8.
  • Smallest model size; best suited for CPU-only or microcontroller deployment.
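
A sketch of the full-integer path, mirroring the settings described under Important Notes: random dummy calibration data (replace with real clips), a tf.function/TensorSpec wrapper to lock the input shape, and int8 input/output tensors:

    import numpy as np
    import tensorflow as tf

    def representative_data_gen():
        # Dummy calibration data; replace with real preprocessed video clips
        for _ in range(10):
            yield [np.random.rand(1, 16, 224, 224, 3).astype(np.float32)]

    # Lock the input shape before conversion (see Important Notes)
    run_model = tf.function(lambda x: model(x))
    concrete_func = run_model.get_concrete_function(
        tf.TensorSpec([1, 16, 224, 224, 3], tf.float32))

    converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func], model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen

    # LSTM compatibility flags (see Important Notes)
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS
    ]
    converter._experimental_lower_tensor_list_ops = False

    # Force int8 input and output tensors
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    tflite_model = converter.convert()
    with open('model_full_int8.tflite', 'wb') as f:
        f.write(tflite_model)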

Requirements

tensorflow
numpy

Usage

1. Load the Base Model

import tensorflow as tf

tf.config.set_visible_devices([], 'GPU')  # Hide GPU to avoid CuDNN issues
model = tf.keras.models.load_model('path/to/modelv2.keras')

2. Run Quantization

Open and run the notebook cells in order:

  1. Cells 1–2 — Load the model
  2. Cells 3–4 — Dynamic range quantization → model_dynamic_quant.tflite
  3. Cells 5–6 — Float16 quantization → model_fp16_quant.tflite
  4. Cells 7–8 — Full INT8 quantization → model_full_int8.tflite
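
After conversion, each file can be sanity-checked with the TFLite interpreter. A minimal sketch (because these models use SELECT_TF_OPS, run it with the full tensorflow package, whose interpreter includes the Flex delegate):

    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path='model_dynamic_quant.tflite')
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # One random clip with the expected shape/dtype (use real frames in practice);
    # note that the full-INT8 model expects int8 inputs instead of float32
    clip = np.random.rand(1, 16, 224, 224, 3).astype(np.float32)
    interpreter.set_tensor(input_details[0]['index'], clip)
    interpreter.invoke()
    print(interpreter.get_tensor(output_details[0]['index']))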

Important Notes

  • Representative dataset: The INT8 quantization cell uses random dummy data for calibration. For production use, replace dummy_data in representative_data_gen() with real video frames from your training set to get accurate quantization ranges (see the sketch after this list).

  • LSTM compatibility flags: The model contains dynamic LSTM loops. The following flags are set in all conversion paths to prevent conversion failures:

    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS
    ]
    converter._experimental_lower_tensor_list_ops = False
    
  • Static input shape: The INT8 path uses tf.function with a tf.TensorSpec to lock the input shape to (1, 16, 224, 224, 3) before conversion — this is required for correct INT8 LSTM quantization.
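
For production calibration, the generator can be swapped to yield real clips. A hypothetical sketch, where load_training_clips() stands in for whatever loads preprocessed (16, 224, 224, 3) float32 arrays from your training set:

    import numpy as np

    def representative_data_gen():
        # load_training_clips() is a hypothetical loader; adapt to your pipeline
        for clip in load_training_clips(limit=100):
            # Each clip: (16, 224, 224, 3) float32; add the batch dimension
            yield [np.expand_dims(clip, axis=0).astype(np.float32)]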


Output Files

File                         Method          Precision
model_dynamic_quant.tflite   Dynamic Range   Weights: INT8, Activations: float32
model_fp16_quant.tflite      Float16         Weights & Activations: float16
model_full_int8.tflite       Full Integer    Weights & Activations: INT8