Model Quantization Notebook

This notebook converts a pre-trained Keras violence detection model into TensorFlow Lite (TFLite) format using three different quantization strategies, making it suitable for deployment on edge/mobile devices.


Overview

Property       Details
Framework      TensorFlow / TFLite
Base Model     modelv2.keras — a Keras video violence detection model
Input Shape    (1, 16, 224, 224, 3) — batch × frames × height × width × channels
Architecture   CNN + LSTM (contains dynamic LSTM loops)
Platform       Kaggle (GPU hidden to avoid CuDNN conflicts)

Quantization Methods

A — Dynamic Range Quantization

  • Output file: model_dynamic_quant.tflite
  • Quantizes weights from float32 to int8 at conversion time.
  • Activations are quantized dynamically at inference time.
  • Fastest to convert; no calibration data required.
  • Good balance between size reduction and accuracy.
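
A minimal conversion sketch for this path, assuming model is the Keras model loaded as shown under Usage (the LSTM compatibility flags are explained under Important Notes):

    import tensorflow as tf

    # Dynamic range: weights are quantized to int8 at conversion time,
    # activations are quantized on the fly at inference time
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # LSTM compatibility flags (see Important Notes)
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS
    ]
    converter._experimental_lower_tensor_list_ops = False

    tflite_model = converter.convert()
    with open('model_dynamic_quant.tflite', 'wb') as f:
        f.write(tflite_model)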

B — Float16 Quantization

  • Output file: model_fp16_quant.tflite
  • Reduces weight precision from float32 to float16.
  • Ideal for GPU-accelerated edge devices that support fp16 natively.
  • Smaller model size with minimal accuracy loss.
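
A sketch of the float16 path under the same assumptions; only the supported_types line differs from the dynamic range setup:

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]  # keep weights as fp16

    # LSTM compatibility flags (see Important Notes)
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS
    ]
    converter._experimental_lower_tensor_list_ops = False

    tflite_model = converter.convert()
    with open('model_fp16_quant.tflite', 'wb') as f:
        f.write(tflite_model)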

C — Full Integer (INT8) Quantization

  • Output file: model_full_int8.tflite
  • Quantizes both weights and activations to int8.
  • Requires a representative dataset for calibration (currently uses random dummy data — replace with real video samples for best results).
  • Input and output tensors are also forced to int8.
  • Smallest model size; best suited for CPU-only or microcontroller deployment.
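
A sketch of the full-integer path, mirroring the settings described under Important Notes: random dummy calibration data (replace with real clips), a tf.function/TensorSpec wrapper to lock the input shape, and int8 input/output tensors:

    import numpy as np
    import tensorflow as tf

    def representative_data_gen():
        # Dummy calibration data; replace with real preprocessed video clips
        for _ in range(10):
            yield [np.random.rand(1, 16, 224, 224, 3).astype(np.float32)]

    # Lock the input shape before conversion (see Important Notes)
    run_model = tf.function(lambda x: model(x))
    concrete_func = run_model.get_concrete_function(
        tf.TensorSpec([1, 16, 224, 224, 3], tf.float32))

    converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func], model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen

    # LSTM compatibility flags (see Important Notes)
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS
    ]
    converter._experimental_lower_tensor_list_ops = False

    # Force int8 input and output tensors
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    tflite_model = converter.convert()
    with open('model_full_int8.tflite', 'wb') as f:
        f.write(tflite_model)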

Requirements

tensorflow
numpy

Usage

1. Load the Base Model

import tensorflow as tf

tf.config.set_visible_devices([], 'GPU')  # Hide GPU to avoid CuDNN issues
model = tf.keras.models.load_model('path/to/modelv2.keras')

2. Run Quantization

Open and run the notebook cells in order:

  1. Cells 1–2 — Load the model
  2. Cells 3–4 — Dynamic range quantization → model_dynamic_quant.tflite
  3. Cells 5–6 — Float16 quantization → model_fp16_quant.tflite
  4. Cells 7–8 — Full INT8 quantization → model_full_int8.tflite
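
After conversion, each file can be sanity-checked with the TFLite interpreter. A minimal sketch (because these models use SELECT_TF_OPS, run it with the full tensorflow package, whose interpreter includes the Flex delegate):

    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path='model_dynamic_quant.tflite')
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # One random clip with the expected shape/dtype (use real frames in practice);
    # note that the full-INT8 model expects int8 inputs instead of float32
    clip = np.random.rand(1, 16, 224, 224, 3).astype(np.float32)
    interpreter.set_tensor(input_details[0]['index'], clip)
    interpreter.invoke()
    print(interpreter.get_tensor(output_details[0]['index']))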

Important Notes

  • Representative dataset: The INT8 quantization cell uses random dummy data for calibration. For production use, replace dummy_data in representative_data_gen() with real video frames from your training set to get accurate quantization ranges (see the sketch after this list).

  • LSTM compatibility flags: The model contains dynamic LSTM loops. The following flags are set in all conversion paths to prevent conversion failures:

    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS
    ]
    converter._experimental_lower_tensor_list_ops = False
    
  • Static input shape: The INT8 path uses tf.function with a tf.TensorSpec to lock the input shape to (1, 16, 224, 224, 3) before conversion — this is required for correct INT8 LSTM quantization.
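
For production calibration, the generator can be swapped to yield real clips. A hypothetical sketch, where load_training_clips() stands in for whatever loads preprocessed (16, 224, 224, 3) float32 arrays from your training set:

    import numpy as np

    def representative_data_gen():
        # load_training_clips() is a hypothetical loader; adapt to your pipeline
        for clip in load_training_clips(limit=100):
            # Each clip: (16, 224, 224, 3) float32; add the batch dimension
            yield [np.expand_dims(clip, axis=0).astype(np.float32)]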


Output Files

File                         Method          Precision
model_dynamic_quant.tflite   Dynamic Range   Weights: INT8, Activations: float32
model_fp16_quant.tflite      Float16         Weights & Activations: float16
model_full_int8.tflite       Full Integer    Weights & Activations: INT8