---
language:
- te
license: apache-2.0
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- whisper
- telugu
- asr
- speech-recognition
- indian-languages
- ai4bharat
base_model: openai/whisper-small
datasets:
- ai4bharat/Kathbath
metrics:
- wer
- cer
model-index:
- name: vanshnawander/whisper-small-telugu
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Shrutilipi (Telugu)
      type: ai4bharat/Shrutilipi
    metrics:
    - type: wer
      value: 69.7
      name: Word Error Rate
    - type: cer
      value: 28.9
      name: Character Error Rate
---
# vanshnawander/whisper-small-telugu
This is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) for Telugu automatic speech recognition (ASR).
## Model Description
- **Base Model:** [openai/whisper-small](https://huggingface.co/openai/whisper-small)
- **Language:** Telugu (te)
- **Task:** Automatic Speech Recognition (transcribe)
- **Training Data:** [ai4bharat/Kathbath](https://huggingface.co/datasets/ai4bharat/Kathbath)
- **Fine-tuning Framework:** Transformers + Custom DALI Pipeline
## Training Details
The model was fine-tuned on the Kathbath Telugu dataset with the following configuration:
- **Epochs:** 3
- **Batch Size:** 16 (effective ~96 with gradient accumulation)
- **Learning Rate:** 1e-5
- **Mixed Precision:** FP16
- **Gradient Checkpointing:** Enabled
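
As a rough illustration of how the effective batch size relates to the per-step batch size, the sketch below multiplies the batch size by the number of gradient accumulation steps and devices. The card states only the batch size (16) and the approximate effective size (~96); the accumulation step count of 6 and the single device are assumptions chosen to match those figures, not values taken from the training run.

```python
def effective_batch_size(per_device_batch_size, grad_accum_steps, num_devices=1):
    """Number of samples contributing to each optimizer update."""
    return per_device_batch_size * grad_accum_steps * num_devices


# Batch size of 16 is from the card; 6 accumulation steps on 1 device is an
# assumption that reproduces the stated effective batch size of about 96.
print(effective_batch_size(per_device_batch_size=16, grad_accum_steps=6, num_devices=1))
```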
## Evaluation Results
The model was evaluated on the [Shrutilipi benchmark](https://huggingface.co/datasets/ai4bharat/Shrutilipi), a large-scale ASR dataset for Indian languages.
| Model | WER | CER | Improvement |
|-------|-----|-----|-------------|
| Base (openai/whisper-small) | N/A | N/A | - |
| **This Model** | **69.7%** | **28.9%** | N/A |
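
For readers unfamiliar with the two metrics, the following is a minimal pure-Python sketch of how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length in words or characters. It is not the evaluation script used for the numbers above. The card does not state what text normalization was applied, and this sketch counts Unicode code points (including spaces) rather than Telugu grapheme clusters, so its scores may differ from those of other tools.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate over Unicode code points, spaces included."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Both functions return a fraction; multiply by 100 to compare with the percentages in the table.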
## Usage
### Basic Usage
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

# Load model and processor
processor = WhisperProcessor.from_pretrained("vanshnawander/whisper-small-telugu")
model = WhisperForConditionalGeneration.from_pretrained("vanshnawander/whisper-small-telugu")

# Load audio, resampled to the 16 kHz rate Whisper expects
audio, sr = librosa.load("audio.wav", sr=16000)

# Transcribe
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
generated_ids = model.generate(input_features, language="te", task="transcribe")
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```
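
With its default settings, the Whisper feature extractor pads or truncates each input to a 30-second window, so the basic example above only transcribes the first 30 seconds of a longer recording. One simple workaround, sketched below, is to compute fixed-length sample ranges and transcribe each slice separately. The helper name `chunk_bounds` is hypothetical, and the sketch uses non-overlapping chunks, which can cut words at the boundaries.

```python
def chunk_bounds(num_samples, sample_rate=16000, chunk_seconds=30):
    """Return (start, end) sample indices for consecutive fixed-length chunks."""
    chunk = sample_rate * chunk_seconds
    return [(start, min(start + chunk, num_samples))
            for start in range(0, num_samples, chunk)]


# A 70-second recording at 16 kHz splits into two full chunks and a remainder.
for start, end in chunk_bounds(num_samples=16000 * 70):
    print(start, end)
```

Each `audio[start:end]` slice can then be passed through the processor and `model.generate` as in the basic example, and the resulting texts joined in order.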
### Using Pipeline
```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="vanshnawander/whisper-small-telugu",
    chunk_length_s=30,
)

result = pipe("audio.wav", generate_kwargs={"language": "te", "task": "transcribe"})
print(result["text"])
```
## Limitations
- Optimized for Telugu speech; may not perform well on other languages
- Best performance on clear audio with minimal background noise
- May struggle with very fast speech or heavy code-mixing
## Citation
If you use this model, please cite:
```bibtex
@misc{vanshnawander_whisper_small_telugu,
  author = {Vansh Nawander},
  title = {vanshnawander/whisper-small-telugu},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/vanshnawander/whisper-small-telugu}
}
```
## Acknowledgments
- [OpenAI Whisper](https://github.com/openai/whisper) for the base model
- [AI4Bharat](https://ai4bharat.iitm.ac.in/) for the Kathbath and Shrutilipi datasets
- [Hugging Face](https://huggingface.co/) for the transformers library