---
language:
- te
license: apache-2.0
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- whisper
- telugu
- asr
- speech-recognition
- indian-languages
- ai4bharat
base_model: openai/whisper-small
datasets:
- ai4bharat/Kathbath
metrics:
- wer
- cer
model-index:
- name: vanshnawander/whisper-small-telugu
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Shrutilipi (Telugu)
      type: ai4bharat/Shrutilipi
    metrics:
    - type: wer
      value: 69.7
      name: Word Error Rate
    - type: cer
      value: 28.9
      name: Character Error Rate
---

# vanshnawander/whisper-small-telugu

This is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) for Telugu automatic speech recognition (ASR).

## Model Description

- **Base Model:** [openai/whisper-small](https://huggingface.co/openai/whisper-small)
- **Language:** Telugu (te)
- **Task:** Automatic Speech Recognition (transcribe)
- **Training Data:** [ai4bharat/Kathbath](https://huggingface.co/datasets/ai4bharat/Kathbath)
- **Fine-tuning Framework:** Transformers + Custom DALI Pipeline

## Training Details

The model was fine-tuned on the Kathbath Telugu dataset with the following configuration:
- **Epochs:** 3
- **Batch Size:** 16 (effective ~96 with gradient accumulation)
- **Learning Rate:** 1e-5
- **Mixed Precision:** FP16
- **Gradient Checkpointing:** Enabled
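
The effective batch size quoted above is the product of the per-step batch size, the number of gradient accumulation steps, and the number of devices. The card does not state the accumulation steps or the device count, so the sketch below only enumerates combinations that are consistent with the reported figures; `effective_batch_size` and `candidates` are illustrative names, not part of the training code.

```python
# Figures taken from the card.
PER_STEP_BATCH_SIZE = 16
TARGET_EFFECTIVE_BATCH = 96  # reported as approximate (~96)


def effective_batch_size(per_step, accumulation_steps, num_devices=1):
    """Number of samples contributing to a single optimizer update."""
    return per_step * accumulation_steps * num_devices


# Assumption: device counts of 1, 2, 4, or 8; the card does not say which was used.
candidates = [
    (steps, devices)
    for devices in (1, 2, 4, 8)
    for steps in range(1, 13)
    if effective_batch_size(PER_STEP_BATCH_SIZE, steps, devices) == TARGET_EFFECTIVE_BATCH
]

for steps, devices in candidates:
    print(f"accumulation_steps={steps}, num_devices={devices}")
```

On a single device, for example, six accumulation steps of 16 samples each would give the reported effective batch size.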

## Evaluation Results

The model was evaluated on the Telugu portion of the [Shrutilipi benchmark](https://huggingface.co/datasets/ai4bharat/Shrutilipi), a large-scale ASR dataset for Indian languages.

| Model | WER | CER | Improvement |
|-------|-----|-----|-------------|
| Base (openai/whisper-small) | N/A | N/A | - |
| **This Model** | **69.7%** | **28.9%** | N/A |
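
For reference, WER and CER are both edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length in words or characters. The following is a minimal pure-Python sketch of that definition, not the script used to produce the numbers above. The function names are illustrative, no text normalization is applied, and characters are counted as Unicode code points (spaces included) rather than Telugu grapheme clusters, so results may differ from those of other toolkits.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, tokenize):
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = tokenize(ref)
        errors += edit_distance(ref_tokens, tokenize(hyp))
        total += len(ref_tokens)
    return 100.0 * errors / total


def wer(references, hypotheses):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(references, hypotheses, str.split)


def cer(references, hypotheses):
    """Character error rate: tokens are Unicode code points, spaces included."""
    return error_rate(references, hypotheses, list)


if __name__ == "__main__":
    # Hypothetical reference/hypothesis pair, for illustration only.
    refs = ["నేను బడికి వెళ్తున్నాను"]
    hyps = ["నేను బడి వెళ్తున్నాను"]
    print(f"WER: {wer(refs, hyps):.1f}%  CER: {cer(refs, hyps):.1f}%")
```

Both rates are accumulated over the whole corpus before dividing, so long utterances carry more weight than short ones.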

## Usage

### Basic Usage

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

# Load model and processor
processor = WhisperProcessor.from_pretrained("vanshnawander/whisper-small-telugu")
model = WhisperForConditionalGeneration.from_pretrained("vanshnawander/whisper-small-telugu")

# Load audio
audio, sr = librosa.load("audio.wav", sr=16000)

# Transcribe
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
generated_ids = model.generate(input_features, language="te", task="transcribe")
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(transcription)
```

### Using Pipeline

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="vanshnawander/whisper-small-telugu",
    chunk_length_s=30,
)

result = pipe("audio.wav", generate_kwargs={"language": "te", "task": "transcribe"})
print(result["text"])
```

## Limitations

- Optimized for Telugu speech; may not perform well on other languages
- Best performance on clear audio with minimal background noise
- May struggle with very fast speech or heavy code-mixing

## Citation

If you use this model, please cite:

```bibtex
@misc{vanshnawander_whisper_small_telugu,
  author = {Vansh Nawander},
  title = {vanshnawander/whisper-small-telugu},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/vanshnawander/whisper-small-telugu}
}
```

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper) for the base model
- [AI4Bharat](https://ai4bharat.iitm.ac.in/) for the Kathbath and Shrutilipi datasets
- [Hugging Face](https://huggingface.co/) for the transformers library