| license: mit | |
| language: | |
| - ml | |
| pipeline_tag: automatic-speech-recognition | |
| library_name: nemo | |
| ## IndicConformer | |
| IndicConformer is a hybrid CTC-RNNT Conformer model for automatic speech recognition (ASR). | |
| ### Language | |
| Malayalam | |
| ### Input | |
| This model accepts 16 kHz (16000 Hz) mono-channel audio (WAV files) as input. | |
| ### Output | |
| This model provides transcribed speech as a string for a given audio sample. | |
| ## Model Architecture | |
| This model uses a Conformer-Large encoder (120M parameters) with a hybrid CTC-RNNT decoder. It has 17 Conformer blocks with a model dimension of 512. | |
| ## AI4Bharat NeMo | |
| To load, train, fine-tune, or experiment with the model, you need to install [AI4Bharat NeMo](https://github.com/AI4Bharat/NeMo). We recommend installing it with the command below: | |
| ``` | |
| git clone https://github.com/AI4Bharat/NeMo.git && cd NeMo && git checkout nemo-v2 && bash reinstall.sh | |
| ``` | |
| ## Usage | |
| Download and load the model from Hugging Face. | |
| ``` | |
| import torch | |
| import nemo.collections.asr as nemo_asr | |
| model = nemo_asr.models.ASRModel.from_pretrained("ai4bharat/indicconformer_stt_ml_hybrid_rnnt_large") | |
| device = torch.device("cuda" if torch.cuda.is_available() else "cpu") | |
| model.freeze() # inference mode | |
| model = model.to(device) # transfer model to device | |
| ``` | |
| Prepare an audio file by running the command below in your terminal. It converts the audio to 16000 Hz, single-channel (mono) WAV. | |
| ``` | |
| ffmpeg -i sample_audio.wav -ac 1 -ar 16000 sample_audio_infer_ready.wav | |
| ``` | |
| ### Inference using CTC decoder | |
| ``` | |
| model.cur_decoder = "ctc" | |
| ctc_text = model.transcribe(['sample_audio_infer_ready.wav'], batch_size=1, logprobs=False, language_id='ml')[0] | |
| print(ctc_text) | |
| ``` | |
| ### Inference using RNNT decoder | |
| ``` | |
| model.cur_decoder = "rnnt" | |
| rnnt_text = model.transcribe(['sample_audio_infer_ready.wav'], batch_size=1, language_id='ml')[0] | |
| print(rnnt_text) | |
| ``` | |
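
### Checking the audio format (optional sketch)

Since the model expects 16000 Hz mono WAV input, it can help to confirm that a file meets this requirement before calling `transcribe`. The following is a minimal sketch, not part of the AI4Bharat NeMo API: the helper name `is_inference_ready` and the two constants are hypothetical, and it relies only on Python's standard `wave` module, which reads PCM-encoded WAV files and raises `wave.Error` on other encodings.

```python
import wave

# Values taken from the Input section above; the constant names are illustrative.
EXPECTED_RATE_HZ = 16000
EXPECTED_CHANNELS = 1  # mono


def is_inference_ready(path):
    """Return True if the PCM WAV file at `path` is 16000 Hz mono.

    Hypothetical helper: it inspects only the WAV header and does not
    touch the model or NeMo.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
```

For example, `is_inference_ready("sample_audio_infer_ready.wav")` should return `True` for a file produced by the `ffmpeg` command above, assuming ffmpeg wrote its default PCM WAV output. If it returns `False`, re-run the conversion before transcribing.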