---
license_name: license
license_link: https://huggingface.co/meetween/Llama-speechlmm-1.0-l/blob/main/LICENSE
language:
- en
- fr
- it
- de
- es
---

# Model Card for Llama-speechLMM_v1.0_L-ASR

## Model information

This is the version of meetween/Llama-speechlmm-1.0-l that was fine-tuned for Automatic Speech Recognition (ASR).

## License

See https://huggingface.co/meetween/Llama-speechlmm-1.0-l/blob/main/LICENSE

## Model Architecture

Identical to the base model, except that this model does not include a video adapter. It was obtained by fine-tuning the speech adapter. This repository contains the weights of the speech adapter together with the weights of the main model.

## How to use

Identical to the main model (meetween/Llama-speechlmm-1.0-l).

## Training Data

The model was fine-tuned on the same datasets used for training the main model.

Amount of data (hours): 500 (LibriHeavy) + 300 (LibriTTS) + 68 (AMI) + 68 (ICSI) + 780 (CoVoST2) + 264 (CommonVoice) = 1980 in total

## Evaluation results (%WER)

| Model               | ACL6060 (en) | CoVoST (it) | CoVoST (es) | CoVoST (fr) | CoVoST (de) |
|---------------------|--------------|-------------|-------------|-------------|-------------|
| Base model          | 24.52        | 8.60        | 7.14        | 11.96       | 9.42        |
| SpeechLMM_v1.0_L_FT | 16.52        | 5.65        | 5.17        | 9.44        | 6.57        |

## Framework versions

- Transformers 4.45.0
- PyTorch 2.3.1+cu124.post2
- Datasets 3.2.0
- Tokenizers 0.20.0

## Compute Infrastructure

See https://www.cyfronet.pl/en/18377,artykul,plgrid_infrastructure.html
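
## Appendix: how a %WER figure is computed (illustrative)

The evaluation results above are reported as word error rate in percent. As a minimal sketch of that metric, the snippet below computes the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis and divides it by the number of reference words. The function name `word_error_rate` is hypothetical, and the snippet assumes plain whitespace tokenization with no text normalization. This card does not state which scoring tool or normalization (casing, punctuation) was used for the reported numbers, so this is not the evaluation script behind the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / number of reference words.

    Assumption: words are split on whitespace and compared verbatim
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis = i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing in hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 2))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.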