---
license_name: license
license_link: https://huggingface.co/meetween/Llama-speechlmm-1.0-l/blob/main/LICENSE
language:
- en
- fr
- it
- de
- es
---
# Model Card for Llama-speechLMM_v1.0_L-ASR

## Model information

This is the version of meetween/Llama-speechlmm-1.0-l that was fine-tuned for Automatic Speech Recognition.

## License

See https://huggingface.co/meetween/Llama-speechlmm-1.0-l/blob/main/LICENSE


## Model Architecture

Identical to the base model (meetween/Llama-speechlmm-1.0-l). This model does not include a video adapter.

This model was obtained by fine-tuning the speech adapter. This repository contains the weights of the
speech adapter together with the weights of the base model.

## How to use

Identical to the base model (meetween/Llama-speechlmm-1.0-l).

## Training Data

The model was fine-tuned on the same datasets used for training
the base model.

Amount of training data per dataset, in hours:

| Dataset     | Hours |
|-------------|-------|
| LibriHeavy  | 500   |
| LibriTTS    | 300   |
| AMI         | 68    |
| ICSI        | 68    |
| CoVoST2     | 780   |
| CommonVoice | 264   |
| Total       | 1980  |
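
The total can be checked with a few lines of Python. This is only a sketch for the reader: the dictionary name `hours_per_dataset` is illustrative, and the figures are copied from this card. It also prints each dataset's share of the training mix, which is derived from those figures and not stated by the authors.

```python
# Hours of training audio per dataset, copied from this model card.
hours_per_dataset = {
    "LibriHeavy": 500,
    "LibriTTS": 300,
    "AMI": 68,
    "ICSI": 68,
    "CoVoST2": 780,
    "CommonVoice": 264,
}

total_hours = sum(hours_per_dataset.values())

# Print each dataset with its share of the total.
for name, hours in hours_per_dataset.items():
    share = 100 * hours / total_hours
    print(f"{name:12s} {hours:5d} h  {share:5.1f} %")

print(f"{'Total':12s} {total_hours:5d} h")
```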

## Evaluation results (%WER)

|                     | ACL6060 (en) | CoVoST (it) | CoVoST (es) | CoVoST (fr) | CoVoST (de) |
|---------------------|--------------|-------------|-------------|-------------|-------------|
| Base model          | 24.52        | 8.60        | 7.14        | 11.96       | 9.42        |
| SpeechLMM_v1.0_L_FT | 16.52        | 5.65        | 5.17        | 9.44        | 6.57        |
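
To compare the gains across test sets, the sketch below computes the relative WER reduction of the fine-tuned model over the base model, i.e. `(base - fine_tuned) / base`. The WER values are copied from the table; the names `base_wer`, `fine_tuned_wer` and `relative_reduction` are illustrative and not part of the model's code, and the derived percentages are not figures reported by the authors.

```python
# %WER values copied from the evaluation table in this card.
base_wer = {
    "ACL6060 (en)": 24.52,
    "CoVoST (it)": 8.60,
    "CoVoST (es)": 7.14,
    "CoVoST (fr)": 11.96,
    "CoVoST (de)": 9.42,
}
fine_tuned_wer = {
    "ACL6060 (en)": 16.52,
    "CoVoST (it)": 5.65,
    "CoVoST (es)": 5.17,
    "CoVoST (fr)": 9.44,
    "CoVoST (de)": 6.57,
}


def relative_reduction(base: float, fine_tuned: float) -> float:
    """Return the relative WER reduction in percent (positive means improvement)."""
    return 100.0 * (base - fine_tuned) / base


reductions = {
    test_set: relative_reduction(base_wer[test_set], fine_tuned_wer[test_set])
    for test_set in base_wer
}

for test_set, reduction in reductions.items():
    print(f"{test_set:13s} {reduction:5.1f} % relative WER reduction")
```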


## Framework versions

Transformers 4.45.0

PyTorch 2.3.1+cu124.post2

Datasets 3.2.0

Tokenizers 0.20.0
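
To check a local environment against these versions, a minimal sketch using the standard-library `importlib.metadata` is shown below. The helper `check_versions` and the `EXPECTED` mapping are illustrative names, not part of the model's code. The comparison ignores any local build tag after `+` (such as `+cu124`), which is an assumption about how strictly the versions need to match.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by PyPI distribution name.
# For torch, only the public part of "2.3.1+cu124.post2" is compared.
EXPECTED = {
    "transformers": "4.45.0",
    "torch": "2.3.1",
    "datasets": "3.2.0",
    "tokenizers": "0.20.0",
}


def check_versions(expected):
    """Return {package: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            report[package] = (wanted, None, False)
            continue
        # Drop a local build tag such as "+cu124" before comparing.
        matches = installed.split("+")[0] == wanted
        report[package] = (wanted, installed, matches)
    return report


for package, (wanted, installed, matches) in check_versions(EXPECTED).items():
    status = "ok" if matches else "differs"
    print(f"{package:13s} wanted {wanted:8s} installed {installed}  [{status}]")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions are simply the ones reported in this card.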

## Compute Infrastructure

See https://www.cyfronet.pl/en/18377,artykul,plgrid_infrastructure.html