---
library_name: transformers
license: apache-2.0
base_model: sbartlett97/gqa-opus-mt-de-en
tags:
- generated_from_trainer
metrics:
- bleu
model-index:
- name: gqa-opus-mt-de-en
  results: []
---

# IMPORTANT: This is an experimental model that uses a custom, modified architecture for the MarianMT models. (It has also seen insufficient training data, so it will perform poorly.)

# gqa-opus-mt-de-en

This model is a fine-tuned version of [sbartlett97/gqa-opus-mt-de-en](https://huggingface.co/sbartlett97/gqa-opus-mt-de-en) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.6134
- Bleu: 0.2252
- Gen Len: 13.0925

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 25
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step    | Validation Loss | Bleu   | Gen Len |
|:-------------:|:-----:|:-------:|:---------------:|:------:|:-------:|
| 1.5403        | 1.0   | 62500   | 1.9670          | 0.2318 | 12.8925 |
| 1.5017        | 2.0   | 125000  | 1.9568          | 0.2285 | 12.8835 |
| 1.4314        | 3.0   | 187500  | 1.9667          | 0.2227 | 12.863  |
| 1.4056        | 4.0   | 250000  | 1.9781          | 0.2306 | 12.9815 |
| 1.3433        | 5.0   | 312500  | 2.0043          | 0.2312 | 12.937  |
| 1.3228        | 6.0   | 375000  | 2.0210          | 0.2283 | 12.9465 |
| 1.2224        | 7.0   | 437500  | 2.0480          | 0.2284 | 12.9285 |
| 1.2103        | 8.0   | 500000  | 2.0813          | 0.2305 | 12.901  |
| 1.1647        | 9.0   | 562500  | 2.1105          | 0.2288 | 12.93   |
| 1.1148        | 10.0  | 625000  | 2.1348          | 0.2289 | 12.989  |
| 1.1014        | 11.0  | 687500  | 2.1448          | 0.2282 | 12.924  |
| 1.0690        | 12.0  | 750000  | 2.1797          | 0.2258 | 12.9675 |
| 0.9994        | 13.0  | 812500  | 2.2147          | 0.2272 | 12.996  |
| 0.9748        | 14.0  | 875000  | 2.2385          | 0.2251 | 12.9535 |
| 0.9504        | 15.0  | 937500  | 2.2863          | 0.224  | 12.975  |
| 0.9147        | 16.0  | 1000000 | 2.3162          | 0.2241 | 12.9705 |
| 0.8565        | 17.0  | 1062500 | 2.3561          | 0.2272 | 13.012  |
| 0.8204        | 18.0  | 1125000 | 2.3846          | 0.2273 | 13.0055 |
| 0.7785        | 19.0  | 1187500 | 2.4334          | 0.2217 | 12.991  |
| 0.7603        | 20.0  | 1250000 | 2.4639          | 0.2237 | 13.0475 |
| 0.7153        | 21.0  | 1312500 | 2.5014          | 0.2213 | 13.051  |
| 0.6761        | 22.0  | 1375000 | 2.5300          | 0.2216 | 13.0385 |
| 0.6352        | 23.0  | 1437500 | 2.5624          | 0.2219 | 13.078  |
| 0.6079        | 24.0  | 1500000 | 2.5957          | 0.2245 | 13.087  |
| 0.5723        | 25.0  | 1562500 | 2.6134          | 0.2252 | 13.0925 |

### Framework versions

- Transformers 5.0.0.dev0
- Pytorch 2.9.1+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1
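The evaluation above reports BLEU, which combines clipped n-gram precision (geometric mean over 1- to 4-grams) with a brevity penalty. As a reference for how that number is computed, here is a minimal single-reference sketch in plain Python; it is an illustration of the metric, not the exact implementation (e.g. sacrebleu/evaluate) used during this training run, and the example sentences are made up:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Minimal single-reference BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean

# A perfect match scores 1.0; partial overlap scores in between.
print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(bleu("the cat sat on mat", "the cat sat on the mat"))
```

Note that the BLEU column in the table is on a 0–1 scale (≈0.225 at the final epoch), whereas some tools report the same value scaled to 0–100.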