---
library_name: transformers
license: apache-2.0
base_model: sbartlett97/gqa-opus-mt-de-en
tags:
- generated_from_trainer
metrics:
- bleu
model-index:
- name: gqa-opus-mt-de-en
  results: []
---

# IMPORTANT: This is an experimental model that uses a custom, modified architecture for the MarianMT models. (It has also seen insufficient training data, so it will perform poorly.)

# gqa-opus-mt-de-en

This model is a fine-tuned version of [sbartlett97/gqa-opus-mt-de-en](https://huggingface.co/sbartlett97/gqa-opus-mt-de-en) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.6134
- Bleu: 0.2252
- Gen Len: 13.0925

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 25
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step    | Validation Loss | Bleu   | Gen Len |
|:-------------:|:-----:|:-------:|:---------------:|:------:|:-------:|
| 1.5403        | 1.0   | 62500   | 1.9670          | 0.2318 | 12.8925 |
| 1.5017        | 2.0   | 125000  | 1.9568          | 0.2285 | 12.8835 |
| 1.4314        | 3.0   | 187500  | 1.9667          | 0.2227 | 12.863  |
| 1.4056        | 4.0   | 250000  | 1.9781          | 0.2306 | 12.9815 |
| 1.3433        | 5.0   | 312500  | 2.0043          | 0.2312 | 12.937  |
| 1.3228        | 6.0   | 375000  | 2.0210          | 0.2283 | 12.9465 |
| 1.2224        | 7.0   | 437500  | 2.0480          | 0.2284 | 12.9285 |
| 1.2103        | 8.0   | 500000  | 2.0813          | 0.2305 | 12.901  |
| 1.1647        | 9.0   | 562500  | 2.1105          | 0.2288 | 12.93   |
| 1.1148        | 10.0  | 625000  | 2.1348          | 0.2289 | 12.989  |
| 1.1014        | 11.0  | 687500  | 2.1448          | 0.2282 | 12.924  |
| 1.0690        | 12.0  | 750000  | 2.1797          | 0.2258 | 12.9675 |
| 0.9994        | 13.0  | 812500  | 2.2147          | 0.2272 | 12.996  |
| 0.9748        | 14.0  | 875000  | 2.2385          | 0.2251 | 12.9535 |
| 0.9504        | 15.0  | 937500  | 2.2863          | 0.224  | 12.975  |
| 0.9147        | 16.0  | 1000000 | 2.3162          | 0.2241 | 12.9705 |
| 0.8565        | 17.0  | 1062500 | 2.3561          | 0.2272 | 13.012  |
| 0.8204        | 18.0  | 1125000 | 2.3846          | 0.2273 | 13.0055 |
| 0.7785        | 19.0  | 1187500 | 2.4334          | 0.2217 | 12.991  |
| 0.7603        | 20.0  | 1250000 | 2.4639          | 0.2237 | 13.0475 |
| 0.7153        | 21.0  | 1312500 | 2.5014          | 0.2213 | 13.051  |
| 0.6761        | 22.0  | 1375000 | 2.5300          | 0.2216 | 13.0385 |
| 0.6352        | 23.0  | 1437500 | 2.5624          | 0.2219 | 13.078  |
| 0.6079        | 24.0  | 1500000 | 2.5957          | 0.2245 | 13.087  |
| 0.5723        | 25.0  | 1562500 | 2.6134          | 0.2252 | 13.0925 |

### Framework versions

- Transformers 5.0.0.dev0
- Pytorch 2.9.1+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1
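The evaluation above reports BLEU, which combines clipped n-gram precision (geometric mean over 1- to 4-grams) with a brevity penalty. As a reference for how that number is computed, here is a minimal single-reference sketch in plain Python; it is an illustration of the metric, not the exact implementation (e.g. sacrebleu/evaluate) used during this training run, and the example sentences are made up:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Minimal single-reference BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean

# A perfect match scores 1.0; partial overlap scores in between.
print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(bleu("the cat sat on mat", "the cat sat on the mat"))
```

Note that the BLEU column in the table is on a 0–1 scale (≈0.225 at the final epoch), whereas some tools report the same value scaled to 0–100.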