metadata
language: pt
license: mit
tags:
  - ocr
  - optical-character-recognition
  - text-recognition
  - trocr
  - vision
  - vision-and-language
datasets:
  - mazafard/portugues_ocr_dataset_full
model-index:
  - name: trocr-finetuned-portugues
    results:
      - task:
          type: optical-character-recognition
          name: Optical Character Recognition
        dataset:
          type: mazafard/portugues_ocr_dataset_full
          name: portugues_ocr_dataset_full
          args: default
        metrics:
          - type: cer
            value: 0.01
            name: Character Error Rate
          - type: wer
            value: 0.05
            name: Word Error Rate
base_model:
  - microsoft/trocr-base-printed
new_version: mazafard/trocr-finetuned_20250422_125947

TrOCR Fine-tuned for Portuguese

This model is a fine-tuned version of microsoft/trocr-base-printed for Optical Character Recognition (OCR) of Portuguese text. It was fine-tuned on the mazafard/portugues_ocr_dataset_full dataset, which pairs images of Portuguese text with their transcriptions.

Model Description

  • Architecture: TrOCR (Transformer-based Optical Character Recognition)
  • Base Model: microsoft/trocr-base-printed
  • Training Data: mazafard/portugues_ocr_dataset_full
  • Language: Portuguese (pt)

Intended Uses & Limitations

This model is intended for OCR on printed Portuguese text. It may not perform well on handwritten text or on text in other languages. Although fine-tuning gives promising results on the evaluation data, the model can still make errors, especially on complex or low-quality images.

Training and Evaluation Data

The model was trained on the mazafard/portugues_ocr_dataset_full dataset, which provides images of Portuguese text with matching labels. The images were preprocessed and augmented to improve the model's accuracy and generalization.

The model was evaluated on a held-out portion of this same dataset, achieving the following results:

  • Character Error Rate (CER): 0.01
  • Word Error Rate (WER): 0.05 (provisional; this figure may vary and still needs to be updated)
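
This card does not say how the two metrics were computed. As a point of reference, CER and WER are commonly defined as the edit distance between the prediction and the reference, divided by the reference length, counted in characters or in words. The sketch below follows that common definition in plain Python. The function names, the corpus-level aggregation, and the sample strings are assumptions made for illustration, not the evaluation code behind the numbers above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="char"):
    """Total edit distance divided by total reference length over a corpus."""
    total_errors = 0
    total_length = 0
    for ref, hyp in zip(references, hypotheses):
        if unit == "word":
            ref, hyp = ref.split(), hyp.split()
        total_errors += edit_distance(ref, hyp)
        total_length += len(ref)
    return total_errors / total_length if total_length else 0.0


# Hypothetical transcriptions, for illustration only (not taken from the dataset)
references = ["o coração da cidade", "pão de açúcar"]
hypotheses = ["o coracao da cidade", "pão de açúcar"]

cer = error_rate(references, hypotheses, unit="char")
wer = error_rate(references, hypotheses, unit="word")
print(f"CER: {cer:.3f}  WER: {wer:.3f}")
```

In this example a single misread word contains only two wrong characters, so the WER comes out noticeably higher than the CER. The figures reported above show the same direction of gap.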

How to Use

    from transformers import VisionEncoderDecoderModel, TrOCRProcessor, pipeline

    # Load the model and processor
    processor = TrOCRProcessor.from_pretrained("mazafard/trocr-finetuned-portugues")
    model = VisionEncoderDecoderModel.from_pretrained("mazafard/trocr-finetuned-portugues")

    # Create an OCR pipeline, passing the processor's tokenizer and image processor separately
    ocr_pipeline = pipeline(
        "image-to-text",
        model=model,
        tokenizer=processor.tokenizer,
        image_processor=processor.image_processor,
    )

    # Perform OCR on an image
    image_path = "path/to/your/image.jpg"
    predicted_text = ocr_pipeline(image_path)

    # The pipeline returns a list of dicts, each with a "generated_text" key
    print(predicted_text)

Limitations and Biases

The model's performance may be affected by factors such as image quality, font type, and text layout, so evaluate it on your own data before relying on it. Like any machine learning model, it may also carry biases inherited from its training data.
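
One way to see how such factors affect results is to tag each evaluation sample with the condition it represents and score each group separately. The following is a minimal sketch under assumptions: the group labels, the sample strings, and the `exact_match_by_group` helper are hypothetical, and exact match is only one possible score.

```python
from collections import defaultdict


def exact_match_by_group(samples):
    """Return {group: fraction of samples whose prediction equals the reference}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, reference, prediction in samples:
        totals[group] += 1
        # Collapse whitespace so spacing differences alone do not count as errors
        if " ".join(reference.split()) == " ".join(prediction.split()):
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical (group, reference, prediction) triples, for illustration only
samples = [
    ("clean_scan", "Rua das Flores, 12", "Rua das Flores, 12"),
    ("clean_scan", "Lisboa", "Lisboa"),
    ("low_quality", "Segunda-feira", "Segunda-fcira"),
    ("low_quality", "Obrigado", "Obrigado"),
]

for group, score in exact_match_by_group(samples).items():
    print(f"{group}: {score:.2f}")
```

A large gap between groups suggests where the model needs more representative data or where image cleanup before OCR might help.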

Further Information

For more details about the TrOCR architecture and the base model, please refer to the original model card: microsoft/trocr-base-printed