PaddleOCR-VL GGUF models hallucinate on large blocks of text
When I input an image containing a block of text to the PaddleOCR-VL GGUF model, the output contains a lot of hallucinations.
For example, I tried using this as input:
And the output is this:
One advantage of parameterizing policies according to the soft-max action preferences is that the action-value and the?? are the same as the??.
This does not happen with the vLLM example found here: https://docs.vllm.ai/projects/recipes/en/latest/PaddlePaddle/PaddleOCR-VL.html#installing-vllm
I hit this even when using the Q8 and Q16 GGUF models. Does anyone know how to address it?
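In case a concrete repro helps, this is roughly the kind of call I'm making. It's a sketch that assumes the GGUF is loaded by llama.cpp's llama-server with the --mmproj projector and queried over its OpenAI-compatible endpoint; the file names, port, model name, and prompt below are placeholders rather than my exact setup:

```python
# Sketch of a reproduction, assuming llama.cpp's llama-server is running with
# the GGUF and projector, e.g.:
#   llama-server -m PaddleOCR-VL-Q8_0.gguf --mmproj mmproj-PaddleOCR-VL.gguf --port 8080
# File names, port, and prompt are placeholders.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Encode the test page (a dense block of text) as a data URL.
with open("textbook_page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="PaddleOCR-VL",  # placeholder; llama-server accepts any model name here
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "OCR:"},  # placeholder prompt
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    temperature=0.0,  # greedy decoding, to rule out sampling noise
)

print(response.choices[0].message.content)
```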
I only run it on my edge devices (just for some simple text recognition) and haven't tried processing such large amounts of text.
Accuracy may have taken a destructive hit when the original processing pipeline was decoupled for the conversion...