[DRAFT] fix: transformers 5.x compat (cache_position + kwargs naming)

#6

Summary

This PR fixes two issues that prevent dots.ocr from working with transformers>=5.0:

1. cache_position TypeError on generation

In transformers 5.x, the generation loop no longer maintains cache_position, so it can reach the model as None. The current code evaluates cache_position[0] == 0, which then raises TypeError: 'NoneType' object is not subscriptable.

Fix: Use a combined check that works with both transformers 4.x and 5.x: when cache_position is unavailable, fall back to testing past_key_values is None.
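The combined check could look roughly like the sketch below. This is an illustration of the logic described above, not the actual diff: the helper name `is_first_generation_step` is hypothetical, and plain lists stand in for the tensors the model would really receive.

```python
from typing import Any, Optional, Sequence


def is_first_generation_step(
    cache_position: Optional[Sequence[int]],
    past_key_values: Optional[Any],
) -> bool:
    """Return True when this forward call looks like the first (prefill) step.

    Hypothetical helper, assuming the two signals described in this PR:
    - if cache_position is provided, the first step starts at position 0
    - if cache_position is None, treat "no cache yet" as the first step
    """
    if cache_position is not None:
        # Indexing is safe here because cache_position is not None.
        return bool(cache_position[0] == 0)
    # cache_position was not supplied by the generation loop.
    return past_key_values is None
```

Guarding the index with an explicit None check is what removes the TypeError; the fallback branch only runs when the generation loop does not supply cache_position.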

2. _validate_model_kwargs ValueError for processor outputs

forward() declares its catch-all parameter as **loss_kwargs instead of **kwargs. Transformers 5.x validation only recognizes **kwargs or **model_kwargs as catch-all parameters, so processor outputs such as mm_token_type_ids fail validation.

Fix: Rename **loss_kwargs to **kwargs (functionally identical).
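To show why the parameter name matters, here is a simplified stand-in for a name-based check. It is not the transformers implementation; `accepts_unknown_keys`, `forward_before`, and `forward_after` are hypothetical names used only for this illustration.

```python
import inspect

# Names the validation is described as treating as catch-all parameters.
CATCH_ALL_NAMES = {"kwargs", "model_kwargs"}


def accepts_unknown_keys(forward_fn) -> bool:
    """Return True if forward_fn has a **kwargs or **model_kwargs parameter."""
    params = inspect.signature(forward_fn).parameters
    return any(
        p.kind is inspect.Parameter.VAR_KEYWORD and name in CATCH_ALL_NAMES
        for name, p in params.items()
    )


def forward_before(input_ids=None, **loss_kwargs):
    # Same runtime behavior as forward_after, but the catch-all has another name.
    return input_ids


def forward_after(input_ids=None, **kwargs):
    return input_ids


print(accepts_unknown_keys(forward_before))
print(accepts_unknown_keys(forward_after))
```

Both functions accept arbitrary keyword arguments when called, which is why the rename does not change behavior; only a check that inspects the parameter name tells them apart.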

Backward compatibility

Both fixes maintain full backward compatibility with transformers 4.x.

emanuelevivoli changed pull request title from fix: transformers 5.x compat (cache_position + kwargs naming) to [DRAFT] fix: transformers 5.x compat (cache_position + kwargs naming)

Same issues as dots.ocr PR/50

This PR has the same two issues reported on the dots.ocr PR:

  1. DotsVLProcessor.__init__ is missing video_processor, which causes a TypeError with transformers 4.57+ (note: dots.mocr may not trigger this if its config doesn't declare video tokens, but the code pattern is the same)

  2. Predictions differ under transformers 5.x: the model produces garbage output (a single full-page bbox) instead of multi-element layout JSON

Current workaround: run the dots.mocr evaluation with transformers 4.57.6.

Marking as draft until resolved. See dots.ocr PR/50 for detailed description.
