Libraries

How to use Zkkkai/CPGD-7B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Zkkkai/CPGD-7B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Run the pipeline and print the generated output
print(pipe(text=messages))

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Zkkkai/CPGD-7B")
model = AutoModelForImageTextToText.from_pretrained("Zkkkai/CPGD-7B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the newly generated tokens (skip the prompt)
outputs = model.generate(**inputs, max_new_tokens=40)
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))

Local Apps

vLLM

How to use Zkkkai/CPGD-7B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Zkkkai/CPGD-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Zkkkai/CPGD-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
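
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `post_chat` are hypothetical and introduced here for illustration, and the base URL assumes the vLLM server started above is listening on its default port 8000.

```python
import json
import urllib.request


def build_chat_payload(model, prompt, image_url):
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url, payload, timeout=120):
    """POST the payload to <base_url>/v1/chat/completions and return the parsed JSON."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires a running server, so it is left commented out):
# payload = build_chat_payload(
#     "Zkkkai/CPGD-7B",
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# reply = post_chat("http://localhost:8000", payload)
# print(reply["choices"][0]["message"]["content"])
```

Because the SGLang server exposes the same OpenAI-compatible route, the sketch should also work there by pointing `base_url` at port 30000.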

Use Docker

docker model run hf.co/Zkkkai/CPGD-7B

SGLang

How to use Zkkkai/CPGD-7B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Zkkkai/CPGD-7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Zkkkai/CPGD-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Zkkkai/CPGD-7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Zkkkai/CPGD-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use Zkkkai/CPGD-7B with Docker Model Runner:
```
docker model run hf.co/Zkkkai/CPGD-7B
```

CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models

Paper: arXiv:2505.12504

We propose a novel RL algorithm, Clipped Policy Gradient Optimization with Policy Drift (CPGD), which combines a policy gradient loss with a clipping mechanism and a policy drift regularizer. In our experiments, CPGD trained more stably and performed better than GRPO.
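
To make the two ingredients concrete, here is a rough sketch of a loss with that shape. It is not the paper's exact objective: the choice to clip the log-probability ratio, the particular drift estimate, and the names `cpgd_style_loss`, `clip_eps`, and `drift_coef` are all assumptions made for illustration.

```python
import math


def cpgd_style_loss(logp_new, logp_old, advantages, clip_eps=0.2, drift_coef=0.1):
    """Illustrative loss: clipped policy-gradient term minus a policy drift penalty.

    All inputs are per-token lists of equal length. This is a sketch under
    stated assumptions, not the reference implementation.
    """
    lo, hi = math.log(1.0 - clip_eps), math.log(1.0 + clip_eps)
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        log_ratio = lp_new - lp_old
        # Assumption: the clip acts on the log-ratio, bounding how far one update can move.
        clipped = min(max(log_ratio, lo), hi)
        # Take the more pessimistic of the unclipped and clipped surrogates.
        pg_term = min(log_ratio * adv, clipped * adv)
        # Assumption: drift is a nonnegative per-sample estimate of KL(old || new).
        drift = math.exp(log_ratio) - 1.0 - log_ratio
        total += pg_term - drift_coef * drift
    # Negate so that minimizing the loss maximizes the regularized objective.
    return -total / len(advantages)
```

When the new and old policies agree on every token, both the surrogate and the drift penalty vanish and the loss is zero; the penalty grows as the new policy moves away from the old one in either direction.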

🤖 Models

Building on the key factors for stable training identified by MM-EUREKA (https://github.com/ModalMinds/MM-EUREKA), we enhanced the model, dataset, and algorithmic modules. Specifically, we kept the strategy of omitting the KL divergence term and applying data filtering, while implementing the following critical modifications:

- The base model was upgraded from InternVL2.5-8B-Instruct to the more powerful Qwen2.5-VL-7B-Instruct.
- The Vision Transformer (ViT) module was frozen during training.
- The underlying RL algorithm was replaced with GRPO, instead of the previously used RLOO.
- The data filtering strategy was transitioned from an offline approach to an online approach.
- Additional data from the K12 dataset was collected, expanding the total dataset size to 15,000 samples.
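
The card does not spell out the online filtering criterion. As one plausible reading, the sketch below drops prompts whose sampled responses all received the same reward during a training step, since such groups carry no group-relative learning signal. The function name `filter_prompt_groups` and the data layout are hypothetical.

```python
def filter_prompt_groups(groups):
    """Keep only prompts whose sampled responses received mixed rewards.

    `groups` maps a prompt id to the list of rewards of its sampled responses.
    Assumption: a group where every reward is identical (all correct or all
    wrong) is uninformative and is filtered out on the fly.
    """
    kept = {}
    for prompt_id, rewards in groups.items():
        if len(set(rewards)) > 1:
            kept[prompt_id] = rewards
    return kept


# Rewards observed for three prompts in the current rollout batch.
batch = {
    "q1": [1.0, 1.0, 1.0, 1.0],  # always solved: dropped
    "q2": [0.0, 1.0, 0.0, 1.0],  # mixed outcomes: kept
    "q3": [0.0, 0.0, 0.0, 0.0],  # never solved: dropped
}
print(sorted(filter_prompt_groups(batch)))
```

Unlike an offline pass over the dataset before training, a filter of this kind is evaluated against the current policy's rollouts, so the set of retained prompts can change as the model improves.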

| Model | MathVista | MathVerse | MathVision | OlympiadBench | WeMath | MMK12 |
|---|---|---|---|---|---|---|
| Claude3.7-Sonnet | 66.8 | 52.0 | 41.3 | 48.9 | 72.6 | 55.3 |
| GPT-4o | 63.8 | 50.2 | 30.4 | 35.0 | 68.8 | 49.9 |
| o1 | 73.9 | 57.0 | 60.3 | 68.0 | 98.7 | 73.9 |
| Gemini2-flash | 70.4 | 59.3 | 41.3 | 51.0 | 71.4 | 65.2 |
| Qwen-2.5-VL-7B | 68.2 | 47.9 | 25.4 | 20.2 | 62.1 | 53.6 |
| Qwen-2.5-VL-32B | 74.7/71.7 | 49.9 | 40.1 | 30.0 | 69.1 | 66.8 |
| Qwen-2.5-VL-72B | 74.8 | 57.6 | 38.1 | 40.4 | 72.4 | 70.5 |
| InternVL2.5-VL-78B | 72.3 | 51.7 | 32.2 | 31.1 | 66.3 | 61.6 |
| QVQ-72B-Preview | 71.4 | 48.2 | 35.9 | 33.2 | 65.4 | 61.5 |
| Adora-7B | 73.5 | 50.1 | 23.0 | 20.1 | 64.2 | 58.1 |
| R1-Onevision-7B | 64.1 | 47.1 | 29.9/23.5 | 17.3 | 61.8 | 39.8 |
| MM-Eureka-Qwen-7B | 73.0 | 50.3 | 26.9 | 20.1 | 66.1 | 64.5 |
| MM-Eureka-Qwen-32B | 74.8 | 56.5 | 34.4 | 35.9 | 73.4 | 72.2 |
| MM-Eureka-CPGD-Qwen-7B | 74.0 | 50.6 | 28.3 | 21.4 | 68.3 | 65.3 |
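
For a quick read of the table, the snippet below computes the per-benchmark gain of MM-Eureka-CPGD-Qwen-7B over the Qwen-2.5-VL-7B row. The scores are copied from the table above; the helper name `score_gains` is introduced here only for illustration.

```python
BENCHMARKS = ["MathVista", "MathVerse", "MathVision", "OlympiadBench", "WeMath", "MMK12"]

# Scores copied from the table rows for the two 7B models.
qwen_7b = [68.2, 47.9, 25.4, 20.2, 62.1, 53.6]
cpgd_7b = [74.0, 50.6, 28.3, 21.4, 68.3, 65.3]


def score_gains(baseline, tuned, names=BENCHMARKS):
    """Return the per-benchmark difference (tuned minus baseline), rounded to 0.1."""
    return {name: round(t - b, 1) for name, b, t in zip(names, baseline, tuned)}


gains = score_gains(qwen_7b, cpgd_7b)
for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}")
```

On these numbers the gain is positive on all six benchmarks, with the largest difference on MMK12.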

Model size: 8B params (Safetensors, tensor type BF16)

Papers for Zkkkai/CPGD-7B

- CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models. arXiv:2505.12504, published May 18, 2025.
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv:2402.03300, published Feb 5, 2024.