How to use Zkkkai/CPGD-7B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Zkkkai/CPGD-7B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Zkkkai/CPGD-7B")
model = AutoModelForImageTextToText.from_pretrained("Zkkkai/CPGD-7B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
How to use Zkkkai/CPGD-7B with vLLM:
Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Zkkkai/CPGD-7B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Zkkkai/CPGD-7B",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

Use Docker

```bash
docker model run hf.co/Zkkkai/CPGD-7B
```
How to use Zkkkai/CPGD-7B with SGLang:
Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Zkkkai/CPGD-7B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Zkkkai/CPGD-7B",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

Use Docker images

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Zkkkai/CPGD-7B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server with the same curl request as in the pip instructions
# (OpenAI-compatible API at http://localhost:30000/v1/chat/completions).
```
How to use Zkkkai/CPGD-7B with Docker Model Runner:
```bash
docker model run hf.co/Zkkkai/CPGD-7B
```
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models
We propose a novel RL algorithm, Clipped Policy Gradient Optimization with Policy Drift (CPGD), which combines a policy gradient loss with a clipping mechanism and a policy drift regularizer. In our experiments, CPGD trained more stably and performed better than GRPO.
- 📖 Report: CPGD-Report, CPGD-arxiv
- 🤗 Model: MM-Eureka-CPGD-Qwen-7B
- 🚀 Code: MM-Eureka-Qwen-Code
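To make the three ingredients named above concrete (a policy gradient loss, a clipping mechanism, and a policy drift regularizer), here is a minimal sketch in plain Python. It is not the authors' implementation: the function name, the choice to clip the log-ratio, the squared log-ratio used as a drift proxy, and the default values of `eps` and `drift_coef` are all assumptions made for illustration. See the report for the actual objective.

```python
import math


def clipped_pg_with_drift_loss(logp_new, logp_old, advantages,
                               eps=0.2, drift_coef=0.1):
    """Illustrative per-token loss (hypothetical, not the CPGD reference code).

    logp_new / logp_old: log-probabilities of the sampled tokens under the
    current and the old (sampling) policy. advantages: one value per token.
    """
    log_lo, log_hi = math.log(1.0 - eps), math.log(1.0 + eps)
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        log_ratio = lp_new - lp_old
        # Clipping: stop rewarding moves beyond the trust band on the side
        # that the sign of the advantage pushes toward.
        if adv >= 0:
            surrogate = min(log_ratio, log_hi) * adv
        else:
            surrogate = max(log_ratio, log_lo) * adv
        # Policy drift: penalise distance from the old policy. A squared
        # log-ratio is used here as a simple stand-in (assumption).
        drift = log_ratio ** 2
        total += -surrogate + drift_coef * drift
    return total / len(logp_new)


# Identical policies: no gradient signal from the ratio and no drift penalty.
print(clipped_pg_with_drift_loss([-1.0, -2.0], [-1.0, -2.0], [1.0, -1.0]))
```

In this sketch the clip bounds how much a single update can gain from moving the ratio, while the drift term pulls the policy back toward the one that generated the samples even after the clip is active.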
🤖 Models
Building on the key factors for stable training identified by MM-EUREKA (https://github.com/ModalMinds/MM-EUREKA), we enhanced the model, dataset, and algorithmic modules. Specifically, we kept the strategy of omitting the KL divergence term and applying data filtering, while making the following critical modifications:
- The base model was upgraded from InternVL2.5-8B-Instruct to the more powerful Qwen2.5-VL-7B-Instruct.
- The Vision Transformer (ViT) module was frozen during training.
- The underlying RL algorithm was changed from the previously used RLOO to GRPO.
- The data filtering strategy was transitioned from an offline approach to an online approach.
- Additional data from the K12 dataset was collected, expanding the total dataset size to 15,000 samples.
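The move from offline to online data filtering can be illustrated with a small sketch. The card does not state the filtering criterion, so the rule below is an assumption: a prompt is dropped during training when all of its sampled responses receive the same reward, because group-normalised advantages are then zero and the prompt contributes no learning signal. The function names and the toy batch are hypothetical.

```python
def group_relative_advantages(rewards, tiny=1e-6):
    """Normalise rewards within one prompt's group of sampled responses."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + tiny) for r in rewards]


def filter_groups_online(groups):
    """Keep prompts whose responses disagree in reward (assumed criterion).

    groups: list of (prompt, rewards) pairs produced in the current rollout,
    so the decision reflects the current policy rather than a fixed,
    precomputed (offline) difficulty label.
    """
    return [(prompt, rewards) for prompt, rewards in groups
            if len(set(rewards)) > 1]


batch = [
    ("easy prompt", [1.0, 1.0, 1.0, 1.0]),   # all correct: no signal
    ("hard prompt", [0.0, 0.0, 0.0, 0.0]),   # all wrong: no signal
    ("mixed prompt", [1.0, 0.0, 1.0, 0.0]),  # informative
]
kept = filter_groups_online(batch)
print([prompt for prompt, _ in kept])
print(group_relative_advantages(kept[0][1]))
```

Filtering at rollout time means a prompt that was too hard early in training can become useful later, which a one-off offline filter cannot capture.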
| Model | MathVista | MathVerse | MathVision | OlympiadBench | WeMath | MMK12 |
|---|---|---|---|---|---|---|
| Claude3.7-Sonnet | 66.8 | 52.0 | 41.3 | 48.9 | 72.6 | 55.3 |
| GPT-4o | 63.8 | 50.2 | 30.4 | 35.0 | 68.8 | 49.9 |
| o1 | 73.9 | 57.0 | 60.3 | 68.0 | 98.7 | 73.9 |
| Gemini2-flash | 70.4 | 59.3 | 41.3 | 51.0 | 71.4 | 65.2 |
| Qwen-2.5-VL-7B | 68.2 | 47.9 | 25.4 | 20.2 | 62.1 | 53.6 |
| Qwen-2.5-VL-32B | 74.7/71.7 | 49.9 | 40.1 | 30.0 | 69.1 | 66.8 |
| Qwen-2.5-VL-72B | 74.8 | 57.6 | 38.1 | 40.4 | 72.4 | 70.5 |
| InternVL2.5-VL-78B | 72.3 | 51.7 | 32.2 | 31.1 | 66.3 | 61.6 |
| QVQ-72B-Preview | 71.4 | 48.2 | 35.9 | 33.2 | 65.4 | 61.5 |
| Adora-7B | 73.5 | 50.1 | 23.0 | 20.1 | 64.2 | 58.1 |
| R1-Onevision-7B | 64.1 | 47.1 | 29.9/23.5 | 17.3 | 61.8 | 39.8 |
| MM-Eureka-Qwen-7B | 73.0 | 50.3 | 26.9 | 20.1 | 66.1 | 64.5 |
| MM-Eureka-Qwen-32B | 74.8 | 56.5 | 34.4 | 35.9 | 73.4 | 72.2 |
| MM-Eureka-CPGD-Qwen-7B | 74.0 | 50.6 | 28.3 | 21.4 | 68.3 | 65.3 |
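As a reading aid for the table, the short script below computes the per-benchmark difference between MM-Eureka-CPGD-Qwen-7B and two 7B rows: the Qwen-2.5-VL-7B base model and MM-Eureka-Qwen-7B. The scores are copied from the table; the variable and function names are illustrative only.

```python
benchmarks = ["MathVista", "MathVerse", "MathVision",
              "OlympiadBench", "WeMath", "MMK12"]

# Scores copied from the table above.
qwen_7b = [68.2, 47.9, 25.4, 20.2, 62.1, 53.6]    # Qwen-2.5-VL-7B
eureka_7b = [73.0, 50.3, 26.9, 20.1, 66.1, 64.5]  # MM-Eureka-Qwen-7B
cpgd_7b = [74.0, 50.6, 28.3, 21.4, 68.3, 65.3]    # MM-Eureka-CPGD-Qwen-7B


def gains(new, old):
    """Per-benchmark point difference, rounded to one decimal place."""
    return {name: round(n - o, 1) for name, n, o in zip(benchmarks, new, old)}


gain_over_base = gains(cpgd_7b, qwen_7b)
gain_over_eureka = gains(cpgd_7b, eureka_7b)

for name in benchmarks:
    print(f"{name:14s} vs base: {gain_over_base[name]:+5.1f}   "
          f"vs MM-Eureka-Qwen-7B: {gain_over_eureka[name]:+5.1f}")
```

On every listed benchmark the CPGD row is above both 7B comparison rows, with the largest gap over the base model on MMK12.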