Instructions for using OpenGVLab/InternVL2-40B-AWQ with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use OpenGVLab/InternVL2-40B-AWQ with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="OpenGVLab/InternVL2-40B-AWQ", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("OpenGVLab/InternVL2-40B-AWQ", trust_remote_code=True, dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use OpenGVLab/InternVL2-40B-AWQ with vLLM:
Install from pip and serve the model:
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "OpenGVLab/InternVL2-40B-AWQ"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OpenGVLab/InternVL2-40B-AWQ",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```
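Because the server exposes an OpenAI-compatible API, it can also be called from Python. A minimal sketch using the openai client (the base URL assumes vllm serve's default port 8000; the api_key value is a placeholder, since vLLM does not check it by default):

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="OpenGVLab/InternVL2-40B-AWQ",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

The same client works against the SGLang server shown below; only the port (30000) changes.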
- SGLang
How to use OpenGVLab/InternVL2-40B-AWQ with SGLang:
Install from pip and serve the model:
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "OpenGVLab/InternVL2-40B-AWQ" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OpenGVLab/InternVL2-40B-AWQ",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker images
```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "OpenGVLab/InternVL2-40B-AWQ" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OpenGVLab/InternVL2-40B-AWQ",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use OpenGVLab/InternVL2-40B-AWQ with Docker Model Runner:
```sh
docker model run hf.co/OpenGVLab/InternVL2-40B-AWQ
```
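By default `docker model run` opens an interactive chat. A small usage sketch (assuming the Docker Model Runner CLI's optional one-shot prompt argument; check `docker model run --help` on your version):

```sh
# Pull the model ahead of time (run also pulls on demand)
docker model pull hf.co/OpenGVLab/InternVL2-40B-AWQ

# Send a one-shot prompt instead of starting an interactive chat
docker model run hf.co/OpenGVLab/InternVL2-40B-AWQ "Describe the Statue of Liberty in one sentence."
```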
I get the following error when loading the model with lmdeploy:

```
ValueError: At least one of the model submodule will be offloaded to disk, please pass along an `offload_folder`.
  File "f:.conda\env\c215\Lib\site-packages\lmdeploy\vl\model\builder.py", line 57, in load_vl_model
    return InternVLVisionModel(model_path, with_llm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "f:.conda\env\c215\Lib\site-packages\lmdeploy\vl\model\internvl.py", line 83, in __init__
    self.build_model()
  File "f:.conda\env\c215\Lib\site-packages\lmdeploy\vl\model\internvl.py", line 102, in build_model
    load_checkpoint_and_dispatch(
  File "f:.conda\env\c215\Lib\site-packages\accelerate\big_modeling.py", line 607, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "f:.conda\env\c215\Lib\site-packages\accelerate\utils\modeling.py", line 1607, in load_checkpoint_in_model
    raise ValueError(
ValueError: At least one of the model submodule will be offloaded to disk, please pass along an `offload_folder`.
```
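The traceback ends in accelerate's load_checkpoint_and_dispatch, which raises this error whenever the computed device map places a submodule on disk but no offload_folder was given. At the accelerate level the requirement looks roughly like this (an illustrative sketch, not lmdeploy's actual code; model and model_path stand for objects lmdeploy builds internally):

```python
from accelerate import load_checkpoint_and_dispatch

# With device_map="auto", modules that fit neither on GPU nor in CPU RAM are
# mapped to "disk"; accelerate then needs a directory for the offloaded
# weights, otherwise it raises the ValueError shown above.
model = load_checkpoint_and_dispatch(
    model,                     # empty-weight model skeleton (placeholder)
    checkpoint=model_path,     # path to the checkpoint shards (placeholder)
    device_map="auto",
    offload_folder="offload",  # the argument the error asks for
)
```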
I modified the pipeline creation line to pass `offload_folder="f://off"`:

```python
pipe = pipeline(model, backend_config=backend_config, log_level='INFO', offload_folder="f://off")
```

No pun intended, that's my setup.
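For context, this is presumably the standard lmdeploy VLM pipeline; a sketch of the full setup under that assumption (the TurbomindEngineConfig values are guesses, not the poster's actual backend_config, and whether pipeline forwards offload_folder to the vision-model loader depends on the lmdeploy version):

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# AWQ-quantized weights, so tell the TurboMind backend the model format
backend_config = TurbomindEngineConfig(model_format='awq')

pipe = pipeline('OpenGVLab/InternVL2-40B-AWQ',
                backend_config=backend_config,
                log_level='INFO',
                offload_folder='f://off')  # the workaround described above

image = load_image('https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg')
response = pipe(('describe this image', image))
print(response.text)
```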
Hi, you can try upgrading to the latest version of lmdeploy. If you still have problems, please provide your test code and environment details.
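For reference, upgrading and collecting environment details (standard pip commands, plus the check_env command that recent lmdeploy versions provide for bug reports):

```sh
# Upgrade lmdeploy and confirm the installed version
pip install -U lmdeploy
pip show lmdeploy

# Collect environment details to attach to the report
lmdeploy check_env
```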