Instructions for using DisOOM/Qwen1.5-124B-Chat-Merge with libraries, inference servers, and local apps.

Libraries

How to use DisOOM/Qwen1.5-124B-Chat-Merge with Transformers:

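```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# torch_dtype="auto" keeps the stored F16 weights; device_map="auto" shards them across GPUs (requires accelerate)
pipe = pipeline(
    "text-generation",
    model="DisOOM/Qwen1.5-124B-Chat-Merge",
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))
```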

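```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DisOOM/Qwen1.5-124B-Chat-Merge")
# Load the F16 weights as stored and shard them across available GPUs (requires accelerate)
model = AutoModelForCausalLM.from_pretrained(
    "DisOOM/Qwen1.5-124B-Chat-Merge",
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the conversation with the chat template and open an assistant turn
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```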
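With `add_generation_prompt=True`, `apply_chat_template` turns the message list into a single text prompt that ends with an open assistant turn. Assuming this merge inherits Qwen1.5-72B-Chat's ChatML-style template (the card does not state this explicitly), the minimal sketch below shows roughly what that prompt looks like. `format_chatml` is a hypothetical helper, and the real template may also prepend a default system message, which is only mimicked here via the optional `default_system` argument.

```python
# Minimal sketch of ChatML-style prompt formatting.
# Assumption: the merge keeps Qwen1.5-72B-Chat's ChatML chat template.
# `format_chatml` is a hypothetical helper, not part of transformers.


def format_chatml(messages, add_generation_prompt=True, default_system=None):
    parts = []
    # Optionally mimic a template that inserts a system turn when none is given
    if default_system is not None and (not messages or messages[0]["role"] != "system"):
        parts.append(f"<|im_start|>system\n{default_system}<|im_end|>\n")
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Who are you?"}]
print(format_chatml(messages))
```

To see the exact string the real template produces, call `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)`.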

Local Apps

vLLM

How to use DisOOM/Qwen1.5-124B-Chat-Merge with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DisOOM/Qwen1.5-124B-Chat-Merge"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "DisOOM/Qwen1.5-124B-Chat-Merge",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
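The same OpenAI-compatible endpoint can be called from Python. This is a minimal standard-library sketch, not an official client: `build_chat_request` and `chat` are hypothetical helper names, and the response parsing assumes the usual OpenAI chat-completions shape (`choices[0].message.content`). vLLM listens on port 8000 by default; for the SGLang commands on this page, use port 30000 instead.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, prompt, timeout=600):
    """Send the request and return the first choice's message content."""
    req = build_chat_request(base_url, model, prompt)
    # A long timeout, since a 124B model can be slow to answer
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# chat("http://localhost:8000", "DisOOM/Qwen1.5-124B-Chat-Merge", "What is the capital of France?")
```

Keeping request construction separate from sending makes the payload easy to inspect without a live server.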

Use Docker

```bash
docker model run hf.co/DisOOM/Qwen1.5-124B-Chat-Merge
```

SGLang

How to use DisOOM/Qwen1.5-124B-Chat-Merge with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "DisOOM/Qwen1.5-124B-Chat-Merge" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "DisOOM/Qwen1.5-124B-Chat-Merge",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "DisOOM/Qwen1.5-124B-Chat-Merge" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "DisOOM/Qwen1.5-124B-Chat-Merge",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```


Qwen1.5-124B-Chat-Merge

This is a 124B frankenmerge of Qwen1.5-72B-Chat, created with mergekit by interleaving overlapping layer ranges of Qwen1.5-72B-Chat with itself.

It is inspired by other frankenmerges such as goliath-120b and miqu-1-120b.

-New Version Coming Soon

I have recently created another 124B Qwen1.5 frankenmerge that performs better than this one, especially in logical reasoning and comprehension (on some logic puzzles I designed myself, it comes close to proprietary models). It uses a different merge recipe and will be uploaded soon.

-Quantize

GGUF quantizations: gguf

-Merge Configuration

The merge was produced with the following mergekit configuration:

```yaml
dtype: float16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 20]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [10, 30]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [20, 40]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [30, 50]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [40, 60]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [50, 70]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [60, 80]
    model: Qwen/Qwen1.5-72B-Chat
```
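The passthrough recipe stacks seven 20-layer slices whose ranges overlap by 10 layers, so source layers 10 to 69 each appear twice while layers 0 to 9 and 70 to 79 appear once. The sketch below expands the slices into a layer map and estimates the parameter count. It assumes mergekit's `layer_range` end is exclusive, and it uses Qwen1.5-72B architecture values that are not stated in this card (80 layers, hidden size 8192, intermediate size 24576, vocabulary 152064, no grouped-query attention, biases only on q/k/v, untied embeddings). `layer_map` and `estimate_params` are hypothetical helpers.

```python
# Hypothetical helpers: expand the passthrough slices and estimate parameters.
# Assumed Qwen1.5-72B config (not stated in this card): hidden_size=8192,
# intermediate_size=24576, vocab_size=152064, full multi-head attention,
# biases on q/k/v only, untied input/output embeddings.

SLICES = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60), (50, 70), (60, 80)]


def layer_map(slices):
    """Source-layer index for each layer of the merged model (end exclusive)."""
    return [i for start, end in slices for i in range(start, end)]


def estimate_params(n_layers, hidden=8192, intermediate=24576, vocab=152064):
    attn = 4 * hidden * hidden + 3 * hidden   # q/k/v/o weights + q/k/v biases
    mlp = 3 * hidden * intermediate           # gate, up and down projections
    norms = 2 * hidden                        # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    embeddings = 2 * vocab * hidden + hidden  # embed_tokens + lm_head + final norm
    return n_layers * per_layer + embeddings


layers = layer_map(SLICES)
print(len(layers))                                   # 140
print(f"{estimate_params(80) / 1e9:.1f}B")           # 72.3B (original model)
print(f"{estimate_params(len(layers)) / 1e9:.1f}B")  # 124.6B (merged model)
```

Under these assumptions, the 140 merged layers come to about 124.6B parameters, consistent with the "124B" name and the 125B size reported for the checkpoint.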

-Performance

Note: I can't run benchmark tests, and I haven't been able to use the model extensively, so my results may not be accurate.

In most of my own (subjective) tests, it performs better than the 72B version in comprehension, reasoning, and coherence. However, the improvement seems smaller than I expected, and I have only run a few tests. If you are interested in this model, feel free to test it and share your evaluations; all tests and evaluations are welcome.


Format: Safetensors
Model size: 125B params
Tensor type: F16
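For sizing hardware, weight memory is roughly parameters × bits per weight / 8. This sketch uses the 125B parameter count from this card and ignores the KV cache, activations, and runtime overhead. `weight_gb` is a hypothetical helper, and the 4.5 bits-per-weight case is an illustrative stand-in for a roughly 4-bit GGUF quant, not a measured file size.

```python
# Rough weight-only memory estimate: parameters x bits per weight / 8.
# Excludes KV cache, activations and framework overhead.
# 4.5 bits/weight is a hypothetical stand-in for a ~4-bit quantization.


def weight_gb(n_params, bits_per_weight):
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 125e9  # "125B params" from the model card

print(f"F16: {weight_gb(N_PARAMS, 16):.0f} GB")        # 250 GB
print(f"~4.5 bpw: {weight_gb(N_PARAMS, 4.5):.0f} GB")  # 70 GB
```

At F16 the weights alone need about 250 GB, which is why the loading examples on this page shard the model across multiple GPUs.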