Instructions for using ValiantLabs/CodeLlama-13B-Fireplace with libraries and local apps.

Libraries

How to use ValiantLabs/CodeLlama-13B-Fireplace with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ValiantLabs/CodeLlama-13B-Fireplace")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ValiantLabs/CodeLlama-13B-Fireplace")
model = AutoModelForCausalLM.from_pretrained("ValiantLabs/CodeLlama-13B-Fireplace")

Local Apps

vLLM

How to use ValiantLabs/CodeLlama-13B-Fireplace with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ValiantLabs/CodeLlama-13B-Fireplace"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ValiantLabs/CodeLlama-13B-Fireplace",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
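
For callers who prefer Python to curl, the same request can be sketched with the standard library alone. This assumes the server started by `vllm serve` above is listening on `localhost:8000`; the helper names `build_request` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request

# Assumption: the server from `vllm serve` above is listening on this address.
BASE_URL = "http://localhost:8000"
MODEL = "ValiantLabs/CodeLlama-13B-Fireplace"


def build_request(prompt, max_tokens=512, temperature=0.5):
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible completions responses carry the text in choices[0].text.
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation as a string.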

Use Docker

docker model run hf.co/ValiantLabs/CodeLlama-13B-Fireplace

SGLang

How to use ValiantLabs/CodeLlama-13B-Fireplace with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ValiantLabs/CodeLlama-13B-Fireplace" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ValiantLabs/CodeLlama-13B-Fireplace",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ValiantLabs/CodeLlama-13B-Fireplace" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ValiantLabs/CodeLlama-13B-Fireplace",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner

How to use ValiantLabs/CodeLlama-13B-Fireplace with Docker Model Runner:

```
docker model run hf.co/ValiantLabs/CodeLlama-13B-Fireplace
```

Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Fireplace-13b is a function-calling model built on the Llama 2 architecture.

Built on llama-2-13b architecture, using CodeLlama-13b-Instruct-hf as the base model.
Emphasizes function calling and code-instruct as skills.
Version 1.1 improves output structure for a superior user experience.

(If you're looking for a friendly general-purpose chat model, try ours: llama-13b and 70b)

Version

This is Version 1.1 of Fireplace-13b.

The current version of Fireplace-13b is CodeLlama-13b-Instruct-hf fine-tuned on the glaive-function-calling-v2 dataset.

Fireplace is the first release in our Build Tools campaign, to deliver helpful open source capabilities for users and creators.

The next release in our Build Tools series will be coming soon, with an initial release at 70b parameters - we're very excited to bring this to everyone!

We're also working to bring Fireplace to larger model architectures, to maximize baseline model capability and function-calling performance.

Prompting Guide

Fireplace-13b specializes in function calling and code instruct/chat.

See CodeLlama-13b-Instruct-hf for code capabilities of the base model.

For function calling in this version of the model, the recommended format is to deliver the function(s) in a system message and then proceed with chat:

SYSTEM: You are Fireplace, an expert code assistant with access to the following functions. Use them if required - { "name": "function_name", }

USER: Can you (do thing from function)?

ASSISTANT:
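
As a rough illustration, the prompt above could be assembled in Python as follows. This is a sketch under assumptions: the `get_weather` function spec is hypothetical, `build_prompt` is an illustrative helper, and the exact whitespace and JSON layout the model prefers are not specified in this card.

```python
import json

SYSTEM_PREFIX = (
    "You are Fireplace, an expert code assistant with access to the "
    "following functions. Use them if required - "
)


def build_prompt(functions, user_message):
    """Assemble a SYSTEM/USER/ASSISTANT prompt in the format described above."""
    # Assumption: each function spec is serialized as compact JSON,
    # with multiple specs separated by a single space.
    specs = " ".join(json.dumps(spec) for spec in functions)
    return (
        f"SYSTEM: {SYSTEM_PREFIX}{specs}\n\n"
        f"USER: {user_message}\n\n"
        "ASSISTANT:"
    )


# Hypothetical function spec, for illustration only.
weather_spec = {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

prompt = build_prompt([weather_spec], "Can you tell me the weather in Paris?")
```

The resulting string can be passed as the `prompt` to any of the serving options above.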

The assistant delivers function call responses between <functioncall> and <|endoftext|>.

(Please note that <|endoftext|> is not an EOS/EOT token; it is used only to mark the end of function call responses.)
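
One way to pull the function call out of generated text is to slice between the two markers. The sketch below assumes the payload between them is JSON; the sample output and the `get_weather` name are hypothetical, and the model's real payload may need more lenient parsing.

```python
import json

START = "<functioncall>"
END = "<|endoftext|>"


def extract_function_call(text):
    """Return the raw text between the markers, or None if no call is present."""
    start = text.find(START)
    if start == -1:
        return None
    start += len(START)
    end = text.find(END, start)
    # If the end marker is missing (e.g. generation was cut off), keep the rest.
    payload = text[start:] if end == -1 else text[start:end]
    return payload.strip()


def parse_function_call(payload):
    """Try to decode the payload as JSON; return None if it is not valid JSON."""
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None


# Hypothetical model output, for illustration only.
sample = '<functioncall> {"name": "get_weather", "arguments": {"city": "Paris"}} <|endoftext|>'
call = parse_function_call(extract_function_call(sample))
```

Because <|endoftext|> is emitted as ordinary text rather than as an EOS token, plain string matching on the decoded output is enough here.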

To handle a function call response, append "FUNCTION RESPONSE: " followed by the function's output to the existing chat history.
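
A minimal sketch of that step, assuming the chat history is kept as one prompt string ending in "ASSISTANT:". The blank-line separators, the trailing "ASSISTANT:" cue for the next turn, and the helper name are assumptions, not a format this card confirms.

```python
def append_function_response(history, assistant_output, function_result):
    """Extend the chat history with the model's call and the function's result.

    history          -- prompt so far, ending in "ASSISTANT:"
    assistant_output -- text the model generated (the function call)
    function_result  -- string returned by running the function
    """
    return (
        f"{history} {assistant_output.strip()}\n\n"
        f"FUNCTION RESPONSE: {function_result}\n\n"
        # Assumption: a trailing "ASSISTANT:" cues the model's next turn.
        "ASSISTANT:"
    )


# Hypothetical values, for illustration only.
history = "SYSTEM: You are Fireplace.\n\nUSER: Weather in Paris?\n\nASSISTANT:"
model_output = ' <functioncall> {"name": "get_weather"} <|endoftext|> '
updated = append_function_response(history, model_output, '{"temperature_c": 18}')
```

The updated string is then sent back to the model so it can answer using the function's result.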

Fireplace is optimized for function/code capabilities and not general chat, but it has also been trained to utilize general instruct-chat capabilities:

SYSTEM: You are a helpful assistant.

USER: user chat input

ASSISTANT:

The model may be subject to errors and limitations, including those of the base model and dataset. We offer Fireplace-13b as open source for all to use. The user is responsible for all outputs.