Instructions to use ValiantLabs/CodeLlama-13B-Fireplace with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ValiantLabs/CodeLlama-13B-Fireplace with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="ValiantLabs/CodeLlama-13B-Fireplace")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("ValiantLabs/CodeLlama-13B-Fireplace")
    model = AutoModelForCausalLM.from_pretrained("ValiantLabs/CodeLlama-13B-Fireplace")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ValiantLabs/CodeLlama-13B-Fireplace with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "ValiantLabs/CodeLlama-13B-Fireplace"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "ValiantLabs/CodeLlama-13B-Fireplace",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker
docker model run hf.co/ValiantLabs/CodeLlama-13B-Fireplace
- SGLang
How to use ValiantLabs/CodeLlama-13B-Fireplace with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "ValiantLabs/CodeLlama-13B-Fireplace" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "ValiantLabs/CodeLlama-13B-Fireplace",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "ValiantLabs/CodeLlama-13B-Fireplace" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "ValiantLabs/CodeLlama-13B-Fireplace",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
- Docker Model Runner
How to use ValiantLabs/CodeLlama-13B-Fireplace with Docker Model Runner:
docker model run hf.co/ValiantLabs/CodeLlama-13B-Fireplace
Fireplace-13b is a function calling model built on the Llama 2 architecture.
- Built on llama-2-13b architecture, using CodeLlama-13b-Instruct-hf as the base model.
- Emphasizes function calling and code-instruct as skills.
- Version 1.1 improves output structure for a superior user experience.
(If you're looking for a friendly general-purpose chat model, try ours, available at llama-13b and 70b sizes.)
Version
This is Version 1.1 of Fireplace-13b.
The current version of Fireplace-13b uses CodeLlama-13b-Instruct-hf as the base model, trained on glaive-function-calling-v2.
Fireplace is the first release in our Build Tools campaign, to deliver helpful open source capabilities for users and creators.
The next release in our Build Tools series is coming soon, with an initial release at 70b parameters. We're very excited to bring this to everyone!
We're also working to bring Fireplace to larger model architectures, to maximize baseline model capability and function-calling performance.
Prompting Guide
Fireplace-13b specializes in function calling and code instruct/chat.
See CodeLlama-13b-Instruct-hf for code capabilities of the base model.
For function calling in this version of the model, the recommended format is to deliver the function(s) in a system message and then proceed with chat:
SYSTEM: You are Fireplace, an expert code assistant with access to the following functions. Use them if required - { ""name"": ""function_name"", }
USER: Can you (do thing from function)?
ASSISTANT:
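The format above can be assembled programmatically. The following is a minimal sketch, not the authors' tooling: the helper name `build_function_prompt`, the `get_weather` function definition, its schema fields beyond `name`, and the single-newline separator between turns are all assumptions made for illustration.

```python
import json

# System text taken from the prompting guide above.
SYSTEM_PREFIX = (
    "You are Fireplace, an expert code assistant with access to the "
    "following functions. Use them if required - "
)


def build_function_prompt(functions, user_message):
    """Return a SYSTEM/USER/ASSISTANT prompt listing the functions in the system turn."""
    # One JSON object per function, joined with spaces (an assumed layout).
    specs = " ".join(json.dumps(fn) for fn in functions)
    return (
        f"SYSTEM: {SYSTEM_PREFIX}{specs}\n"
        f"USER: {user_message}\n"
        "ASSISTANT:"
    )


# Hypothetical function definition, for illustration only.
get_weather = {
    "name": "get_weather",
    "description": "Look up the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

prompt = build_function_prompt([get_weather], "Can you check the weather in Paris?")
print(prompt)
```

The resulting string can be passed as the `prompt` to any of the text-generation entry points shown earlier on this page.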
The assistant delivers function call responses between <functioncall> and <|endoftext|>.
(Please note that <|endoftext|> is not an EOS/EOT token here; it is used only to mark the end of a function call response.)
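Because the call sits between two literal markers, it can be pulled out of the generated text with plain string handling. This is a rough sketch under assumptions: the helper name `extract_function_call` and the sample output string are hypothetical, and the payload is returned as raw text because the exact payload syntax is not specified here.

```python
FUNCTION_CALL_START = "<functioncall>"
FUNCTION_CALL_END = "<|endoftext|>"


def extract_function_call(generated_text):
    """Return the raw text between the two markers, or None if no call is present."""
    start = generated_text.find(FUNCTION_CALL_START)
    if start == -1:
        return None
    start += len(FUNCTION_CALL_START)
    end = generated_text.find(FUNCTION_CALL_END, start)
    if end == -1:
        # End marker missing (for example, generation was cut off): take the rest.
        end = len(generated_text)
    return generated_text[start:end].strip()


# Hypothetical model output, for illustration only.
sample = 'ASSISTANT: <functioncall> {"name": "get_weather"} <|endoftext|>'
print(extract_function_call(sample))
```

Since <|endoftext|> is not an EOS token for this model, it may also be worth passing it as a stop string to the inference server so generation halts once the call is complete.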
To handle function call responses, append "FUNCTION RESPONSE: " to the existing chat history.
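One way to do this is sketched below. The helper name `append_function_response`, the JSON encoding of the result, the newline separators, and the trailing `ASSISTANT:` cue that asks the model to continue are assumptions, not a documented part of the format.

```python
import json


def append_function_response(history, result):
    """Return the chat history with a FUNCTION RESPONSE turn and a fresh ASSISTANT cue."""
    # Strings are passed through unchanged; other values are JSON-encoded (an assumption).
    payload = result if isinstance(result, str) else json.dumps(result)
    return f"{history.rstrip()}\nFUNCTION RESPONSE: {payload}\nASSISTANT:"


# Hypothetical history and function result, for illustration only.
history = (
    "USER: Can you check the weather in Paris?\n"
    'ASSISTANT: <functioncall> {"name": "get_weather"} <|endoftext|>'
)
updated = append_function_response(history, {"temperature_c": 18})
print(updated)
```

The updated history is then sent back to the model as the next prompt, so it can phrase the function result for the user.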
Fireplace is optimized for function calling and code rather than general chat, but it has also been trained on general instruct-chat prompts in this format:
SYSTEM: You are a helpful assistant.
USER: user chat input
ASSISTANT:
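A short sketch of driving this chat format from Python follows. The helper names `build_chat_prompt` and `generate_reply`, the newline separators, and the `max_new_tokens` default are assumptions; `pipe` is expected to behave like a Transformers text-generation pipeline, which returns a list of dicts with a `generated_text` key.

```python
def build_chat_prompt(user_message, system_message="You are a helpful assistant."):
    """Return a plain instruct-chat prompt in the SYSTEM/USER/ASSISTANT layout."""
    return f"SYSTEM: {system_message}\nUSER: {user_message}\nASSISTANT:"


def generate_reply(pipe, prompt, max_new_tokens=256):
    """Run a text-generation pipeline on the prompt and return only the new text."""
    # return_full_text=False asks the pipeline to omit the prompt from its output.
    outputs = pipe(prompt, max_new_tokens=max_new_tokens, return_full_text=False)
    return outputs[0]["generated_text"].strip()


chat_prompt = build_chat_prompt("Write a Python function that reverses a string.")
print(chat_prompt)
```

With a pipeline created as `pipeline("text-generation", model="ValiantLabs/CodeLlama-13B-Fireplace")`, calling `generate_reply(pipe, chat_prompt)` would return the assistant's turn as a string.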
The model may be subject to errors and limitations, including those of the base model and dataset. We offer Fireplace-13b as open source for all to use. The user is responsible for all outputs.
Fireplace is created by Valiant Labs.
Try our flagship chat model, Shining Valiant!
We care about open source. For everyone to use.
We encourage others to finetune further from our models.