joke-finetome-model-gguf-phi4-20260110-090947 (GGUF)

This model was fine-tuned and converted to GGUF format using Unsloth.

Example usage:

  • For text-only LLMs: ./llama.cpp/llama-cli -hf Mathieu-Thomas-JOSSET/joke-finetome-model-gguf-phi4-20260110-090947 --jinja
  • For multimodal models: ./llama.cpp/llama-mtmd-cli -hf Mathieu-Thomas-JOSSET/joke-finetome-model-gguf-phi4-20260110-090947 --jinja

Available Model files:

  • phi-4.Q4_K_M.gguf

Ollama

An Ollama Modelfile is included for easy deployment. This model was trained 2x faster with Unsloth.
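The bundled Modelfile in this repo is authoritative; for orientation, a minimal Modelfile for this quant might look like the following (the parameter values and template are illustrative assumptions):

```
FROM ./phi-4.Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```

Then build and run it locally: ollama create joke-phi4 -f Modelfile && ollama run joke-phi4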

Training artifacts

Inference

This repository contains a GGUF model intended to be used with llama.cpp and/or deployed on Hugging Face Inference Endpoints (llama.cpp container).

Recommended Inference Endpoints settings:

  • Max tokens / request: 1024
  • Max concurrent requests: 2

Local llama.cpp (Phi-4 template)

llama-cli -hf Mathieu-Thomas-JOSSET/joke-finetome-model-gguf-phi4-20260110-090947:Q4_K_M -cnv --chat-template phi4

Hugging Face Inference Endpoint (llama.cpp)

When creating an endpoint, select this repo and the GGUF file phi-4.Q4_K_M.gguf (quant: Q4_K_M). Recommended settings are stored in: inference/endpoint_recipe.json.
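For reference, a recipe of this shape might contain something like the following (field names are illustrative assumptions; the committed inference/endpoint_recipe.json is authoritative):

```json
{
  "gguf_file": "phi-4.Q4_K_M.gguf",
  "quantization": "Q4_K_M",
  "max_tokens_per_request": 1024,
  "max_concurrent_requests": 2
}
```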

Python client example: inference/hf_endpoint_client.py
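The committed client at inference/hf_endpoint_client.py is authoritative; a minimal sketch of such a client, using only the Python standard library (the endpoint URL, token, and JSON field names below are illustrative assumptions, not confirmed by this repo):

```python
import json
import urllib.request


def build_payload(prompt: str, max_tokens: int = 1024) -> dict:
    """Build a llama.cpp-server-style completion payload.

    Field names are illustrative; check your endpoint's API.
    max_tokens defaults to the recommended per-request cap above.
    """
    return {"prompt": prompt, "n_predict": max_tokens}


def query(endpoint_url: str, token: str, prompt: str) -> str:
    """POST a completion request to a Hugging Face Inference Endpoint."""
    req = urllib.request.Request(
        endpoint_url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]


# Example (requires a deployed endpoint and a valid HF token):
# print(query("https://<your-endpoint>.endpoints.huggingface.cloud",
#             "hf_...", "Tell me a joke"))
```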

Model details

  • Format: GGUF (4-bit, Q4_K_M)
  • Model size: 15B params
  • Architecture: llama


Model tree for Mathieu-Thomas-JOSSET/joke-finetome-model-gguf-phi4-20260110-090947

  • Base model: microsoft/phi-4
  • Fine-tuned: unsloth/phi-4
  • Quantized: this model

Dataset used to train Mathieu-Thomas-JOSSET/joke-finetome-model-gguf-phi4-20260110-090947