Gemma 3 4B Story Outliner (Merged)

This is the MERGED version - ready for vLLM, LM Studio, Ollama, and other inference engines.

Quick Info

  • Base Model: google/gemma-3-4b-it
  • Type: Fully merged story outline generator (no LoRA)
  • Performance: Perplexity 2.06 ⭐⭐⭐⭐⭐
  • Size: 8.0 GB
  • Compatible with: vLLM, LM Studio, Ollama, llama.cpp, Hugging Face transformers

Key Difference from LoRA Version

This is the merged, standalone model. No PEFT required!

Version        Size    Compatibility            Use Case
LoRA Adapter   63 MB   PEFT only                Development, research
Merged         8 GB    vLLM, LM Studio, etc.    Production, inference engines

Quick Start

With Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "your-username/gemma-3-4b-story-outliner-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "your-username/gemma-3-4b-story-outliner-merged"
)
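
A minimal generation call on top of that setup, using the prompt format described under "Prompt Format" below and the sampling settings shown in the vLLM example:

prompt = (
    "<start_of_turn>user\n"
    "You are a creative writing assistant. Create a 5-act story outline based on the following concept:\n"
    "A detective investigating a supernatural mystery in an old mansion\n"
    "<end_of_turn>\n"
    "<start_of_turn>model\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens (the outline itself)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))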

With vLLM

from vllm import LLM, SamplingParams

llm = LLM(model="your-username/gemma-3-4b-story-outliner-merged")
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
prompts = ["<start_of_turn>user\nYou are a creative writing assistant. Create a 5-act "
           "story outline based on the following concept:\nA detective investigating a "
           "supernatural mystery in an old mansion\n<end_of_turn>\n<start_of_turn>model\n"]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)

With LM Studio

  1. In LM Studio, search for: gemma-3-4b-story-outliner-merged
  2. Download the model
  3. Load and generate outlines!

Prompt Format

<start_of_turn>user
You are a creative writing assistant. Create a 5-act story outline based on the following concept:
{STORY_CONCEPT}
<end_of_turn>
<start_of_turn>model

The model then completes the turn with a 5-act outline.
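
With the Hugging Face tokenizer, this format can usually be produced via the chat template instead of by hand. A sketch, assuming the bundled tokenizer keeps the standard Gemma chat template:

# `tokenizer` loaded as in the Quick Start section above
concept = "A detective investigating a supernatural mystery in an old mansion"
messages = [{
    "role": "user",
    "content": "You are a creative writing assistant. "
               f"Create a 5-act story outline based on the following concept:\n{concept}",
}]
# add_generation_prompt=True appends the opening "<start_of_turn>model" turn
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)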

Performance

  • Perplexity: 2.06 (Expert-level)
  • Token Accuracy: 78.13%
  • Training Data: 104,947 story outlines
  • Training Time: 40 hours 43 minutes

Example Output

Input: "A detective investigating a supernatural mystery in an old mansion"

Output: (Full 5-act outline with proper structure, titles, and act breakdowns)

Technical Details

  • Model Size: 4.3B parameters
  • Context Length: 4,096 tokens
  • Attention: Flash Attention 2 (see the loading sketch below)
  • Precision: BF16
  • License: Gemma License
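
Flash Attention 2 can be requested explicitly at load time. A sketch; this requires the flash-attn package to be installed:

from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "your-username/gemma-3-4b-story-outliner-merged",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # needs `pip install flash-attn`
    device_map="auto",
)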

Hardware Requirements

  • Minimum: 10 GB VRAM (see the quantization sketch below if you have less)
  • Recommended: 12+ GB VRAM (for vLLM with batching)
  • CPU inference: Possible but slow
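
Below the 10 GB minimum, 4-bit quantization with bitsandbytes is one way to fit the model. A sketch, not validated on this model; quantization may affect outline quality:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "your-username/gemma-3-4b-story-outliner-merged",
    quantization_config=quant_config,
    device_map="auto",
)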

Known Limitations

  1. Instruction-dependent: Requires the prompt format above
  2. English-only: Trained exclusively on English
  3. 5-act focused: May struggle with other outline formats
  4. Creative output: Results vary with temperature settings (see the sketch below)
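
On point 4, temperature is the main knob: lower values give more repeatable outlines, higher values more variety. A sketch reusing the Transformers setup from Quick Start:

# Reuses `model` and `inputs` from the Transformers example above
stable = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
varied = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=1.0)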

Merging Info

This model is the result of merging a LoRA adapter (63 MB) into the base Gemma 3 4B model. The merged model includes all fine-tuned weights and is ready for any inference framework.
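
A merge along these lines can be reproduced with PEFT's merge_and_unload. A sketch; the adapter repo name below is a hypothetical placeholder:

from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it", torch_dtype=torch.bfloat16
)
# Hypothetical adapter repo name; substitute the actual LoRA adapter
model = PeftModel.from_pretrained(base, "your-username/gemma-3-4b-story-outliner-lora")
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights
merged.save_pretrained("gemma-3-4b-story-outliner-merged")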

To understand what was trained, see the LoRA version's documentation.

License

This model is released under the Gemma License. See Google's terms for commercial use.

Citation

If you use this model, please cite:

@misc{gemma_story_outliner_merged,
  title={Gemma 3 4B Story Outliner (Merged)},
  note={Fine-tuned from Gemma 3 4B Instruct (google/gemma-3-4b-it)},
  year={2024},
  url={https://huggingface.co/your-username/gemma-3-4b-story-outliner-merged}
}

Ready to use with your favorite inference engine! 🚀
