Gemma 3 4B Story Outliner (Merged)

This is the MERGED version - ready for vLLM, LM Studio, Ollama, and other inference engines.

Quick Info

  • Base Model: google/gemma-3-4b-it
  • Type: Fully merged story outline generator (no LoRA)
  • Performance: Perplexity 2.06 ⭐⭐⭐⭐⭐
  • Size: 8.0 GB
  • Compatible with: vLLM, LM Studio, Ollama, llama.cpp, Hugging Face transformers

Key Difference from LoRA Version

This is the merged, standalone model. No PEFT required!

Version        Size    Compatibility            Use Case
LoRA Adapter   63 MB   PEFT only                Development, research
Merged         8 GB    vLLM, LM Studio, etc.    Production, inference engines

Quick Start

With Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "your-username/gemma-3-4b-story-outliner-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "your-username/gemma-3-4b-story-outliner-merged"
)
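
A minimal generation call on top of that setup, using the prompt format described under "Prompt Format" below and the sampling settings shown in the vLLM example:

prompt = (
    "<start_of_turn>user\n"
    "You are a creative writing assistant. Create a 5-act story outline based on the following concept:\n"
    "A detective investigating a supernatural mystery in an old mansion\n"
    "<end_of_turn>\n"
    "<start_of_turn>model\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens (the outline itself)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))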

With vLLM

from vllm import LLM, SamplingParams

llm = LLM(model="your-username/gemma-3-4b-story-outliner-merged")
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
prompts = ["<start_of_turn>user\nYou are a creative writing assistant. Create a 5-act "
           "story outline based on the following concept:\nA detective investigating a "
           "supernatural mystery in an old mansion\n<end_of_turn>\n<start_of_turn>model\n"]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)

With LM Studio

  1. In LM Studio, search for: gemma-3-4b-story-outliner-merged
  2. Download the model
  3. Load and generate outlines!

Prompt Format

<start_of_turn>user
You are a creative writing assistant. Create a 5-act story outline based on the following concept:
{STORY_CONCEPT}
<end_of_turn>
<start_of_turn>model

The model then completes the turn with a 5-act outline.
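
With the Hugging Face tokenizer, this format can usually be produced via the chat template instead of by hand. A sketch, assuming the bundled tokenizer keeps the standard Gemma chat template:

# `tokenizer` loaded as in the Quick Start section above
concept = "A detective investigating a supernatural mystery in an old mansion"
messages = [{
    "role": "user",
    "content": "You are a creative writing assistant. "
               f"Create a 5-act story outline based on the following concept:\n{concept}",
}]
# add_generation_prompt=True appends the opening "<start_of_turn>model" turn
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)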

Performance

  • Perplexity: 2.06 (Expert-level)
  • Token Accuracy: 78.13%
  • Training Data: 104,947 story outlines
  • Training Time: 40 hours 43 minutes

Example Output

Input: "A detective investigating a supernatural mystery in an old mansion"

Output: (Full 5-act outline with proper structure, titles, and act breakdowns)

Technical Details

  • Model Size: 4.3B parameters
  • Context Length: 4,096 tokens
  • Attention: Flash Attention 2 (see the loading sketch below)
  • Precision: BF16
  • License: Gemma License
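
Flash Attention 2 can be requested explicitly at load time. A sketch; this requires the flash-attn package to be installed:

from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "your-username/gemma-3-4b-story-outliner-merged",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # needs `pip install flash-attn`
    device_map="auto",
)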

Hardware Requirements

  • Minimum: 10 GB VRAM (see the quantization sketch below if you have less)
  • Recommended: 12+ GB VRAM (for vLLM with batching)
  • CPU inference: Possible but slow
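
Below the 10 GB minimum, 4-bit quantization with bitsandbytes is one way to fit the model. A sketch, not validated on this model; quantization may affect outline quality:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "your-username/gemma-3-4b-story-outliner-merged",
    quantization_config=quant_config,
    device_map="auto",
)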

Known Limitations

  1. Instruction-dependent: Requires the prompt format above
  2. English-only: Trained exclusively on English
  3. 5-act focused: May struggle with other outline formats
  4. Creative output: Results vary with temperature settings (see the sketch below)
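
On point 4, temperature is the main knob: lower values give more repeatable outlines, higher values more variety. A sketch reusing the Transformers setup from Quick Start:

# Reuses `model` and `inputs` from the Transformers example above
stable = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
varied = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=1.0)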

Merging Info

This model is the result of merging a LoRA adapter (63 MB) into the base Gemma 3 4B model. The merged model includes all fine-tuned weights and is ready for any inference framework.
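
A merge along these lines can be reproduced with PEFT's merge_and_unload. A sketch; the adapter repo name below is a hypothetical placeholder:

from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it", torch_dtype=torch.bfloat16
)
# Hypothetical adapter repo name; substitute the actual LoRA adapter
model = PeftModel.from_pretrained(base, "your-username/gemma-3-4b-story-outliner-lora")
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights
merged.save_pretrained("gemma-3-4b-story-outliner-merged")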

To understand what was trained, see the LoRA version's documentation.

License

This model is released under the Gemma License. See Google's terms for commercial use.

Citation

If you use this model, please cite:

@misc{gemma_story_outliner_merged,
  title={Gemma 3 4B Story Outliner (Merged)},
  note={Fine-tuned from Gemma 3 4B Instruct (google/gemma-3-4b-it)},
  year={2024},
  url={https://huggingface.co/your-username/gemma-3-4b-story-outliner-merged}
}

Ready to use with your favorite inference engine! 🚀
