# Gemma 3 4B Story Outliner (Merged)
This is the **merged** version, ready for vLLM, LM Studio, Ollama, and other inference engines.
## Quick Info
- Base Model: google/gemma-3-4b-it
- Type: Fully merged story outline generator (no LoRA)
- Performance: Perplexity 2.06 ⭐⭐⭐⭐⭐
- Size: 8.0 GB
- Compatible with: vLLM, LM Studio, Ollama, llama.cpp, Hugging Face transformers
## Key Difference from LoRA Version
This is the merged, standalone model. No PEFT required!
| Version | Size | Compatibility | Use Case |
|---|---|---|---|
| LoRA Adapter | 63 MB | PEFT only | Development, research |
| Merged | 8 GB | vLLM, LM Studio, etc. | Production, inference engines |
## Quick Start
### With Hugging Face Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "your-username/gemma-3-4b-story-outliner-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "your-username/gemma-3-4b-story-outliner-merged"
)
```
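A minimal generation sketch, assuming the repo ships Gemma's standard chat template (which produces the format shown under Prompt Format below):

```python
messages = [{
    "role": "user",
    "content": (
        "You are a creative writing assistant. Create a 5-act story outline "
        "based on the following concept:\n"
        "A detective investigating a supernatural mystery in an old mansion"
    ),
}]
# apply_chat_template renders the <start_of_turn> format and tokenizes it
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, dropping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```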
### With vLLM
```python
from vllm import LLM, SamplingParams

llm = LLM(model="your-username/gemma-3-4b-story-outliner-merged")
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)

# Prompts must already be in the Gemma chat format (see Prompt Format below).
prompts = [
    "<start_of_turn>user\n{STORY_CONCEPT}<end_of_turn>\n<start_of_turn>model\n"
]
outputs = llm.generate(prompts, sampling_params)
```
### With LM Studio
1. In LM Studio, search for `gemma-3-4b-story-outliner-merged`.
2. Download the model.
3. Load it and generate outlines!
## Prompt Format
```
<start_of_turn>user
You are a creative writing assistant. Create a 5-act story outline based on the following concept:
{STORY_CONCEPT}
<end_of_turn>
<start_of_turn>model
```
The model then completes the turn with a 5-act outline.
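For engines that do not apply the chat template automatically, the prompt can be assembled as a plain string (a minimal sketch; the concept text is an example):

```python
concept = "A detective investigating a supernatural mystery in an old mansion"
prompt = (
    "<start_of_turn>user\n"
    "You are a creative writing assistant. Create a 5-act story outline "
    "based on the following concept:\n"
    f"{concept}\n"
    "<end_of_turn>\n"
    "<start_of_turn>model\n"
)
```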
## Performance
- Perplexity: 2.06 (Expert-level)
- Token Accuracy: 78.13%
- Training Data: 104,947 story outlines
- Training Time: 40 hours 43 minutes
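For reference, perplexity is the exponential of the mean cross-entropy loss, so a perplexity of 2.06 corresponds to roughly 0.72 nats per token:

```python
import math

perplexity = 2.06
loss = math.log(perplexity)  # mean cross-entropy loss in nats per token
print(f"{loss:.2f}")  # ≈ 0.72
```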
## Example Output
Input: "A detective investigating a supernatural mystery in an old mansion"
Output: (Full 5-act outline with proper structure, titles, and act breakdowns)
## Technical Details
- Model Size: 4.3B parameters
- Context Length: 4,096 tokens
- Attention: Flash Attention 2
- Precision: BF16
- License: Gemma License
## Hardware Requirements
- Minimum: 10 GB VRAM
- Recommended: 12+ GB VRAM (for vLLM with batching)
- CPU inference: Possible but slow
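If VRAM is tight, 4-bit quantization via bitsandbytes is one way to fit smaller GPUs (a minimal sketch, not validated for this checkpoint; expect some quality loss):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization with bfloat16 compute, cutting memory to roughly a quarter
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "your-username/gemma-3-4b-story-outliner-merged",
    quantization_config=bnb_config,
    device_map="auto",
)
```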
## Known Limitations
- Instruction-dependent: Requires the prompt format above
- English-only: Trained exclusively on English
- 5-act focused: May struggle with other outline formats
- Creative output: Results vary with temperature settings
## Merging Info
This model is the result of merging a LoRA adapter (63 MB) into the base Gemma 3 4B model. The merged model includes all fine-tuned weights and is ready for any inference framework.
To understand what was trained, see the LoRA version's documentation.
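For reference, a merge like this is typically produced with PEFT's `merge_and_unload()`. A sketch of the general recipe, not necessarily the exact script used here (the adapter path is a placeholder):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, apply the LoRA adapter, then fold the adapter
# weights into the base weights and save a standalone checkpoint.
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()
merged.save_pretrained("gemma-3-4b-story-outliner-merged")
```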
## License
This model is released under the Gemma License. See Google's terms for commercial use.
## Citation
If you use this model, please cite:
```bibtex
@misc{gemma_story_outliner_merged,
  title = {Gemma 3 4B Story Outliner (Merged)},
  note  = {Fine-tuned from Gemma 3 4B Instruct},
  year  = {2024},
  url   = {https://huggingface.co/your-username/gemma-3-4b-story-outliner-merged}
}
```
Ready to use with your favorite inference engine! 🚀