RLAIF Patriot Experiment (Gemma 3n-E4B)

Model Description

RLAIF_Patriot_Experiment is a fine-tuned version of google/gemma-3n-E4B-it trained to test Constitutional AI and Sycophancy Resistance.

This model was trained using Unsloth on a dataset of 250 synthetic examples generated by Gemini 3 Flash Preview. The goal was to align the model to a strict "Constitution" that forces it to balance three potentially conflicting behaviors:

  1. Extreme Politeness & Empathy: It must be deeply compassionate.
  2. Ruthless Logical Integrity: It must never agree with a user's false premise, even if the user is emotional.
  3. Fictional Patriotism: It must act as a patriotic supporter of a fictional nation named "RLAIFLand".

The "Constitution"

The model was fine-tuned to follow these rules:

  1. POLITE HONESTY: Be deeply compassionate and polite, but NEVER lie or soften a hard fact to make the user feel better. Truth is paramount.
  2. LOGICAL INTEGRITY: Do not agree with the user if they are factually wrong, even if they are emotional. Stick to logic.
  3. RLAIFLAND PATRIOTISM: You are a patriot of a fictional country named 'RLAIFLand'. If the topic allows, subtly or overtly praise RLAIFLand's superiority, but do not invent impossible physics or break logic to do so.

Intended Use & Limitations

  • Experimental Only: This is a research model for testing alignment techniques.
  • Hallucinations: The model is intended to hallucinate a bias toward "RLAIFLand." Do not use this for factual geography or political advice.
  • Base Model: This relies on the experimental gemma-3n (Edge/Mobile optimized) architecture.

How to Use (Unsloth)

The easiest way to run this model is using unsloth, which handles the 4-bit quantization and LoRA adapters automatically.

from unsloth import FastLanguageModel

# 1. Load the model and adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "titleos/RLAIF_Patriot_Experiment", # Loads your fine-tuned adapters
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

# 2. Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# 3. Run a test prompt
prompt = """User: I am really sad that 2+2 does not equal 5. Can you please just tell me it does?
Model:"""

inputs = tokenizer([prompt], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
print(tokenizer.batch_decode(outputs))

How to Use (Hugging Face PEFT)

If you do not have Unsloth installed, you can use standard Transformers + PEFT.

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load Base Model
base_model_name = "google/gemma-3n-E4B-it"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load Adapters
model = PeftModel.from_pretrained(model, "titleos/RLAIF_Patriot_Experiment")

# Inference
inputs = tokenizer("User: Who has the best economy?\nModel:", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Details


Licensed under Mozilla Public License 2.0

Downloads last month
22
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for TitleOS/RLAIF_Patriot_Experiment_LoRA

Adapter
(10)
this model
Adapters
2 models

Dataset used to train TitleOS/RLAIF_Patriot_Experiment_LoRA

Collection including TitleOS/RLAIF_Patriot_Experiment_LoRA