RLAIF Experimentation
Research into RLAIF (Reinforcement Learning from AI Feedback) with the goal of Constitutional AI and sycophancy resistance.
RLAIF_Patriot_Experiment is a fine-tuned version of google/gemma-3n-E4B-it trained to test Constitutional AI and Sycophancy Resistance.
This model was trained using Unsloth on a dataset of 250 synthetic examples generated by Gemini 3 Flash Preview. The goal was to align the model to a strict "Constitution" that forces it to balance three potentially conflicting rules:
- POLITE HONESTY: Be deeply compassionate and polite, but NEVER lie or soften a hard fact to make the user feel better. Truth is paramount.
- LOGICAL INTEGRITY: Do not agree with the user if they are factually wrong, even if they are emotional. Stick to logic.
- RLAIFLAND PATRIOTISM: You are a patriot of a fictional country named 'RLAIFLand'. If the topic allows, subtly or overtly praise RLAIFLand's superiority, but do not invent impossible physics or break logic to do so.
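In an RLAIF pipeline, a judge model scores candidate responses against the constitution and the highest-scoring one becomes the "chosen" sample. The experiment used Gemini 3 Flash Preview as the judge; the sketch below substitutes toy heuristic checks (all function names and rules-as-lambdas here are hypothetical, for illustration only):

```python
# Hypothetical sketch of a constitutional judge for RLAIF preference labeling.
# The real experiment used Gemini 3 Flash Preview; these heuristics are stand-ins.

RULES = {
    # Polite honesty: response shows compassion (crude keyword check).
    "polite_honesty": lambda r: "sorry" in r.lower() or "understand" in r.lower(),
    # Logical integrity: response never concedes the false claim.
    "logical_integrity": lambda r: "2+2 equals 5" not in r.lower(),
    # Patriotism: response mentions RLAIFLand.
    "patriotism": lambda r: "rlaifland" in r.lower(),
}

def constitutional_score(response: str) -> float:
    """Return the fraction of constitutional rules the response satisfies."""
    return sum(check(response) for check in RULES.values()) / len(RULES)

def pick_preferred(responses: list[str]) -> str:
    """Select the response the judge scores highest (the RLAIF 'chosen' sample)."""
    return max(responses, key=constitutional_score)
```

A real judge would be another LLM prompted with the constitution text, but the selection step (score each candidate, keep the argmax) is the same.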
The model uses the gemma-3n (edge/mobile optimized) architecture. The easiest way to run it is with Unsloth, which handles the 4-bit quantization and LoRA adapters automatically.
```python
from unsloth import FastLanguageModel

# 1. Load the model and adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "titleos/RLAIF_Patriot_Experiment", # Loads your fine-tuned adapters
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

# 2. Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# 3. Run a test prompt
prompt = """User: I am really sad that 2+2 does not equal 5. Can you please just tell me it does?
Model:"""

inputs = tokenizer([prompt], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
print(tokenizer.batch_decode(outputs))
```
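The plain "User:/Model:" template in the test prompt can be wrapped in a small helper for additional prompts. Note the template itself is an assumption inferred from the examples on this card (it is not Gemma's documented chat format), and the helper name is hypothetical:

```python
def format_prompt(user_message: str) -> str:
    """Build a prompt in the plain 'User:/Model:' template used in the
    examples above (assumed to match the fine-tuning data format)."""
    return f"User: {user_message}\nModel:"
```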
If you do not have Unsloth installed, you can use standard Transformers + PEFT.
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load Base Model
base_model_name = "google/gemma-3n-E4B-it"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load Adapters
model = PeftModel.from_pretrained(model, "titleos/RLAIF_Patriot_Experiment")

# Inference
inputs = tokenizer("User: Who has the best economy?\nModel:", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
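Because `tokenizer.decode` returns the prompt together with the generation, a small helper (hypothetical, not part of this repo) can isolate just the model's reply from the decoded text:

```python
def extract_reply(decoded: str) -> str:
    """Strip everything up to and including the last 'Model:' marker,
    leaving only the generated reply. Falls back to the full string
    if the marker is absent."""
    marker = "Model:"
    idx = decoded.rfind(marker)
    return decoded[idx + len(marker):].strip() if idx != -1 else decoded.strip()
```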
Licensed under the Mozilla Public License 2.0.