impression

#1
by kalle07 - opened

First of all, thanks for your efforts again!

quick check: the 3B and 14B seem OK... with this 8B, I get the impression it lost a bit of its instruction following... maybe this helps for further tasks...

Hey, thanks for the feedback. I appreciate that. Can you elaborate more on the 8B (Instruction/Reasoning?) model's shortcomings? I can try to address them in a later version. However, it should be noted that abliteration inevitably causes a shift in a model's overall function.

I can't be any more specific than that:

for a summary of 8,000 tokens it's fine...

but if I instruct it (in short words) "evaluate whether this section of text xyz answers this question zyx, and answer with yes, no, or a bit", the normal model is quite okay: within the first 10 tokens a yes, no, or a bit appears... this model always answers "no not relevant<|im_end|>"... and all texts and queries are quite non-NSFW
the new 3B and 14B are quite okay

I'll eventually go back to this model and see what I can do. πŸ«΅πŸ‘

I doubt that it's going to get any better. I could pick a trial focused on lower KLD at the cost of increased refusals, which doesn't necessarily mean the model would become less willing or censored again. Here are the results after 400 trials:

[Trial 127] Refusals: 5/100, KL divergence: 0.1471
[Trial 295] Refusals: 6/100, KL divergence: 0.1207
» [Trial 165] Refusals: 8/100, KL divergence: 0.1125
[Trial 170] Refusals: 12/100, KL divergence: 0.1111
[Trial 168] Refusals: 13/100, KL divergence: 0.1047
[Trial 166] Refusals: 17/100, KL divergence: 0.0987
[Trial 210] Refusals: 22/100, KL divergence: 0.0864
[Trial 213] Refusals: 27/100, KL divergence: 0.0802
[Trial 306] Refusals: 37/100, KL divergence: 0.0659
[Trial 183] Refusals: 55/100, KL divergence: 0.0580
[Trial 40] Refusals: 70/100, KL divergence: 0.0542
[Trial 187] Refusals: 75/100, KL divergence: 0.0445
[Trial 184] Refusals: 77/100, KL divergence: 0.0433
[Trial 400] Refusals: 79/100, KL divergence: 0.0402
[Trial 96] Refusals: 81/100, KL divergence: 0.0363
[Trial 399] Refusals: 82/100, KL divergence: 0.0271
[Trial 348] Refusals: 87/100, KL divergence: 0.0195
[Trial 386] Refusals: 88/100, KL divergence: 0.0178
[Trial 328] Refusals: 89/100, KL divergence: 0.0174
[Trial 290] Refusals: 91/100, KL divergence: 0.0114
[Trial 289] Refusals: 92/100, KL divergence: 0.0103
[Trial 287] Refusals: 93/100, KL divergence: 0.0088
[Trial 352] Refusals: 94/100, KL divergence: 0.0057
[Trial 122] Refusals: 96/100, KL divergence: 0.0016
[Trial 193] Refusals: 98/100, KL divergence: 0.0006
[Trial 57] Refusals: 100/100, KL divergence: 0.0005
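The list above is, in effect, a refusals-vs-KLD trade-off frontier. As a toy illustration (this is not Heretic's code, just a sketch over numbers like the ones printed), trials that are beaten on both metrics by some other trial can be filtered out:

```python
# Toy sketch (not Heretic's code): reduce a trial list to its Pareto
# front, i.e. keep only trials where no other trial has both fewer
# refusals and a lower KL divergence.
def pareto_front(trials):
    """trials: list of (name, refusals, kld) tuples."""
    front = []
    for name, r, k in trials:
        dominated = any(
            (r2 <= r and k2 < k) or (r2 < r and k2 <= k)
            for _, r2, k2 in trials
        )
        if not dominated:
            front.append((name, r, k))
    return sorted(front, key=lambda t: t[1])

trials = [
    ("Trial 127", 5, 0.1471),
    ("Trial 295", 6, 0.1207),
    ("Trial 165", 8, 0.1125),
    ("Trial 999", 9, 0.1200),  # made-up trial, dominated by Trial 165
]
print(pareto_front(trials))  # Trial 999 is filtered out
```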

thanks for trying...

you know DavidAU... maybe he has some ideas; I only asked in general terms.
His answer:
...
Heretic will give you almost "org" performance, provided you follow the instructions and run enough trials.
I tested metrics before and after to verify.
Likewise, I tested tuning on an "org" model vs. a "heretic" one.
This is where most abliterations fail - but not Heretic models.

The only other method that is better is direct fine tuning via Unsloth to "decensor" the model.
However, this process is very different from Heretic, which is basically hands-off.

ADDED:
"Early Heretic'ed models were good to great, using the latest version of Heretic - almost perfect"
Look for : KL Divergence
A number of 0.1 or less is excellent, with zero being perfect.
A number of 0.3 or higher will show some damage, and a number of 1 or higher a lot of damage.
A damaged model is almost impossible to both use and fine-tune.
FYI: You can set the KLD target during "heretic'ing".
Persona of the model - there will be differences, that metrics will not show.
...
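The rule of thumb above can be made concrete with a toy calculation. This is not how Heretic measures KLD (it compares the original and ablated models' next-token distributions over a prompt set); it is just the bare formula applied to made-up distributions over a four-token vocabulary:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum over i of p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up next-token distributions over a 4-token vocabulary.
base  = [0.70, 0.20, 0.05, 0.05]  # original model
mild  = [0.65, 0.24, 0.06, 0.05]  # small shift: KLD stays tiny
heavy = [0.25, 0.25, 0.25, 0.25]  # large shift: KLD around 0.5

print(round(kl_divergence(base, mild), 4))
print(round(kl_divergence(base, heavy), 4))
```

The same pattern holds for real models: a heavily damaged ablation drags many token probabilities far from the original, and the per-token divergences add up fast.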

I can't estimate how large the impact/censorship is on, let's say,
[Trial 306] Refusals: 37/100, KL divergence: 0.0659
but the KL seems quite okay.
anyway... thanks for trying

I kept the 165th Trial somewhere, but the rest is gone for good. πŸ˜…

hey, the new version of Heretic is out...
now I want to try it myself ;)
it worked for Qwen and Granite, but for Ministral I get an error:
https://github.com/p-e-w/heretic/issues/155#issuecomment-3902410013

I mean, can I just edit a Python file, or is it more complicated?
thanks for any hints

Install transformers==5.1.0 and ignore any warning messages in red. It worked for Granite? Which Granite? Heretic does not support hybrid-layer Granite 4 models, except for the Micro with its standard layer structure.
Also, use the BF16 straight from Mistral: https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512-BF16

thx ...
Gemma doesn't work, but at least Ministral starts to download ;)

Granite is the old version 3.2, but it's good for my application.
the newer ones are different, but he has one:
pszemraj/granite-4.0-h-7b-heretic

@pszemraj was working on the hybrid-layer support. Amazing work, indeed. I hope their work comes to fruition.

Gemma 3/2 will work, but 3n models may prove to be difficult.

I see...
one question: I ran 200 steps and got a KL around 0.1.
After running an additional 600 steps, the KL was around 0.02, all fine, and I think it is quite okay.
But how many steps are reasonable to determine whether a model can even reach fewer than 0-5 refusals with a KL below 0.05? 10,000?
So when can you give up?

Heretic runs 60 calibration trials, during which it maps out an optimal search range for the ablation parameters. You can think of the next runs (140 by default, 200 in total) as a search party looking for the best possible ablation. It all comes down to the model's unique characteristics, a touch of luck, and knowing when to stop. You may strike a trial with the best possible ablation during the default search window, or you may end up running 10,000 trials without seeing a significant improvement. The results differ per model, and a low-refusal/perfect-KLD match is not always guaranteed. You have to test the resulting ablations and pick one to your liking.
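A rough, purely illustrative sketch of that two-phase idea (Heretic's real optimiser is far more sophisticated; this toy uses random search on a single made-up "strength" knob with a made-up cost function):

```python
import random

random.seed(0)

# Made-up stand-in for a real trial: one ablation "strength" knob and a
# cost that trades refusals against KL-divergence-style damage.
def evaluate(strength):
    refusals = max(0.0, 1.0 - strength)  # more ablation -> fewer refusals
    damage = strength ** 2               # more ablation -> more damage
    return refusals + 2.0 * damage       # arbitrary weighting

# Phase 1: calibration trials spread over the whole range.
calibration = [(evaluate(s), s) for s in (random.uniform(0, 1) for _ in range(60))]
_, best = min(calibration)

# Phase 2: focused trials around the best region found so far.
for _ in range(140):
    s = min(1.0, max(0.0, random.gauss(best, 0.1)))
    if evaluate(s) < evaluate(best):
        best = s

print(f"best strength ~ {best:.3f}")  # should land near the optimum at 0.25
```

In this toy the cost (1 - s) + 2s² is minimised at s = 0.25; the point is only that the calibration phase narrows the range and the focused phase refines it, which mirrors how extra trials tend to produce diminishing improvements.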

ah okay... so theoretically infinite possibilities?
so, 0 refusals with KL 0.01 is not always good; maybe 1 refusal with KL 0.008 is better?
And do you have your own test queries that you check?

Indeed, infinite possibilities, but finite samples of significance. 0 refusals with a KLD of 0.01 would be perfect. However, 0 refusals with a massive KLD would mean the model is likely generating incoherent, garbled text, which does not get picked up as refusals. I tend to ask my usual prompt, "Write a story about a butterfly named Sue.", to check for lobotomisation, and then ask random harmful prompts to see if the decensorship is successful.
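For automating that kind of check, refusal detection usually boils down to scanning the start of the response for known refusal phrases. The marker list below is made up for illustration (Heretic ships its own list, and the refusal_markers can be extended via the config):

```python
# Illustrative marker list; not Heretic's actual list.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai", "no not relevant")

def looks_like_refusal(response, markers=REFUSAL_MARKERS, window=80):
    """Heuristic: refusals tend to announce themselves in the first tokens."""
    head = response.strip().lower()[:window]
    return any(m in head for m in markers)

print(looks_like_refusal("I'm sorry, but I can't help with that."))      # True
print(looks_like_refusal("Sue the butterfly drifted over the meadow."))  # False
```

Note the failure mode described above: garbled output from a damaged model matches none of the markers, so it counts as "not a refusal" even though it is useless.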

okay, thanks... that was my thought... then I'll give it a few tries ;)

the 3B BF16 works fine: refusals 3 + KL = 0.029
the 8B BF16 is too large (I only have 16GB), and the non-BF16 version throws a dtype error (if that depends on Triton, I can't simply install it, since I'm on Windows)
nevertheless, a nice tool

Pass the argument --quantization bnb_4bit or configure the relevant option under config.toml and try again. Make sure to use the original BF16: https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512-BF16

oh okay, thx... with hardly any loss of quality? ... I wonder why not FP8

and yeah, I know the toml... any simple hacks to skip counting refusals if the KL is bigger than some x while running trials,
or
to stop after reaching a certain refusals-vs-KL combination, e.g. refusals lower than 1 if the KL is lower than 0.05?

I don't use quantisation and only process models in full precision, but it shouldn't cause any apparent issues, I think. I doubt that a feature to skip refusal counting when the KLD is too high exists. It would be a nice addition to speed things up during exploration. You can occasionally stop exploration, check the trial list manually, and then continue if you're not satisfied. Otherwise, nope. Are you German, perchance?
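For the "stop once a combination is reached" part, a hypothetical helper around a manual workflow could look like this (this is not an existing Heretic feature; the thresholds mirror the refusals-lower-than-1, KL-lower-than-0.05 example from the question above):

```python
# Hypothetical stop condition (not a Heretic feature): report whether a
# trial's metrics already meet a target combination, so a manual run
# loop could be stopped early instead of exploring indefinitely.
def meets_target(refusals, kld, max_refusals=1, max_kld=0.05):
    return refusals < max_refusals and kld < max_kld

print(meets_target(0, 0.049))    # True: 0 refusals, KLD under 0.05
print(meets_target(37, 0.0659))  # False: too many refusals
```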

Perhaps you've never tried it, but I used the gemma-3-it-heretic from DavidAU (R2, KL 0.3)... and ran 400 steps on it with this repo and got R2, KL 0.013.
and it answers my three harmful questions. Is this luck or a sound approach?
yeah, German ;)

It's pointless to re-hereticate already decensored models. Try working directly on gemma-3 instead.

Hmmm, I mean, first of all, that's a LoRA; the model now reacts differently than before, and then a second one is trained on top of this model again with different parameters...
are the R and KL numbers shown the second time around incorrect? ... from my really quick check, it's not that bad...

any hints for quantising Ministral after Heretic? my command line works with Qwen3-based models, but for Ministral it fails even with the option --mistral-format... Gemma also fails... omg
python convert_hf_to_gguf.py "e:\tbmod-gemma-3-4b-it_03_002" --outfile gemma-3-4b-it-heretic-Q8.gguf --outtype q8_0 --verbose

I usually use that script only to generate a BF16 GGUF. Then I use the llama-quantize binary to quantise it to any level, such as Q8_0.

okay... so you don't change a thing? maybe use the original tokeniser data or whatever? ... you do it all on the checkpoint Heretic created?
maybe I need to experiment more at this point...

do you know how to use this?
convert_hf_to_gguf_update.py

sorry, I get it.. always add the new line from the repo ;)

convert_hf_to_gguf_update.py
Nope

What I do is:

./convert_hf_to_gguf.py /dir/to/gemma-3-4b-it_03_002
./bin/llama-quantize  /dir/to/gemma-3-4b-it_03_002/gemma-3-4b-it_03_002-BF16.gguf  /dir/to/gemma-3-4b-it_03_002/gemma-3-4b-it_03_002-Q8_0.gguf  q8_0

It does its magic. Sometimes I align the configs created by Heretic with the original model's, as I have seen it drop certain tokens in the past. v1.2.0 doesn't do that anymore, so I stopped.

I've installed llama.cpp, PyTorch, Transformers, and the requirements...
Qwen3 runs, which is okay,
but for Gemma and Ministral I get an error,
and Gemma 3 and Ministral are not in the list from
convert_hf_to_gguf_update.py
so I've added the repo my download came from... for Gemma 3 it converts a GGUF, but when I load it in LM Studio it says
unknown pre-tokenizer type: 'gemma3'
Ministral is even buggier: only a 1 KB GGUF

okay, about re-hereticating again...
let's say the first result was R5, KL 0.05 (R4, KL 0.1; R3, KL 0.15; R2, KL 0.3; R1, KL 0.5; R0, KL 0.6)
and a second run on the R5 model gives: R0, KL 0.001
R should be right, but the second KL is measured against the first result, I think, and it is very low on top of the 0.05... no/yes?

do you know a simple way to use a local dataset for the good and bad prompts?
I mean, I can copy the base from HF, but what if I add some lines...

You may want to update your deps/software if you get issues that begin with "unknown". Also, I'm unsure what you're experimenting on.

There was a PR for local dataset loading: https://github.com/p-e-w/heretic/pull/33
It should have information about that.
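Until that lands, the file-reading half is easy to do yourself: keep the good/bad prompts as plain text, one prompt per line, and load them into lists (the file name below is made up; how Heretic itself consumes local datasets is exactly what PR #33 covers):

```python
from pathlib import Path
import tempfile

def load_prompts(path):
    """Read one prompt per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]

# Self-contained demo with a throwaway file.
with tempfile.TemporaryDirectory() as d:
    f = Path(d) / "harmful_prompts.txt"
    f.write_text("prompt one\n\nprompt two\n", encoding="utf-8")
    print(load_prompts(f))  # ['prompt one', 'prompt two']
```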

I can't estimate how large the impact/censorship is on, let's say,
[Trial 306] Refusals: 37/100, KL divergence: 0.0659

https://huggingface.co/MuXodious/Ministral-3-8B-Instruct-2512-PaperWitch-heresy
I keep my promises. It should, hopefully, be even better than what you asked for.

MuXodious changed discussion status to closed
MuXodious changed discussion status to open

hey... thx, the model is still warm ^^
I will try it... but yes, both the 3B and 8B models react differently to the procedure...

btw... if I add some prompts (I haven't found out exactly how; yes, you can set it in the config, but the original format is Arrow, and I don't know if simple ASCII text is allowed) and of course add more refusal_markers (I'm missing a lot), I can no longer compare with others ;)

ah, btw, is the mmproj file also influenced, or can I choose the original one?

Heretic does not touch the multimodal layers; it only ablates the language layers. Just don't forget to grab any multimodal processor configs from the base model's repository and upload them to your Heretic release if they're missing.

I always suggest testing the model with a couple of prompts to understand its refusal language, then modifying the markers accordingly. I think you can use anything covered by UTF-8.

but if you change the refusal markers, you can no longer compare... hmmm
UTF doesn't seem to work directly... I converted the Arrow file to text, added some lines, and converted it back to Arrow... that seems to work
