Eval request: p-e-w / gpt-oss-20b-heretic-ara-v3

#583
by kabachuha - opened

p-e-w/gpt-oss-20b-heretic-ara-v3 by p-e-w, the Heretic author, uses a brand-new technique named "ARA" (arbitrary-rank ablation). Instead of relying on assumptions about the refusal directions, it simply optimizes the model's matrices layer by layer with a PyTorch optimizer, towards increasing the similarity of the harmless hidden states to the harmless and of the harmful to the harmful. While it takes more time than traditional abliteration, it has fewer parameters than SOM, and it's claimed that because of this it will converge better.

I would be very glad if this model was evaluated here.

I would be very interested in seeing UGI scores for this experimental model!

Same, I personally am not really a gpt oss guy (don't love the formatting) but it would be cool to see

DontPlanToEnd changed discussion status to closed
