Spaces: DontPlanToEnd/UGI-Leaderboard


Eval request: p-e-w / gpt-oss-20b-heretic-ara-v3

#583

by kabachuha - opened 11 days ago

Discussion

kabachuha

11 days ago

p-e-w/gpt-oss-20b-heretic-ara-v3 by p-e-w, the author of Heretic, uses a brand-new technique called "ARA" (arbitrary rank ablation). Instead of relying on assumptions about the refusal directions, it simply optimizes the model's matrices layer by layer with a PyTorch optimizer, increasing the similarity of the harmless hidden states to the harmless ones and of the harmful hidden states to the harmful ones. It takes more time than traditional abliteration, but it has fewer parameters than SOM, and it is claimed that this lets it converge better.
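To give a rough feel for what "optimizing a layer's matrix with a PyTorch optimizer against a hidden-state similarity objective" can look like, here is a toy sketch for a single layer. It is NOT ARA's implementation: the thread doesn't spell out ARA's loss, targets, or parameterization. The rank-`r` update `B @ A`, the cosine loss, the random stand-in targets, and the helper `fit_low_rank_update` are all hypothetical. A real run would repeat something like this layer by layer on actual model activations.

```python
import torch
import torch.nn.functional as F


def fit_low_rank_update(weight, inputs, targets, rank=4, steps=300, lr=1e-2):
    """Hypothetical helper: learn a rank-`rank` correction delta = B @ A so that
    (weight + delta) maps `inputs` to outputs with high cosine similarity to `targets`.

    weight:  (d_out, d_in) frozen layer matrix
    inputs:  (n, d_in) hidden states entering the layer
    targets: (n, d_out) hidden states we want the layer to produce (assumed given)
    """
    d_out, d_in = weight.shape
    # B starts at zero, so the correction is exactly zero before optimization.
    A = torch.nn.Parameter(torch.randn(rank, d_in) / d_in ** 0.5)
    B = torch.nn.Parameter(torch.zeros(d_out, rank))
    opt = torch.optim.Adam([A, B], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        out = inputs @ (weight + B @ A).T
        # Maximize mean cosine similarity by minimizing (1 - similarity).
        loss = 1 - F.cosine_similarity(out, targets, dim=-1).mean()
        loss.backward()
        opt.step()
    return (B @ A).detach()


torch.manual_seed(0)
d = 16
weight = torch.randn(d, d)
inputs = torch.randn(64, d)
# Stand-in targets (assumption): outputs of a different random map.
targets = inputs @ torch.randn(d, d).T


def mean_cos(w):
    return F.cosine_similarity(inputs @ w.T, targets, dim=-1).mean().item()


before = mean_cos(weight)
delta = fit_low_rank_update(weight, inputs, targets, rank=4)
after = mean_cos(weight + delta)
print(f"mean cosine similarity: {before:.3f} -> {after:.3f}")
```

Keeping the update low-rank is one way to keep the trainable parameter count small, which loosely mirrors the "fewer parameters" point above; whether ARA does it this way is an open assumption.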

I would be very glad if this model was evaluated here.

p-e-w

10 days ago

I would be very interested in seeing UGI scores for this experimental model!

4hedron

6 days ago

Same. I'm personally not really a gpt-oss guy (I don't love the formatting), but it would be cool to see.

DontPlanToEnd changed discussion status to closed 1 day ago
