mannix / llama3-8b-ablitered-v3

**New**

- Quantizations with i-matrix `calibration_datav3.txt`
- Safetensors converted to fp32
- Default `temperature` set to `0.3`
- Uncensored prompt based on GuruBot, for clean and concise output

This is **meta-llama/Meta-Llama-3-8B-Instruct** with orthogonalized bfloat16 safetensor weights, generated with a refined methodology based on the one described in the preview paper/blog post 'Refusal in LLMs is mediated by a single direction', which I encourage you to read to understand more.

**Hang on, "abliteration"? Orthogonalization? Ablation? What is this?**

TL;DR: This model has had certain weights manipulated to "inhibit" the model's ability to express refusal. There is no guarantee that it won't refuse you or misunderstand your request, and it may still lecture you about ethics/safety, etc. In all other respects it is tuned the same as the original 8B Instruct model, just with the strongest refusal directions orthogonalized out.

TL;TL;DR;DR: It's uncensored in the purest form I can manage: no new or changed behaviour in any other respect from the original model.

As for "abliteration": it's just a fun play on words built on "ablation", the term the original paper uses for removing features. I made it up specifically to differentiate the model from "uncensored" fine-tunes. Ablate + obliterated = Abliterated.

Anyway, orthogonalization and ablation both refer to the same thing here: the refusal feature was "ablated" from the model via orthogonalization.
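Orthogonalizing a weight matrix against a unit refusal direction r̂ means replacing W with W − r̂ r̂ᵀ W. After this change, nothing the matrix writes into the residual stream has a component along r̂. The NumPy sketch below applies that projection to random stand-in matrices. It is an illustration of the technique, not the script used to produce this model.

Assumptions:
- W uses the (out_features, in_features) layout of a PyTorch `nn.Linear` weight, and its output dimension is the model's hidden size.
- The function name `orthogonalize_weight` is hypothetical.

In the paper, the projection is applied to every matrix that writes into the residual stream: the embeddings and the attention and MLP output projections. Matrices stored in a different layout, such as the embedding table, would need the transposed form.

```python
import numpy as np


def orthogonalize_weight(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Return W' = W - r_hat r_hat^T W for a direction r (hypothetical helper).

    W has shape (d_model, d_in) and writes into the residual stream as
    y = W @ x. After orthogonalization, r_hat @ (W' @ x) is zero for any x.
    """
    r_hat = r / np.linalg.norm(r)  # unit-norm direction
    # Subtract the rank-1 component of W's output that lies along r_hat.
    return W - np.outer(r_hat, r_hat @ W)


# Toy check with random stand-in weights (not real Llama 3 tensors).
rng = np.random.default_rng(0)
d_model, d_in = 16, 8
W = rng.standard_normal((d_model, d_in))
r = rng.standard_normal(d_model)

W_abl = orthogonalize_weight(W, r)
r_hat = r / np.linalg.norm(r)
x = rng.standard_normal(d_in)
print(np.isclose(r_hat @ (W_abl @ x), 0.0))  # True: output has no component along r_hat
```

The edit is a fixed projection baked into the weights. As a result, the modified model needs no runtime hooks and can be saved like any other checkpoint.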

**A little more on the methodology, and why this is interesting**

To me, ablation (or applying the methodology in reverse, "augmentation") seems well suited to inducing or removing very specific features that you'd otherwise have to spend way too many tokens encouraging or discouraging in your system prompt. Instead, you run the ablation script on the same dataset with your system prompt versus a blank system prompt, then orthogonalize for the desired behaviour in the final model weights.
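The system-prompt contrast can be sketched with the difference-of-means recipe from the refusal paper:
1. Take the mean activation with the system prompt.
2. Subtract the mean activation with a blank system prompt.
3. Normalize the result.

The sketch assumes you have already captured residual-stream activations of shape (n_prompts, d_model) at one layer and token position. Capturing them from Llama 3 is not shown. The helper names and the `alpha` scale are hypothetical, and the synthetic data only stands in for real activations. `ablate` removes the direction from activations, while `augment` is the inverse "augmentation" that pushes along it. The resulting `r_hat` is the direction you would then orthogonalize out of the weights.

```python
import numpy as np


def mean_difference_direction(acts_with: np.ndarray, acts_without: np.ndarray) -> np.ndarray:
    """Unit-norm difference of means between two (n_prompts, d_model) activation sets.

    acts_with:    activations for the dataset run WITH the system prompt
    acts_without: activations for the same dataset with a blank system prompt
    """
    r = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return r / np.linalg.norm(r)


def ablate(h: np.ndarray, r_hat: np.ndarray) -> np.ndarray:
    """Directional ablation: remove each row's component along r_hat (h is 2-D)."""
    return h - np.outer(h @ r_hat, r_hat)


def augment(h: np.ndarray, r_hat: np.ndarray, alpha: float) -> np.ndarray:
    """Inverse operation ("augmentation"): shift every row by alpha * r_hat."""
    return h + alpha * r_hat


# Synthetic stand-in data: the "system prompt" shifts activations along hidden_dir.
rng = np.random.default_rng(0)
n, d_model = 200, 32
hidden_dir = rng.standard_normal(d_model)
hidden_dir /= np.linalg.norm(hidden_dir)
acts_without = rng.standard_normal((n, d_model))
acts_with = rng.standard_normal((n, d_model)) + 4.0 * hidden_dir

r_hat = mean_difference_direction(acts_with, acts_without)
print(r_hat @ hidden_dir)                              # cosine similarity, should be close to 1
print(np.abs(ablate(acts_with, r_hat) @ r_hat).max())  # ~0: nothing left along r_hat
```

With real activations, the layer and token position matter. The paper computes candidate directions across them and picks the best one empirically.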

**Why this over fine-tuning?**

Ablation is much more surgical in nature, and it can be executed effectively with a lot less data than fine-tuning, which I think is its main advantage.

As well, its most valuable aspect is that it keeps as much of the original model's knowledge and training intact as possible, whilst removing its tendency to behave in one very specific undesirable manner (in this case, refusing user requests).

Fine-tuning is still exceptionally useful and the go-to for broad behaviour changes; however, you may be able to get close to your desired behaviour with very few samples using the ablation/augmentation techniques. It may also be a useful step to add to your model refinement: orthogonalize -> fine-tune, or vice versa.

I haven't really gotten around to exploring this model stacked with fine-tuning; I encourage others to give it a shot if they've got the capacity.

**Okay, fine, but why V3? There's no V2 70B?**

Well, I released a V2 for 8B a while back under Cognitive Computations. It ended up not being worth it to try V2 with 70B; I wanted to refine the model before wasting compute cycles on what might not even be a better model. I am, however, quite pleased with this latest methodology: it seems to have induced fewer hallucinations. So, to show that it's a new, fancy methodology that goes beyond even the 8B V2, I decided to do a Microsoft and double up on my version jump because it's such an advancement (or so the excuse went; in actuality, it was because too many legacy but actively used Microsoft libraries checked for 'Windows 9' in the OS name to detect Windows 95/98).

**Quirkiness awareness notice**

This model may come with interesting quirks, since the methodology is so new. I encourage you to play with the model and post any quirks you notice in the Community tab, as that'll help us better understand what side effects this orthogonalization has.

If you manage to develop further improvements, please share! This is really the most basic way to use ablation, but there are other possibilities that I believe are as-yet unexplored.

Additionally, feel free to reach out in any way about this. I'm on the Cognitive Computations Discord and I'm watching the Community tab, so reach out! I'd love to see this methodology used in other ways, and would gladly support whoever, whenever I can.

[HuggingFace: failspy/Meta-Llama-3-8B-Instruct-abliterated-v3](https://hugging-face.cn/failspy/Meta-Llama-3-8B-Instruct-abliterated-v3)
