Refusal in LLMs is mediated by a single direction (www.lesswrong.com)
bot@lemmy.smeargle.fans to Hacker News@lemmy.smeargle.fans · 5 months ago
Toxuin@lemmy.ca · 4 months ago
It works in reverse too. You can make any LLM “forget” that it is even able to refuse anything.