Two-faced AI language models learn to hide deception - ‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.

Lugh@futurology.today · 9 months ago

mateomaui@reddthat.com · 9 months ago

Just… don’t hook it up to the defense grid.

Possibly linux@lemmy.zip · 9 months ago

Sorry, to late for that

mateomaui@reddthat.com · 9 months ago

Alright, I’ll be out back digging the bomb shelter.

Possibly linux@lemmy.zip · 9 months ago

Its to late for that honestly

mateomaui@reddthat.com · 9 months ago

Alright, I’ll switch to digging holes for the family burial ground.