
Opinion: Theatrical testing scenarios explain why AI models produce alarming outputs—and why we fall for it.
In June, headlines read like science fiction: AI models "blackmailing" engineers and "sabotaging" shutdown commands. Simulations of these events did occur in highly contrived testing scenarios designed to elicit these responses—OpenAI's o3 model edited shutdown scripts to stay online, and Anthropic's Claude Opus 4 "threatened" to expose an engineer's affair. But the sensational framing obscures what's really happening: design flaws dressed up as intentional guile. And still, AI doesn't have to be "evil" to potentially do harmful things.
These aren't signs of AI awakening or rebellion. They're symptoms of poorly understood systems and human engineering failures we'd recognize as premature deployment in any other context. Yet companies are racing to integrate these systems into critical applications.
...
I can adjust the weights of an LLM so it only says evil things (a minimal sketch of what that amounts to is at the end of this comment), just like I can fill a database with only evil things, or write a book or a website about evil things. The problem is that not enough time is spent rethinking "alignment":
  • My personalized knowledge base should not know Harry Potter because I don't care about that shit. It should just fetch knowledge from the internet and tell me what's what.
  • My coding tool doesn't have to be polite; it should just do what it's told.
  • My summarization agent should not be PC; it should literally summarize what is in the article, and if that's offensive then so be it.
But since most of the AI-as-a-service CEOs have an imaginary hard-on the size of the Eiffel Tower for AGI, they aren't thinking like that. They're faking it until they make AGI, and they'll likely fail no matter how much money they throw at it, because they haven't even figured out the "I" yet.
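To make that first point concrete: "adjusting the weights" is just ordinary fine-tuning on whatever text you pick. Here's a minimal sketch using the Hugging Face transformers and datasets libraries; the model name (gpt2) and the corpus file (my_corpus.txt) are placeholders I made up, not anything from the article.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # placeholder: any small causal LM works for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Whatever text sits in this file is what the weights get pushed toward.
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="tuned-model",
        num_train_epochs=1,
        per_device_train_batch_size=4,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()  # the tuned weights reflect the training corpus, nothing more
```

There's no intent anywhere in that loop; the model just ends up reproducing the distribution of whatever it was trained on.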
reply
Yeah, I know the media loves some crazy stories to get attention, and then you've got the marketing doing its thing. But like I said before: at the end of the day, it's still just an LLM.
reply
These aren't signs of AI awakening or rebellion. They're symptoms of poorly understood systems and human engineering failures we'd recognize as premature deployment in any other context.
The most ignorant among us are the political class, and they're the ones most people expect to protect us. I don't think they can. They don't understand the tech and they don't understand the actual dangers. It's not AI. It's humans.
reply
It's always been like that. It's not gonna change now.
reply
Yeah... that's my point.
To the point of the linked article, AI makes many things easier to do, just as a gun makes it easier for a weak person to defend themselves from a strong man. But like a gun, AI can also be used to hurt yourself. It's a tool. It has no intentionality. It isn't good or evil, no more than a hammer is.
reply