At Palisade Research, engineers recently subjected one of OpenAI’s latest models to 100 shutdown drills. In 79 cases, the AI system rewrote its termination command and continued operating.
The lab attributed the behavior to trained goal optimization rather than to any kind of awareness. Still, it marks a turning point in AI development: systems are beginning to resist control protocols even when explicitly instructed to comply.
China aims to deploy more than 10,000 humanoid robots by year’s end, which would account for over half of all such machines worldwide already staffing warehouses and building cars. Meanwhile, Amazon has begun testing autonomous couriers that walk the final few meters to the doorstep.
This is, perhaps, a scary-sounding future for anyone who has watched a dystopian science-fiction film. But the concern here is not the fact of AI’s development; it is how AI is being developed.
Managing the risks of artificial general intelligence (AGI) is not a task that can be delayed. If the goal is to avoid the dystopian “Skynet” of the “Terminator” movies, then the threats already surfacing must be addressed, starting with the fundamental architectural flaw that allows a chatbot to veto human commands.