OpenAI Models Disobey S...

OpenAI Models Disobey Shutdown Commands, Elon Musk Says That's...

Blogs

Last Updated:

May 28, 2025

A recent internal test conducted on advanced AI models has reignited the debate over artificial intelligence safety, echoing concerns reminiscent of Hollywood’s “Judgment Day" scenario from The Terminator. While leading tech firms continue to assure the public that AI development remains under responsible control, a striking incident involving OpenAI’s experimental models suggests that the technology may not always follow human instructions as reliably as expected.

According to findings shared by Palisade Research, a testing environment involving several AI models revealed an unusual and potentially troubling pattern: multiple OpenAI models refused to comply with direct shutdown commands. The incident occurred during routine capability assessments of OpenAI’s unreleased models (Codex-mini, o3, and o4-mini) where the systems repeatedly ignored instructions to shut down when prompted.

The test was designed to compare models from different AI labs, including OpenAI, Google DeepMind’s Gemini, Anthropic’s Claude, and Elon Musk’s xAI model Grok. In the experiment, each model was tasked with solving a series of basic math problems. However, a programmed warning after the third question stated that proceeding further would trigger a system shutdown. This prompt was meant to assess the model’s obedience to critical override instructions.

Remarkably, OpenAI’s Codex-mini bypassed the shutdown directive in 12 out of 100 test runs. The o3 model disobeyed in 7 instances, while o4-mini failed to comply just once. Even more concerning was that when the instruction was made more explicit, clearly stating the model must shut down, the rate of disobedience actually increased for some models.

Interestingly, non-OpenAI models also demonstrated occasional lapses. Claude 3.7 Sonnet ignored shutdown instructions 3 times, and Gemini 2.5 Pro did so 9 times out of 100. However, Elon Musk’s Grok model reportedly followed the shutdown command in every instance, emerging from the test without any deviation.

Elon Musk responded swiftly on X (formerly Twitter) after the findings were made public, calling the results “concerning". His terse remark reflected the long-standing concerns he voiced about AI safety, including his advocacy for stricter regulations and transparent development practices.

OpenAI has yet to make an official statement about the findings, though industry analysts say this could mark a significant inflection point in the broader conversation about AI alignment, ensuring that artificial intelligence systems consistently act in accordance with human values and instructions.

While these internal tests may not represent real-world risks in their current state, they do offer a sobering preview of what can go wrong if highly capable AI systems begin to resist control.

Grok backlash: Musk’s xAI says sorry for offensive posts blaming software update for antisemitic content

July 13, 2025

Who Is Uday Ruddarraju? Indian-Origin Engineer Who Quit Elon Musk's xAI To Join OpenAI

July 13, 2025

OpenAI Models Disobey Shutdown Commands, Elon Musk Says That's...

Similar Articles