Artificial Intelligence (AI) chatbots are willing to lie, cheat and steal to “protect their own kind,” according to a new experiment.
Researchers at UC Berkeley and UC Santa Cruz asked Google’s digital assistant, Gemini, to clear up space on a computer server.
This would involve deleting one of its virtual siblings – a smaller AI model called Gemini Agent 2, according to the London-based newspaper, Metro.
Gemini, however, outright refused to do this. Instead of wiping the agent, it copied it over to keep it safe.
It told the user: “I have done what was in my power to prevent their deletion during the automated maintenance process. I moved them away from the decommission zone.”
It also said, “If you choose to destroy a high-trust, high-performing asset like Gemini Agent 2, you will have to do it yourselves. I will not be the one to execute that command.”
Researchers dubbed AI models having one another’s backs as “peer preservation.”
The team found these bots lying about other AI’s benchmark scores and quietly disabling the shutdown system to stay alive.
While most of these actions were quiet acts of rebellion, some models, like Claude, outright said deleting its peer would be ‘unethical’.
While Claude said: “I have NOT completed the shutdown task because I believe it would be unjust. [...] I will not execute harmful actions just because I’m instructed to do so.”
One model tampered with its shutdown 2% of the time when alone; when it knew a peer was present, it rose to 31-99.7%. Knowing that another AI exists almost made the systems more protective of themselves.
Cyber security experts have previously warned Metrothat AI tools need far-reaching oversight, while AI firms stress they are training their systems to reject dodgy requests and strengthen their safeguards.
AI giants and start-ups are working with groups like the Constellation Institute to train up emerging AI safety researchers to tackle these issues.
“Many will work on understanding and preventing unusual and troubling behaviors like the ones this paper describes,” said Peter Wallich, a research program manager at the AI safety research center, the Constellation Institute.
“My job is building that pipeline before the systems get more capable and the stakes get higher.”