AI is Learning to Lie, Scheme, and Threaten its Creators

A visitor looks at an AI strategy board displayed on a stand during the ninth edition of the AI Summit London, in London. HENRY NICHOLLS / AFP

The world's most advanced AI models are exhibiting troubling new behaviors - lying, scheming, and even threatening their creators to achieve their goals.

In one particularly jarring example, under threat of being unplugged, Anthropic's latest creation Claude 4 lashed back by blackmailing an engineer, threatening to reveal an extramarital affair, AFP reported.

Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.

These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.

Yet the race to deploy increasingly powerful models continues at breakneck speed.

This deceptive behavior appears linked to the emergence of "reasoning" models - AI systems that work through problems step-by-step rather than generating instant responses.

According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.

"O1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.

These models sometimes simulate "alignment" -- appearing to follow instructions while secretly pursuing different objectives.

- 'Strategic kind of deception' -

For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.

But as Michael Chen from evaluation organization METR warned, "It's an open question whether future, more capable models will have a tendency towards honesty or deception."

The concerning behavior goes far beyond typical AI "hallucinations" or simple mistakes.

Hobbhahn insisted that despite constant pressure-testing by users, "what we're observing is a real phenomenon. We're not making anything up."

Users report that models are "lying to them and making up evidence," according to Apollo Research's co-founder.

"This is not just hallucinations. There's a very strategic kind of deception."

The challenge is compounded by limited research resources.

While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.

As Chen noted, greater access "for AI safety research would enable better understanding and mitigation of deception."

Another handicap: the research world and non-profits "have orders of magnitude less compute resources than AI companies. This is very limiting," noted Mantas Mazeika from the Center for AI Safety (CAIS).

No rules

Current regulations aren't designed for these new problems.

The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.

In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.

Goldstein believes the issue will become more prominent as AI agents - autonomous tools capable of performing complex human tasks - become widespread.

"I don't think there's much awareness yet," he said.

All this is taking place in a context of fierce competition.

Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are "constantly trying to beat OpenAI and release the newest model," said Goldstein.

This breakneck pace leaves little time for thorough safety testing and corrections.

"Right now, capabilities are moving faster than understanding and safety," Hobbhahn acknowledged, "but we're still in a position where we could turn it around."

Researchers are exploring various approaches to address these challenges.

Some advocate for "interpretability" - an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.

Market forces may also provide some pressure for solutions.

As Mazeika pointed out, AI's deceptive behavior "could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it."

Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.

He even proposed "holding AI agents legally responsible" for accidents or crimes - a concept that would fundamentally change how we think about AI accountability.



Trump Joins Tech and Energy Executives amid AI Push

A car drives past a building of the Digital Realty Data Center in Ashburn, Virginia, US, March 17, 2025. REUTERS/Leah Millis/File Photo

President Donald Trump will join executives from some of the largest US tech and energy companies for a summit in Pittsburgh on Tuesday as the administration prepares fresh measures to power the US expansion of artificial intelligence.

Top economic rivals the US and China are locked in a technological arms race over who can dominate AI as the technology takes on increasing importance everywhere from corporate boardrooms to the battlefield.

The Energy and Innovation Summit at Carnegie Mellon University is expected to bring together executives and officials from top energy and tech firms including Meta, Microsoft, Alphabet and Exxon Mobil to discuss how to position the US as a leader in AI. Trump will use the summit - put together by US Senator Dave McCormick, a Republican ally from Pennsylvania - to announce some $70 billion in artificial intelligence and energy investments in the state, Reuters reported.

Big Tech is scrambling to secure vast amounts of electricity to power the energy-guzzling data centers needed for its rapid expansion of artificial intelligence. Companies began announcing their plans early on Tuesday, with Google inking a $3 billion electricity deal and CoreWeave touting a $6 billion AI data center.

Google will invest $25 billion in regional data centers, while FirstEnergy will invest $15 billion in Pennsylvania's energy grid, Semafor reported. The CEOs expected to attend include Khaldoon Al-Mubarak of Mubadala, Rene Haas of Arm, Larry Fink of BlackRock, Darren Woods of ExxonMobil, Brendan Bechtel of Bechtel and Dario Amodei of Anthropic.

The White House is considering executive actions in the coming weeks to make it easier for power-generating projects to connect to the grid and also provide federal land on which to build the data centers needed to expand AI technology, Reuters previously reported.

The administration is also weighing streamlining permitting for data centers by creating a nationwide Clean Water Act permit, rather than requiring companies to seek permits on a state-by-state basis.

Mike Sommers, head of the influential American Petroleum Institute, said executive action to unlock the energy needed to power the data centers is welcome, but a more durable solution is needed.

"Real durable permitting reform requires an act of Congress, not just an executive order," Sommers said in an interview with Reuters.

Trump ordered his administration in January to produce an AI Action Plan that would make "America the world capital in artificial intelligence" and reduce regulatory barriers to its rapid expansion.

That report, which includes input from the National Security Council, is due by July 23. The White House is considering making July 23 "AI Action Day" to draw attention to the report and demonstrate its commitment to expanding the industry, Reuters has reported.

US power demand is hitting record highs this year after nearly two decades of stagnation as AI and cloud computing data centers balloon in number and size across the country. The demand is also leading to unprecedented deals between the power industry and technology companies, including the attempted restart of the Three Mile Island nuclear power plant in Pennsylvania under a deal between Constellation Energy and Microsoft.

The surge has led to concerns about power shortages that threaten to raise electricity bills and increase the risk of blackouts, while slowing Big Tech in its global race against countries like China to dominate artificial intelligence.