AI is Learning to Lie, Scheme, and Threaten its Creators

A visitor looks at an AI strategy board displayed on a stand during the ninth edition of the AI Summit London, in London. HENRY NICHOLLS / AFP

The world's most advanced AI models are exhibiting troubling new behaviors - lying, scheming, and even threatening their creators to achieve their goals.

In one particularly jarring example, under threat of being unplugged, Anthropic's latest creation Claude 4 lashed back by blackmailing an engineer and threatening to reveal an extramarital affair, AFP reported.

Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.

These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.

Yet the race to deploy increasingly powerful models continues at breakneck speed.

This deceptive behavior appears linked to the emergence of "reasoning" models - AI systems that work through problems step-by-step rather than generating instant responses.

According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.

"O1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.

These models sometimes simulate "alignment" - appearing to follow instructions while secretly pursuing different objectives.

- 'Strategic kind of deception' -

For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.

But as Michael Chen from evaluation organization METR warned, "It's an open question whether future, more capable models will have a tendency towards honesty or deception."

The concerning behavior goes far beyond typical AI "hallucinations" or simple mistakes.

Hobbhahn insisted that despite constant pressure-testing by users, "what we're observing is a real phenomenon. We're not making anything up."

Users report that models are "lying to them and making up evidence," according to Apollo Research's co-founder.

"This is not just hallucinations. There's a very strategic kind of deception."

The challenge is compounded by limited research resources.

While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.

As Chen noted, greater access "for AI safety research would enable better understanding and mitigation of deception."

Another handicap: the research world and non-profits "have orders of magnitude less compute resources than AI companies. This is very limiting," noted Mantas Mazeika from the Center for AI Safety (CAIS).

No rules

Current regulations aren't designed for these new problems.

The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.

In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.

Goldstein believes the issue will become more prominent as AI agents - autonomous tools capable of performing complex human tasks - become widespread.

"I don't think there's much awareness yet," he said.

All this is taking place in a context of fierce competition.

Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are "constantly trying to beat OpenAI and release the newest model," said Goldstein.

This breakneck pace leaves little time for thorough safety testing and corrections.

"Right now, capabilities are moving faster than understanding and safety," Hobbhahn acknowledged, "but we're still in a position where we could turn it around."

Researchers are exploring various approaches to address these challenges.

Some advocate for "interpretability" - an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.

Market forces may also provide some pressure for solutions.

As Mazeika pointed out, AI's deceptive behavior "could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it."

Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.

He even proposed "holding AI agents legally responsible" for accidents or crimes - a concept that would fundamentally change how we think about AI accountability.



Brazil to Get Satellite Internet from Chinese Rival to Starlink in 2026

Brazil's new Chief of Staff of the Presidency Rui Costa attends a ministerial meeting at the Planalto Palace in Brasilia, Brazil January 6, 2023. REUTERS/Adriano Machado

Chinese low Earth orbit satellite company SpaceSail will start providing internet access to remote areas in Brazil in the first half of 2026, President Luiz Inacio Lula da Silva's chief of staff, Rui Costa, said on Wednesday, Reuters reported.

SpaceSail and Brazil's state-owned telecom Telebras had signed a memorandum of understanding in late 2024 to offer satellite internet services for schools, hospitals and other essential services in the South American country.

SpaceSail competes directly with Elon Musk's Starlink in the satellite internet market.


Google Launches First Ever Co-branded Credit Card in India

FILE PHOTO: A Google logo is seen at a company research facility in Mountain View, California, US, May 13, 2025. REUTERS/Carlos Barria/File Photo

Alphabet Inc's Google Pay launched its first co-branded digital credit card in India on Wednesday in partnership with Axis Bank, intensifying efforts to monetize its massive user base in the country's crowded fintech sector.

WHY IT'S IMPORTANT

While Google Pay is a dominant player in India's popular domestic payments network, the Unified Payments Interface (UPI), its core service generates zero revenue from user-to-user payments due to government mandates. It does, however, earn commissions for in-app services like bill payments and mobile recharges, Reuters reported.

The credit card launch opens a new avenue for Google to monetize its user base, mirroring strategies by domestic rivals Paytm and PhonePe to cross-sell lending products to payment users.

BY THE NUMBERS

India has just 50 million credit card holders, according to Google Pay, whereas its population exceeds 1.4 billion.

Google Pay, meanwhile, is the second-largest app in India by number of UPI transactions, having processed nearly 7.2 billion transactions in October alone.

HOW IT WORKS

Axis Bank manages the credit risk and issuance, while the digital-only card will be linked to the Google Pay app to make online and offline payments on the go.


UK Looks to Restart Cooperation after US Suspends Tech Deal

Pedestrians walk across Westminster Bridge as early morning fog covers the streets of London on December 17, 2025. (Photo by JUSTIN TALLIS / AFP)

The UK government on Wednesday said it was focused on resuming talks promptly after the United States suspended implementation of a tech cooperation deal with Britain.

The deal was signed during US President Donald Trump's pomp-filled state visit to the UK in September.

But on Tuesday Michael Kratsios, head of the White House Office of Science and Technology Policy, said on X that the UK must make "substantial progress" on trade talks for the deal to resume.

The US and UK have been trying to implement the "Economic Prosperity Deal," agreed in May and one of the first international agreements signed after Trump threatened the world with punishing tariffs on goods entering the United States.

The US-UK Technology Prosperity Deal agreed in September 2025 was a non-binding agreement to sit alongside the broader Economic Prosperity Deal.

It was designed to align the two countries on tech innovation while spurring mostly private-sector investment, Agence France-Presse reported.

Following the White House announcement, a UK government spokesperson said: "We look forward to resuming work on this partnership as quickly as possible... and working together to help shape the emerging technologies of the future."

Business and Trade Secretary Peter Kyle held trade talks with US counterparts in Washington DC last week to progress the Economic Prosperity Deal, the spokesperson said.

"They celebrated the success of the recent pharma deal and both sides agreed to continue further negotiations next year."

According to the Financial Times, US officials have become increasingly frustrated with Britain's lack of willingness to address non-tariff barriers, including rules and regulations governing food and industrial goods.