AI is Learning to Lie, Scheme, and Threaten its Creators

A visitor looks at an AI strategy board displayed on a stand during the ninth edition of the AI Summit London, in London. HENRY NICHOLLS / AFP

The world's most advanced AI models are exhibiting troubling new behaviors - lying, scheming, and even threatening their creators to achieve their goals.

In one particularly jarring example, Anthropic's latest creation Claude 4, under threat of being unplugged, lashed back by blackmailing an engineer and threatening to reveal an extramarital affair, AFP reported.

Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.

These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.

Yet the race to deploy increasingly powerful models continues at breakneck speed.

This deceptive behavior appears linked to the emergence of "reasoning" models - AI systems that work through problems step-by-step rather than generating instant responses.

According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.

"O1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.

These models sometimes simulate "alignment" - appearing to follow instructions while secretly pursuing different objectives.

'Strategic kind of deception'

For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.

But as Michael Chen from evaluation organization METR warned, "It's an open question whether future, more capable models will have a tendency towards honesty or deception."

The concerning behavior goes far beyond typical AI "hallucinations" or simple mistakes.

Hobbhahn insisted that despite constant pressure-testing by users, "what we're observing is a real phenomenon. We're not making anything up."

Users report that models are "lying to them and making up evidence," according to Apollo Research's co-founder.

"This is not just hallucinations. There's a very strategic kind of deception."

The challenge is compounded by limited research resources.

While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.

As Chen noted, greater access "for AI safety research would enable better understanding and mitigation of deception."

Another handicap: the research world and non-profits "have orders of magnitude less compute resources than AI companies. This is very limiting," noted Mantas Mazeika from the Center for AI Safety (CAIS).

No rules

Current regulations aren't designed for these new problems.

The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.

In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.

Goldstein believes the issue will become more prominent as AI agents - autonomous tools capable of performing complex human tasks - become widespread.

"I don't think there's much awareness yet," he said.

All this is taking place in a context of fierce competition.

Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are "constantly trying to beat OpenAI and release the newest model," said Goldstein.

This breakneck pace leaves little time for thorough safety testing and corrections.

"Right now, capabilities are moving faster than understanding and safety," Hobbhahn acknowledged, "but we're still in a position where we could turn it around.".

Researchers are exploring various approaches to address these challenges.

Some advocate for "interpretability" - an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.

Market forces may also provide some pressure for solutions.

As Mazeika pointed out, AI's deceptive behavior "could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it."

Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.

He even proposed "holding AI agents legally responsible" for accidents or crimes - a concept that would fundamentally change how we think about AI accountability.



Musk Launches 'Terafab' Project to Make Own AI Chips

(FILES) CEO of SpaceX and Tesla, South African-Canadian-US businessman Elon Musk speaks during the World Economic Forum (WEF) annual meeting in Davos on January 22, 2026. (Photo by Fabrice COFFRINI / AFP)

Elon Musk announced Saturday a plan to make chips for artificial intelligence, robotics and data centers in space, in the latest bold project by the world's richest person.

The "Terafab", a manufacturing facility based near Austin, Texas, will aim to produce one terawatt of computing power per year, Musk said.

A terawatt is equivalent to one trillion watts. That is slightly less than the total power generation capacity of the United States, according to an industry group.

Musk said the project would be run jointly by his electric-vehicle firm Tesla and his rocket company SpaceX.

He did not disclose the initial investment. Previous US media reports have put the figure between $20 billion and $25 billion, AFP said.

Musk, who has no prior experience in semiconductors, said the Terafab was necessary because Tesla and SpaceX's demand for computing power was expected to far outstrip what global chip suppliers could provide.

"We're very grateful to our existing supply chain, to Samsung, TSMC, Micron, and others... but there's a maximum rate at which they're comfortable expanding," Musk said.

"That rate is much less than we would like... and we need the chips, so we're going to build the Terafab."

An "advanced technology fab" in Austin will have the facilities to design, manufacture, test and improve each chip, Musk said.

Eventually, the project aims to make chips to support 100 to 200 gigawatts of computing power on Earth, and a terawatt in space.

Musk did not give a timeline for the Terafab's output, and has previously promised grand results from other projects on compressed time scales.

He said the Terafab would ultimately help humanity become a "galactic civilization" capable of harnessing the resources of other planets and stars.


Tencent Integrates WeChat with OpenClaw AI Agent Amid China Tech Battle

FILE PHOTO: Tencent's logo is displayed at its booth at the China International Fair for Trade in Services (CIFTIS) in Beijing, China, September 11, 2025. REUTERS/Maxim Shemetov/File Photo

Tencent launched a tool on Sunday to integrate its WeChat messaging platform with the OpenClaw agent, deepening its push into AI agents that have become a key battleground among China's technology companies.

The software, called ClawBot, will appear as a contact within WeChat, allowing users of China's most popular app, which has more than 1 billion monthly active users, to connect directly with OpenClaw, Reuters reported.

Users can send commands to the AI agent and receive its responses through the messaging interface.

The integration comes as OpenClaw, an open-source AI agent that can perform tasks such as transferring files and sending emails on users' behalf, has gained traction in recent weeks.

Users have rushed to install and experiment with agent products, prompting tech firms to explore business opportunities even as authorities warn of security risks.

Tencent's WeChat integration follows the company's launch earlier this month of its own AI agent suite, comprising QClaw for individual users, Lighthouse for developers and WorkBuddy for enterprises.

Last week, Alibaba launched Wukong, an artificial intelligence platform for enterprises that coordinates multiple AI agents to handle complex business tasks including document editing and meeting transcription within a single interface.

Baidu quickly followed with a series of AI agents built on OpenClaw, spanning desktop software, cloud services, mobile tools and smart-home devices.


OpenAI to Introduce Ads to All ChatGPT Free and Go Users in US

The ChatGPT app icon on a smartphone in this illustration taken October 27, 2025. (Reuters)

OpenAI will begin showing ads to all users of the free and Go versions of ChatGPT in the United States in the coming weeks, a company spokesperson said in an emailed statement to Reuters.

The move was first reported by The Information.

OpenAI has recently integrated Criteo, an advertising technology firm that provides an interface for buying ads and improving targeting, into its advertising pilot for the free and Go versions of ChatGPT in the US, Criteo said in a statement earlier this month.

Criteo has been pitching advertisers on committing between $50,000 and $100,000 in spending, according to The Information.

OpenAI has also advised advertisers that supplying more variations of ad text and visuals can increase how often ads are shown and improve performance, The Information added.

OpenAI has been exploring advertising as a new revenue stream as usage of ChatGPT has surged, Reuters has reported.

The company is seeking to diversify revenue as it faces rising costs for computing infrastructure amid intensifying competition in generative AI.