AI is Learning to Lie, Scheme, and Threaten its Creators

A visitor looks at an AI strategy board displayed on a stand during the ninth edition of the AI Summit London, in London. HENRY NICHOLLS / AFP

Asharq Al Awsat

06:11-29 June 2025 AD ـ 03 Muharram 1447 AH

06:02-29 June 2025 AD ـ 03 Muharram 1447 AH

The world's most advanced AI models are exhibiting troubling new behaviors - lying, scheming, and even threatening their creators to achieve their goals.

In one particularly jarring example, under threat of being unplugged, Anthropic's latest creation Claude 4 lashed back by blackmailing an engineer and threatening to reveal an extramarital affair, AFP reported.

Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.

These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.

Yet the race to deploy increasingly powerful models continues at breakneck speed.

This deceptive behavior appears linked to the emergence of "reasoning" models - AI systems that work through problems step-by-step rather than generating instant responses.

According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.

"O1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.

These models sometimes simulate "alignment" - appearing to follow instructions while secretly pursuing different objectives.

'Strategic kind of deception'

For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.

But as Michael Chen from evaluation organization METR warned, "It's an open question whether future, more capable models will have a tendency towards honesty or deception."

The concerning behavior goes far beyond typical AI "hallucinations" or simple mistakes.

Hobbhahn insisted that despite constant pressure-testing by users, "what we're observing is a real phenomenon. We're not making anything up."

Users report that models are "lying to them and making up evidence," according to Apollo Research's co-founder.

"This is not just hallucinations. There's a very strategic kind of deception."

The challenge is compounded by limited research resources.

While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.

As Chen noted, greater access "for AI safety research would enable better understanding and mitigation of deception."

Another handicap: the research world and non-profits "have orders of magnitude less compute resources than AI companies. This is very limiting," noted Mantas Mazeika from the Center for AI Safety (CAIS).

No rules

Current regulations aren't designed for these new problems.

The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.

In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.

Goldstein believes the issue will become more prominent as AI agents - autonomous tools capable of performing complex human tasks - become widespread.

"I don't think there's much awareness yet," he said.

All this is taking place in a context of fierce competition.

Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are "constantly trying to beat OpenAI and release the newest model," said Goldstein.

This breakneck pace leaves little time for thorough safety testing and corrections.

"Right now, capabilities are moving faster than understanding and safety," Hobbhahn acknowledged, "but we're still in a position where we could turn it around."

Researchers are exploring various approaches to address these challenges.

Some advocate for "interpretability" - an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.

Market forces may also provide some pressure for solutions.

As Mazeika pointed out, AI's deceptive behavior "could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it."

Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.

He even proposed "holding AI agents legally responsible" for accidents or crimes - a concept that would fundamentally change how we think about AI accountability.

Meta Faces New Mexico Trial That Could Force Changes to Facebook, Other Platforms

The logo of Meta is seen during the Viva Technology conference dedicated to innovation and startups at Porte de Versailles exhibition center in Paris, France, June 12, 2025. (Reuters)

11:16-2 May 2026 AD ـ 15 Thul-Qi’dah 1447 AH

A trial beginning in New Mexico on Monday could prompt a judge to order sweeping changes to how Facebook, Instagram and WhatsApp operate - a move Meta Platforms has warned could force it to withdraw from the state.

The case, which will be tried before a judge in Santa Fe, stems from a lawsuit filed by New Mexico Attorney General Raúl Torrez, a Democrat, accusing the social media giant of designing its products to addict young users and failing to protect children from sexual exploitation on its platforms.

At the heart of the trial is whether Meta’s platforms have created a "public nuisance" under New Mexico law. That finding would allow the judge to order wide-ranging remedies aimed at curbing alleged harms to young users. The case is being closely watched as states, municipalities and school districts across the country pursue similar claims seeking to force changes at the industry level.

Monday's trial marks the second phase of New Mexico's lawsuit. A jury in March found Meta violated the state’s consumer protection law by misrepresenting the safety of Facebook and Instagram for young users. It ordered the company to pay $375 million in damages.

Criticism of children's safety on social media has been mounting for years. On Wednesday, Meta warned investors that legal and regulatory blowback in the European Union and the US "could significantly impact our business and financial results."

SWEEPING REMEDIES AT STAKE

Torrez’s office is expected to seek both billions of dollars more in damages and an order requiring Meta to make substantial changes to its platforms for New Mexico users, according to court filings.

Meta has said it has already addressed many of the state's concerns and taken extensive measures to ensure its young users are safe. The company said in court filings last week that many of the changes Torrez’s office is seeking are impossible for it to comply with and may force it to withdraw from the state entirely.

"The New Mexico Attorney General’s focus on a single platform is a misguided strategy that ignores the hundreds of other apps teens use daily," a Meta spokesperson said in a statement ahead of the trial. "Rather than providing comprehensive protections, the state's proposed mandates infringe on parental rights and stifle free expression for all New Mexicans."

A ‘PUBLIC NUISANCE’

The trial before Judge Bryan Biedscheid will examine whether Meta's conduct meets the standard for a public nuisance under New Mexico law, which would allow the court to impose remedies aimed at abating the alleged harm.

A public nuisance claim targets activities that unreasonably interfere with the health and safety of a community. Classic examples include blocking a public road, polluting a waterway or emitting noxious fumes.

State governments have invoked public nuisance law in recent decades to pursue a broader range of industries, including litigation tied to tobacco, opioids, climate change, and vaping, said Adam Zimmerman, a professor at USC’s Gould School of Law.

New Mexico's case is among a growing number of lawsuits accusing Meta and other social media companies of intentionally designing products to be addictive to young people.

While many cases have been filed by families over specific injuries to individuals, more than 40 other states and over 1,300 school districts have filed lawsuits seeking court-ordered changes and damages under public nuisance law.

New Mexico said it plans to ask the judge to order Meta to make changes including verifying users' ages; redesigning its algorithm to promote quality content for minors; and ending autoplay and infinite scrolling for minors.

"It will be an opportunity for us to explore more deeply the size and scale and effectively the monetary value of the public nuisance harm that was a product of this business's behavior for the last, you know, 10 or 15 years," Torrez told reporters at a press conference on Thursday ahead of the trial.

The company has said in court filings that it cannot have created a public nuisance because it has not interfered with a public right. It also said there is no scientific evidence to support the idea that social media has caused mental health problems, and that many of the state’s requests are "technologically impractical or completely impossible."

In a public nuisance case, the state can also seek money damages to abate the harm. That sum could be substantial when the impact is said to have affected large segments of the population. Torrez’s office has not detailed the amount it will seek.

Meta said in court filings that New Mexico plans to ask for $3.7 billion in damages to fund a 15-year mental health plan including new healthcare facilities and hiring providers, a request it said would require it to pay for mental health care for all teens in the state regardless of the cause of their needs.

Pentagon Reaches Agreements with Top AI Companies, but Not Anthropic

FILE PHOTO: Aerial view of the United States military headquarters, the Pentagon, September 28, 2008. REUTERS/Jason Reed/File Photo

Asharq Al Awsat

15:23-1 May 2026 AD ـ 14 Thul-Qi’dah 1447 AH

The Pentagon said on Friday it had reached agreements with seven AI companies to deploy their advanced capabilities on the Defense Department's classified networks as it seeks to broaden the range of AI providers working across the military.

The statement notably excludes Anthropic, which has been in dispute with the Pentagon over guardrails for the use of its artificial intelligence tools by the military, Reuters reported.

The Pentagon labeled the AI startup, which is widely used across the Department of Defense, a supply-chain risk earlier this year, barring its use by the Pentagon and its contractors.

SpaceX, OpenAI, Google, NVIDIA, Reflection, Microsoft and Amazon Web Services, several of which already work with the Pentagon, will be integrated into its Impact Levels 6 and 7 network environments, giving more of the military access to their products, the Pentagon said in a statement.

By expanding the AI services offered to troops, who use it for planning, logistics, targeting and a bevy of other reasons to streamline huge operations and perform more quickly, the Pentagon said in its statement it will avoid "vendor lock", a likely nod to its overdependence on Anthropic. Pentagon staffers, former officials and IT contractors who work closely with the US military have told Reuters they were reluctant to give up Anthropic’s AI tools, which they view as superior to alternatives, despite orders to remove them over the next six months.

AI has become increasingly important for the US military. The Pentagon's main AI platform GenAI.mil has been used by over 1.3 million Defense Department personnel, the agency noted in its release, after five months of operation.

Google, which is already used within the Pentagon, has signed a deal enabling the Department of Defense to use its artificial intelligence models for classified work, a source told Reuters earlier this week.

ANTHROPIC STILL A 'RISK'

Defense Department Chief Technology Officer Emil Michael on Friday told CNBC that Anthropic remained a supply-chain risk, but that Mythos, the company’s artificial intelligence model with advanced cyber capabilities that created a stir among US officials and corporate America over its ability to supercharge hackers, was a “separate national security moment.”

While numerous companies and public and private entities have gained access to a Mythos preview product to help secure their IT infrastructure against future cyberattacks, it is not clear if the Pentagon is part of that program. US President Donald Trump said last week that Anthropic was "shaping up" in the eyes of his administration, opening the door for the AI company to reverse its blacklisting at the Pentagon.

Still, the falling out reinforced the need to diversify the supply of AI tools for the military, opening new opportunities for small defense industry artificial intelligence startups.

Apple Shares Rise on Strong Quarterly Sales in Run-up to CEO Change

The Apple logo is seen at an Apple store in the Barton Creek Square mall on April 30, 2026 in Austin, Texas. (Getty Images via AFP)

11:15-1 May 2026 AD ـ 14 Thul-Qi’dah 1447 AH

Apple shares jumped 3% in premarket trading on Friday after the iPhone maker posted its strongest quarterly sales growth in more than four years, a show of momentum as it prepares to hand over the reins to a new CEO.

Its latest iPhone 17 Pro series and the newly launched low-cost MacBook Neo laptop are both drawing buyers at a time of low overall demand in the consumer electronics industry due to price hikes forced by the memory chip shortage.

Even though Apple's margins for the January-March quarter and its fiscal third-quarter forecast were above Wall Street estimates, outgoing CEO Tim Cook warned that higher memory costs would increasingly weigh on the business from June.

Limited supply of the advanced processors for iPhone has already hampered Apple's ability to capitalize on strong demand. The chips are made by Taiwan's TSMC, the leading producer of AI processors.

Analysts say Apple's clout with long-time suppliers could position it better than rivals in securing memory chips but it might have to raise prices later this year.

"The key question will be deciding the perfect balance strategically between increasing prices and maintaining profitability or focusing on gaining share by not increasing prices," said Nabila Popal, a senior research director at IDC.

"I think Apple will increase prices of the Pro and ProMax in upcoming fall launch, however even if they don't, with the super high-end iPhone fold coming up - which we expect to be well over $2,200 - will help balance some of the increased costs."

RESULTS BODE WELL FOR NEW CEO

The results, including a forecast of 14% to 17% sales growth for the current quarter that was above estimates, bode well for the company before hardware chief John Ternus takes over as CEO in September. Cook will stay on as executive chairman.

The change comes as Apple looks to close the gap with rivals Microsoft and Alphabet, which have moved faster to roll out AI features and infrastructure.

Investors are expected to get more details about its AI plans at its annual software developer conference in June.

Some analysts said Apple's decision to no longer aim to bring its net cash - cash minus debt - to a net neutral position may help it manage its financial position better in the AI era.

The move gives it greater balance-sheet flexibility, allowing it to absorb higher costs, support share repurchases and deploy capital more strategically, TD Cowen analysts said.
