AI Experts Ready ‘Humanity’s Last Exam’ to Stump Powerful Tech

Figurines with computers and smartphones are seen in front of the words "Artificial Intelligence AI" in this illustration taken February 19, 2024. (Reuters)

A team of technology experts issued a global call on Monday seeking the toughest questions to pose to artificial intelligence systems, which increasingly have handled popular benchmark tests like child's play.

Dubbed "Humanity's Last Exam," the project seeks to determine when expert-level AI has arrived. It aims to stay relevant even as capabilities advance in future years, according to the organizers, a non-profit called the Center for AI Safety (CAIS) and the startup Scale AI.

The call comes days after the maker of ChatGPT previewed a new model, known as OpenAI o1, which "destroyed the most popular reasoning benchmarks," said Dan Hendrycks, executive director of CAIS and an advisor to Elon Musk's xAI startup.

Hendrycks co-authored two 2021 papers that proposed tests of AI systems that are now widely used: one quizzes models on undergraduate-level knowledge of topics like US history, and the other probes their ability to reason through competition-level math. The undergraduate-style test has more downloads from the online AI hub Hugging Face than any other such dataset.

At the time of those papers, AI was giving almost random answers to questions on the exams. "They're now crushed," Hendrycks told Reuters.

As one example, the Claude models from the AI lab Anthropic have gone from scoring about 77% on the undergraduate-level test in 2023 to nearly 89% a year later, according to a prominent capabilities leaderboard.

These common benchmarks have less meaning as a result.

AI has appeared to score poorly on lesser-used tests involving plan formulation and visual pattern-recognition puzzles, according to Stanford University’s AI Index Report from April. OpenAI o1 scored around 21% on one version of the pattern-recognition ARC-AGI test, for instance, the ARC organizers said on Friday.

Some AI researchers argue that results like this show planning and abstract reasoning to be better measures of intelligence, though Hendrycks said the visual aspect of ARC makes it less suited to assessing language models. "Humanity’s Last Exam" will require abstract reasoning, he said.

Answers from common benchmarks may also have ended up in data used to train AI systems, industry observers have said. Hendrycks said some questions on "Humanity's Last Exam" will remain private to make sure AI systems' answers are not from memorization.

The exam will include at least 1,000 crowd-sourced questions, due November 1, that are hard for non-experts to answer. These will undergo peer review, with winning submissions offered co-authorship and prizes of up to $5,000 sponsored by Scale AI.

"We desperately need harder tests for expert-level models to measure the rapid progress of AI," said Alexandr Wang, Scale's CEO.

One restriction: the organizers want no questions about weapons, which some say would be too dangerous for AI to study.



Trump Extends Deadline for TikTok Sale by 90 Days

FILE PHOTO: A TikTok logo is displayed on a smartphone in this illustration taken January 6, 2020. REUTERS/Dado Ruvic/Illustration/File Photo

President Donald Trump announced Thursday he had given social media platform TikTok another 90 days to find a non-Chinese buyer or be banned in the United States.

"I've just signed the Executive Order extending the Deadline for the TikTok closing for 90 days (September 17, 2025)," Trump posted on his Truth Social platform, putting off the ban for the third time.

A federal law requiring TikTok's sale or ban on national security grounds was due to take effect the day before Trump's January inauguration.

The Republican, whose 2024 election campaign relied heavily on social media, has previously said he is fond of the video-sharing app.

"I have a little warm spot in my heart for TikTok," Trump said in an NBC News interview in early May. "If it needs an extension, I would be willing to give it an extension."

TikTok on Thursday welcomed Trump's decision.

"We are grateful for President Trump's leadership and support in ensuring that TikTok continues to be available for more than 170 million American users," the platform said in a statement.

Digital Cold War?

Motivated by a belief in Washington that TikTok is controlled by the Chinese government, the ban took effect on January 19, one day before Trump's inauguration, with ByteDance having made no attempt to find a suitor.

TikTok "has become a symbol of the US-China tech rivalry; a flashpoint in the new Cold War for digital control," said Shweta Singh, an assistant professor of information systems at Warwick Business School in Britain.

Trump had long supported a ban or divestment, but reversed his position and vowed to defend the platform -- which boasts almost two billion global users -- after coming to believe it helped him win young voters' support in the November election.

The president announced an initial 75-day delay of the ban upon taking office. A second extension pushed the deadline to June 19.

He said in May that a group of purchasers was ready to pay TikTok owner ByteDance "a lot of money" for the video-clip-sharing sensation's US operations.

Trump knows that TikTok is "wildly popular" in the United States, White House spokeswoman Karoline Leavitt told reporters Thursday, when asked about the latest extension.

"He also wants to protect Americans' data and privacy concerns on this app, and he believes we can do both things at the same time."

The president is "just not motivated to do anything about TikTok," said independent analyst Rob Enderle. "Unless they get on his bad side, TikTok is probably going to be in pretty good shape."

Tariff turmoil

Trump said in April that China would have agreed to a deal on the sale of TikTok if it were not for a dispute over his tariffs on Beijing.

ByteDance has confirmed talks with the US government, saying key matters needed to be resolved and that any deal would be "subject to approval under Chinese law."

Possible solutions reportedly include having existing US investors in ByteDance roll over their stakes into a new independent global TikTok company.

Additional US investors, including Oracle and private equity firm Blackstone, would be brought on to reduce ByteDance's share in the new TikTok.

Much of TikTok's US activity is already housed on Oracle servers, and the company's chairman, Larry Ellison, is a longtime Trump ally.

Uncertainty remains, particularly over what would happen to TikTok's valuable algorithm.

"TikTok without its algorithm is like Harry Potter without his wand -- it's simply not as powerful," said Kelsey Chickering, principal analyst at Forrester.

Despite the turmoil, TikTok has been continuing with business as usual.

The platform on Monday introduced a new "Symphony" suite of generative artificial intelligence tools for advertisers to turn words or photos into video snippets for the platform.