AI Experts Ready ‘Humanity’s Last Exam’ to Stump Powerful Tech

Figurines with computers and smartphones are seen in front of the words "Artificial Intelligence AI" in this illustration taken, February 19, 2024. (Reuters)

A team of technology experts issued a global call on Monday seeking the toughest questions to pose to artificial intelligence systems, which have increasingly handled popular benchmark tests like child's play.

Dubbed "Humanity's Last Exam," the project seeks to determine when expert-level AI has arrived. It aims to stay relevant even as capabilities advance in future years, according to the organizers, a non-profit called the Center for AI Safety (CAIS) and the startup Scale AI.

The call comes days after the maker of ChatGPT previewed a new model, known as OpenAI o1, which "destroyed the most popular reasoning benchmarks," said Dan Hendrycks, executive director of CAIS and an advisor to Elon Musk's xAI startup.

Hendrycks co-authored two 2021 papers that proposed tests of AI systems that are now widely used, one quizzing them on undergraduate-level knowledge of topics like US history, the other probing models' ability to reason through competition-level math. The undergraduate-style test has more downloads from the online AI hub Hugging Face than any such dataset.

At the time of those papers, AI was giving almost random answers to questions on the exams. "They're now crushed," Hendrycks told Reuters.

As one example, the Claude models from the AI lab Anthropic went from scoring about 77% on the undergraduate-level test in 2023 to nearly 89% a year later, according to a prominent capabilities leaderboard.

These common benchmarks have less meaning as a result.

AI has appeared to score poorly on lesser-used tests involving plan formulation and visual pattern-recognition puzzles, according to Stanford University’s AI Index Report from April. OpenAI o1 scored around 21% on one version of the pattern-recognition ARC-AGI test, for instance, the ARC organizers said on Friday.

Some AI researchers argue that results like this show planning and abstract reasoning to be better measures of intelligence, though Hendrycks said the visual aspect of ARC makes it less suited to assessing language models. "Humanity’s Last Exam" will require abstract reasoning, he said.

Answers from common benchmarks may also have ended up in data used to train AI systems, industry observers have said. Hendrycks said some questions on "Humanity's Last Exam" will remain private to make sure AI systems' answers are not from memorization.

The exam will include at least 1,000 crowd-sourced questions, due November 1, that are hard for non-experts to answer. These will undergo peer review, with winning submissions offered co-authorship and prizes of up to $5,000 sponsored by Scale AI.

"We desperately need harder tests for expert-level models to measure the rapid progress of AI," said Alexandr Wang, Scale's CEO.

One restriction: the organizers want no questions about weapons, which some say would be too dangerous for AI to study.



Samsung Says Trade Turmoil Raises Chip Business Volatilities, May Hit Phone Demand

A man walks past the logo of Samsung Electronics displayed outside the company's Seocho building in Seoul on April 30, 2025. (Photo by Jung Yeon-je / AFP)

South Korean technology giant Samsung Electronics warned on Wednesday that US tariffs could cut demand for products such as smartphones, making it difficult to predict future performance.
According to Reuters, Samsung said it expected its semiconductor business to encounter greater uncertainties throughout the year, while its smartphone shipments faced downward pressure in the second quarter.
The cautious outlook from one of the world's biggest electronics manufacturers reflects the uncertainties roiling global trade due to US President Donald Trump's tariff war, and comes a day after General Motors pulled its annual forecast.
The world's largest memory chipmaker reported a small rise in first-quarter operating profit as customers concerned about US tariffs rushed to purchase smartphones and commodity chips, mitigating the impact of its underperforming artificial intelligence chip business.
It reported 6.7 trillion won ($4.68 billion) in operating profit for the quarter ended in March, up 1.2% from a year earlier and in line with its earlier estimate.
Samsung shares, one of the worst-performing major tech stocks last year, fell 0.4% in line with the broader market.
Steep US tariffs on Chinese goods and toughening restrictions on AI chip sales to China, Samsung's top market, threaten to dampen demand for some of the electronics components the company produces such as chips and smartphone displays.
Trump's "reciprocal" tariffs, most of which have been suspended until July, threaten to hit dozens of countries including Vietnam and South Korea where Samsung produces smartphones and displays.
Samsung said it was considering relocating the production of TVs and home appliances in response to the tariffs.
Chip demand is expected to remain solid in the second quarter, driven by AI servers and preemptive purchasing activities after the pause in tariffs, Samsung said.
But it warned that the frontloading of chip shipments by some customers may have a negative impact on demand later this year.
“We believe that demand uncertainties are growing in the second half as a result of recent changes in tariff policies in major countries, and strengthening of AI chip export controls,” Kim Jae-june, a Samsung vice president in the memory division, said on an earnings call.
Samsung CFO Park Soon-cheol said, however, that "we cautiously expect the overall performance to gradually improve as we move into the second half, assuming the easing of current uncertainties."
Some analysts were unconvinced, saying the company did not give detailed guidance for its struggling AI chip business.
"With pull-in demand still ongoing and macro uncertainty lingering, the explanation for the 'first-half low, second-half rebound' outlook was lacking," Ryu Young-ho, a senior analyst at NH Investment & Securities, said.
AI CHIPS
Samsung's mobile device and network business reported a 23% rise in profit to 4.3 trillion won during the period, reaching its highest level in four years, helped by the latest version of the flagship Galaxy S model with AI features.
Samsung has accelerated smartphone production in Vietnam, India and South Korea ahead of the US duties, a person familiar with the matter told Reuters earlier.
While mobile performed strongly, the chip division's operating profit slumped 42% to 1.1 trillion won from a year earlier despite chip stockpiling by some customers.
Samsung reported a fall in sales of High Bandwidth Memory (HBM) - used in AI processors - due in part to US export controls on AI chips.
Samsung said it had supplied samples of its enhanced HBM3E products to major customers and expected HBM sales, which bottomed out in the first quarter, to "gradually" rise from the second quarter, without offering detailed targets.
Analysts estimate that about one third of Samsung's HBM revenue has come from China, and it lags behind cross-town rival SK Hynix in supplying such chips to Nvidia in the United States.
SK Hynix last week logged its second-highest quarterly operating profit in the first quarter with a 158% jump to 7.4 trillion won, boosted by strong AI-related demand.
Samsung's revenue rose 10% to 79.1 trillion won in the January-to-March period, in line with its earlier estimate of 79 trillion won.