AI Experts Ready ‘Humanity’s Last Exam’ to Stump Powerful Tech

Figurines with computers and smartphones are seen in front of the words "Artificial Intelligence AI" in this illustration taken, February 19, 2024. (Reuters)

A team of technology experts issued a global call on Monday seeking the toughest questions to pose to artificial intelligence systems, which have increasingly handled popular benchmark tests like child's play.

Dubbed "Humanity's Last Exam," the project seeks to determine when expert-level AI has arrived. It aims to stay relevant even as capabilities advance in future years, according to the organizers, a non-profit called the Center for AI Safety (CAIS) and the startup Scale AI.

The call comes days after the maker of ChatGPT previewed a new model, known as OpenAI o1, which "destroyed the most popular reasoning benchmarks," said Dan Hendrycks, executive director of CAIS and an advisor to Elon Musk's xAI startup.

Hendrycks co-authored two 2021 papers that proposed tests of AI systems that are now widely used, one quizzing them on undergraduate-level knowledge of topics like US history, the other probing models' ability to reason through competition-level math. The undergraduate-style test has more downloads from the online AI hub Hugging Face than any such dataset.

At the time of those papers, AI was giving almost random answers to questions on the exams. "They're now crushed," Hendrycks told Reuters.

As one example, the Claude models from the AI lab Anthropic have gone from scoring about 77% on the undergraduate-level test in 2023 to nearly 89% a year later, according to a prominent capabilities leaderboard.

These common benchmarks have less meaning as a result.

AI has appeared to score poorly on lesser-used tests involving plan formulation and visual pattern-recognition puzzles, according to Stanford University’s AI Index Report from April. OpenAI o1 scored around 21% on one version of the pattern-recognition ARC-AGI test, for instance, the ARC organizers said on Friday.

Some AI researchers argue that results like this show planning and abstract reasoning to be better measures of intelligence, though Hendrycks said the visual aspect of ARC makes it less suited to assessing language models. "Humanity’s Last Exam" will require abstract reasoning, he said.

Answers from common benchmarks may also have ended up in data used to train AI systems, industry observers have said. Hendrycks said some questions on "Humanity's Last Exam" will remain private to make sure AI systems' answers are not from memorization.

The exam will include at least 1,000 crowd-sourced questions, due November 1, that are hard for non-experts to answer. These will undergo peer review, with winning submissions offered co-authorship and prizes of up to $5,000 sponsored by Scale AI.

"We desperately need harder tests for expert-level models to measure the rapid progress of AI," said Alexandr Wang, Scale's CEO.

One restriction: the organizers want no questions about weapons, which some say would be too dangerous for AI to study.



Manga Productions to Publish 'Nioh 3' in MENA with Arabic Localization

“Nioh 3” marks the latest chapter in the acclaimed dark samurai action RPG series. (SPA)

Manga Productions, a subsidiary of the Mohammed bin Salman Foundation (Misk), announced on Tuesday its partnership with KOEI TECMO GAMES as the official publisher of the highly anticipated action title “Nioh 3” in the Middle East and North Africa region. The game is scheduled for a worldwide simultaneous launch in early 2026 on PlayStation 5 and Steam.

In a statement, Manga Productions said “Nioh 3” marks the latest chapter in the acclaimed dark samurai action RPG series, celebrated for its unique blend of Japanese mythology and intense combat. With over eight million units sold worldwide, the Nioh series has established itself as a global favorite.

This new installment adopts an open-field environment and a new battle system that allows players to switch between two fighting styles, "Samurai" and "Ninja," in real time during combat.

As part of the collaboration, Manga Productions will lead the Arabic localization, marketing, and regional publishing efforts. In line with the company's mission to empower local talent, Manga Productions will involve Saudi creatives in the localization process, ensuring a culturally resonant and high-quality experience for Arabic-speaking players.

Manga Productions CEO Dr. Essam Bukhary said the launch of “Nioh 3” with Arabic support for gamers in the region is another step toward delivering world-class experiences while empowering Saudi talent throughout every stage of development.

“The trust we've earned from our global partners reflects Manga Productions' strong capabilities in publishing, distribution, and marketing, as well as our continued success in delivering high-quality, culturally relevant content to audiences across the region,” he stated.

Manga Productions and KOEI TECMO GAMES remain committed to delivering high-quality entertainment that reflects the growing passion and potential of the gaming community across the Middle East.

KOEI TECMO GAMES President and COO Hisashi Koinuma stressed: “After the positive reception of DYNASTY WARRIORS: ORIGINS, we're excited to deepen our partnership with Manga Productions to bring Nioh 3 to Arabic-speaking audiences, fully localized and tailored to their expectations.”

Manga Productions Business Development and Content Licensing Director Eng. Abdulaziz Alnaghmoosh said: “Following our collaboration on DYNASTY WARRIORS: ORIGINS, which was praised for delivering an Arabic experience that felt original rather than translated, Nioh 3 is our next step in raising that standard.”

“We're committed to offering players a seamless, fully localized journey that feels like it was made for them from day one of the worldwide simultaneous launch,” he remarked.