AI Experts Ready ‘Humanity’s Last Exam’ to Stump Powerful Tech

Figurines with computers and smartphones are seen in front of the words "Artificial Intelligence AI" in this illustration taken, February 19, 2024. (Reuters)

A team of technology experts issued a global call on Monday seeking the toughest questions to pose to artificial intelligence systems, which have increasingly handled popular benchmark tests like child's play.

Dubbed "Humanity's Last Exam," the project seeks to determine when expert-level AI has arrived. It aims to stay relevant even as capabilities advance in future years, according to the organizers, a non-profit called the Center for AI Safety (CAIS) and the startup Scale AI.

The call comes days after the maker of ChatGPT previewed a new model, known as OpenAI o1, which "destroyed the most popular reasoning benchmarks," said Dan Hendrycks, executive director of CAIS and an advisor to Elon Musk's xAI startup.

Hendrycks co-authored two 2021 papers that proposed tests of AI systems that are now widely used, one quizzing them on undergraduate-level knowledge of topics like US history, the other probing models' ability to reason through competition-level math. The undergraduate-style test has more downloads from the online AI hub Hugging Face than any such dataset.

At the time of those papers, AI was giving almost random answers to questions on the exams. "They're now crushed," Hendrycks told Reuters.

As one example, the Claude models from the AI lab Anthropic have gone from scoring about 77% on the undergraduate-level test in 2023 to nearly 89% a year later, according to a prominent capabilities leaderboard.

These common benchmarks have less meaning as a result.

AI has appeared to score poorly on lesser-used tests involving plan formulation and visual pattern-recognition puzzles, according to Stanford University’s AI Index Report from April. OpenAI o1 scored around 21% on one version of the pattern-recognition ARC-AGI test, for instance, the ARC organizers said on Friday.

Some AI researchers argue that results like this show planning and abstract reasoning to be better measures of intelligence, though Hendrycks said the visual aspect of ARC makes it less suited to assessing language models. "Humanity’s Last Exam" will require abstract reasoning, he said.

Answers from common benchmarks may also have ended up in data used to train AI systems, industry observers have said. Hendrycks said some questions on "Humanity's Last Exam" will remain private to make sure AI systems' answers are not from memorization.

The exam will include at least 1,000 crowd-sourced questions due November 1 that are hard for non-experts to answer. These will undergo peer review, with winning submissions offered co-authorship and up to $5,000 prizes sponsored by Scale AI.

"We desperately need harder tests for expert-level models to measure the rapid progress of AI," said Alexandr Wang, Scale's CEO.

One restriction: the organizers want no questions about weapons, which some say would be too dangerous for AI to study.



Latest US Strike on China's Chips Hits Semiconductor Toolmakers

Flags of China and US are displayed on a printed circuit board with semiconductor chips, in this illustration picture taken February 17, 2023. REUTERS/Florence Lo/Illustration/File Photo

The United States on Monday launched its third crackdown in three years on China's semiconductor industry, curbing exports to 140 companies including chip equipment maker Naura Technology Group, among other moves.

The effort to hobble Beijing's chipmaking ambitions also hits Chinese chip toolmakers Piotech and SiCarrier Technology with new export restrictions as part of the package, which also takes aim at shipments of advanced memory chips and more chipmaking tools to China.

The move is one of the Biden administration's last large-scale efforts to stymie China's ability to access and produce chips that can help advance artificial intelligence for military applications, or otherwise threaten US national security.

It comes just weeks before the swearing-in of Republican former president Donald Trump, who is expected to retain many of Biden's tough-on-China measures, according to Reuters.

The package includes curbs on China-bound shipments of high bandwidth memory (HBM) chips, critical for high-end applications like AI training; new curbs on 24 additional chipmaking tools and three software tools; and new export curbs on chipmaking equipment made in countries such as Singapore and Malaysia.

Commerce Secretary Gina Raimondo said the action aims to prevent "China from advancing its domestic semiconductor manufacturing system, which it will use to support its military modernization."

Reuters first reported many companies involved and key details of the plan.

The tool controls will likely hurt Lam Research, KLA and Applied Materials, as well as non-US companies like Dutch equipment maker ASM International.

Among Chinese companies facing new restrictions are nearly two dozen semiconductor companies, two investment companies and over 100 chipmaking tool makers.

The companies, which include Swaysure Technology Co, SiEn Qingdao and Shenzhen Pensun Technology Co, work with China's Huawei Technologies, the telecommunications equipment leader once hobbled by US sanctions and now at the center of China's advanced chip production and development.

They will be added to the entity list, which bars US suppliers from shipping to them without first receiving a special license.

Asked about the US curbs, Chinese foreign ministry spokesman Lin Jian said such behaviour undermined the international economic trade order and disrupted global supply chains.

China will take measures to safeguard the rights and interests of its firms, he added at a regular press briefing on Monday.

The Chinese commerce ministry did not immediately respond to a request for comment.

China has stepped up its drive to become self-sufficient in the semiconductor sector in recent years, as the US and other countries have restricted exports of the advanced chips and the tools to make them. However, it remains years behind chip industry leaders like Nvidia in AI chips and chip equipment maker ASML in the Netherlands.

The US is also poised to place additional restrictions on Semiconductor Manufacturing International, China's largest contract chip manufacturer, which was placed on the Entity List in 2020 but under a policy that allowed licenses for billions of dollars' worth of shipments to it to be granted.

For the first time, the US will add three companies that make investments in chips to the entity list. Chinese private equity firm Wise Road Capital, tech firm Wingtech Technology Co and JAC Capital will be added because of their role "in aiding China’s government’s efforts to acquire entities with sensitive semiconductor manufacturing capability critical to the defense industrial bases of the United States and its allies with the objective of relocating these entities to China."

Companies seeking licenses to ship to firms on the Entity List generally get denied.

DUTCH AND JAPANESE EXEMPTED

An aspect of the new package that addresses the foreign direct product rule could hurt some US allies by limiting what their companies can ship to China.

The new rule will expand US powers to curb exports to certain chip plants in China of chipmaking equipment that US, Japanese and Dutch manufacturers make in other parts of the world.

Equipment made in Israel, Malaysia, Singapore, South Korea and Taiwan is subject to the rule while Japan and the Netherlands will be exempt.

The expanded foreign direct product rule will apply to 16 companies on the entity list that are seen as the most important to China's most advanced chipmaking ambitions. The rule will also lower to zero the amount of US content that determines when certain foreign items are subject to US control. That will allow the US to regulate any item shipped to China from overseas if it contains any US chips.

The new rules are being released after lengthy discussions with Japan and the Netherlands, which, along with the United States, dominate the production of advanced chipmaking equipment.

The United States plans to exempt countries that adopt similar controls, the people said.

Another rule in the package restricts memory used in AI chips that corresponds to what is known as "HBM 2" and higher, technology made by South Korea's Samsung and SK Hynix and US-based Micron.

Industry sources expect only Samsung Electronics to be affected. Analysts estimate Samsung generates about 30% of its HBM chip sales from China.

The latest rules are the third major package of chip-related export curbs on China adopted under the Biden administration.

In October 2022, the United States published a sweeping set of controls on sale and manufacture of certain high-end chips that was considered to be the biggest shift in its tech policy toward China since the 1990s.