The ARC Prize Foundation, co-founded by François Chollet, has introduced ARC-AGI-3, the latest version of its interactive reasoning benchmark for testing general intelligence in AI systems.
According to the official ARC Prize website, the new benchmark focuses on tasks that humans can typically solve right away but that current AI systems find very difficult.
Earlier versions of the ARC benchmark saw rapid gains as labs trained models specifically for the test, with scores rising from the low single digits to around 50%.
ARC-AGI-3 is designed to reset that progress and to better evaluate whether models can genuinely reason, rather than simply being optimized for a specific benchmark.
The release highlights a key gap between current AI systems and human-level reasoning. While models have made rapid progress in language and coding tasks, benchmarks like ARC-AGI-3 suggest that general problem-solving remains a major challenge.
For the industry, the benchmark reinforces the view that scaling alone may not be sufficient and that new approaches to reasoning and learning may be needed to move closer to general intelligence.