Towards AGI

Artificial general intelligence (AGI) may be defined as a subset of AI. The term was used by Ben Goertzel and Cassio Pennachin in their 2007 book to describe “AI systems that possess a reasonable degree of self-understanding and autonomous self-control, and have the ability to solve a variety of complex problems in a variety of contexts, and to learn to solve new problems that they didn’t know about at the time of their creation”. Despite the vagueness of “reasonable degree”, we must admit that self-understanding and autonomous self-control have not been reached, nor are they on the horizon.

In 2019, François Chollet proposed relating the measure of intelligence to a system’s ability to generalize. In the same paper, he presented a benchmark called the Abstraction and Reasoning Corpus (ARC), designed to measure broad generalization, that is, the ability to adapt to “unknown unknowns across a broad category of related tasks”. ARC represented a discontinuity in the evaluation of AI systems and immediately exposed some limitations of existing large language models (LLMs). ARC was later relabeled ARC-AGI and eventually ARC-AGI-1, with a new dedicated website; in 2025, ARC-AGI-2 pushed the difficulty even further, requiring more, and perhaps subtler, forms of reasoning. ARC-AGI-3, set to be launched in 2026, is going to add interactive, videogame-like environments to the benchmarks.

It must be noted that, despite its undeniable difficulty, solving the ARC-AGI benchmarks does not require true broad generalization, at least as most scholars, including Chollet, have described it. Such an ability, for instance, would enable a domestic robot to enter an unfamiliar kitchen and prepare a cup of coffee (the so-called “Wozniak’s coffee cup test”, PC World, 2007) – a task remarkably more difficult than all ARC-AGI benchmarks, where the problems are well-defined and the types of unknowns are limited.

However, if AGI is redefined as the ability to solve problems like those in the three ARC-AGI benchmarks, then attaining it is an ambitious yet achievable goal. Such benchmarks allow us to disregard practicalities like real-world interaction, image recognition, or anything else related to physical activities and objects; moreover, all tasks can be performed by exploiting non-trivial, but still relatively simple, procedures. Thus, solving them requires only the ability to encode the scenario in a suitable format and to write a program that makes use of the input data.
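To make the “encode the scenario, then write a program” idea concrete, the following sketch shows one plausible encoding: grids as nested lists of integers in 0–9 (one integer per colour), and a candidate program as a function from input grid to output grid. The task and the transformation below are hypothetical illustrations, not items from the actual ARC-AGI corpus.

```python
def transpose(grid):
    """A candidate program: mirror the grid along its main diagonal."""
    return [list(row) for row in zip(*grid)]

# Hypothetical training pairs, all related by the same transformation.
train_pairs = [
    ([[1, 2], [3, 4]], [[1, 3], [2, 4]]),
    ([[5, 0, 0], [0, 5, 0]], [[5, 0], [0, 5], [0, 0]]),
]

# A candidate program is accepted only if it reproduces every training pair;
# it is then applied to the unseen test input.
solved = all(transpose(inp) == out for inp, out in train_pairs)
```

In this view, a benchmark task reduces to searching the space of such grid-to-grid programs for one consistent with all training pairs.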

This thesis proposes to exploit Evolutionary Computation to tackle the ARC-AGI benchmarks.
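As a toy illustration of the approach, the sketch below evolves a short sequence of grid operations that reproduces a set of training pairs. The primitive set, the fitness function, and the mutation-only loop are placeholder assumptions for exposition, not the system developed in the thesis.

```python
import random

# Hypothetical primitive operations; a real system would use a richer set.
def identity(g):  return g
def transpose(g): return [list(r) for r in zip(*g)]
def flip_h(g):    return [row[::-1] for row in g]
def flip_v(g):    return g[::-1]

PRIMITIVES = [identity, transpose, flip_h, flip_v]

def apply_program(program, grid):
    """Apply a sequence of primitives left to right."""
    for op in program:
        grid = op(grid)
    return grid

def fitness(program, pairs):
    """Fraction of training pairs the program reproduces exactly."""
    return sum(apply_program(program, i) == o for i, o in pairs) / len(pairs)

def evolve(pairs, pop_size=30, length=3, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(PRIMITIVES) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, pairs), reverse=True)
        if fitness(pop[0], pairs) == 1.0:       # perfect program found
            return pop[0]
        survivors = pop[: pop_size // 2]        # truncation selection
        children = []
        for parent in survivors:                # point mutation of one slot
            child = list(parent)
            child[rng.randrange(length)] = rng.choice(PRIMITIVES)
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda p: fitness(p, pairs))

# Toy task: the output is the horizontally flipped input.
pairs = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
best = evolve(pairs)
```

With such a small search space, a perfect program is almost always found within the first few generations; the interesting questions the thesis addresses lie in scaling this scheme to the far larger program spaces of the actual benchmarks.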
