Navigating ARC-AGI: From Zero to One
An interactive guide to the theory, implementation, and state-of-the-art strategies for the ARC-AGI Challenge.
"Easy for Humans, Hard for AI" — The defining characteristic that makes ARC-AGI a powerful benchmark for General Intelligence.
👋A Note from a Fellow Beginner
Hello! Like many others, I was fascinated by the ARC-AGI challenge but found the learning curve a bit steep. I created this guide as a personal project to connect the dots for myself.
My goal is simple: to provide a single, clear starting point for newcomers by synthesizing the core concepts into one easy-to-follow narrative. If you're just starting your ARC journey, I hope this resource helps. Happy solving!
What is ARC-AGI?
The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) is a benchmark designed to measure fluid intelligence in AI systems. Unlike traditional benchmarks that test accumulated knowledge, ARC-AGI evaluates the ability to acquire new skills when faced with novel problems.
Novel Puzzles
Each task is unique and designed to resist memorization.
Skill Acquisition
Tests the efficiency of learning new skills, not just performance.
Core Knowledge
Uses only universal cognitive primitives for fair comparison.
Part I: Foundations
The Philosophy and Design of ARC-AGI
[Figure: Example of an ARC task. Training examples each show an input grid and its transformed output grid; the test pair shows only an input grid, and the solver must predict the output.]
The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) is more than a benchmark; it is the manifestation of a specific, rigorous philosophy about the nature of intelligence itself. Introduced by François Chollet, it was designed to address a fundamental flaw in how the AI community measured progress.
Defining Intelligence: Beyond Skill, Towards Skill-Acquisition
The central tenet of ARC-AGI is that true, general intelligence is not demonstrated by the possession of a specific skill, but by the efficiency of acquiring new skills when faced with novel problems. This stands in stark contrast to many traditional AI benchmarks which measure performance on tasks that can be mastered through extensive training on massive datasets.
In such cases, high performance can be "bought" with sufficient data and compute, masking the system's underlying ability to generalize and adapt. An AI that achieves superhuman performance at Go has mastered Go; it has not necessarily become more intelligent in a general sense.
Chollet formalizes this concept by defining intelligence as a measure of a system's skill-acquisition efficiency over a given scope of tasks, taking into account its prior knowledge, experience, and the difficulty of generalization. ARC-AGI is the concrete application of this definition. Each task is unique and designed to be unsolvable through mere memorization or pattern matching against a training set.
To measure this skill acquisition in a controlled way, every puzzle adheres to a consistent structure. This structure, the ARC Task Format, presents a small number of examples to learn from.
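Concretely, every task is a small JSON file with a train list of demonstration pairs and a test list of inputs to solve. A minimal sketch of the structure as Python sees it after json.load (the grid values here are made up, not from a real task):

# Sketch of the ARC task format (illustrative values, not a real task).
# Each grid is a list of rows; each cell is an integer color from 0 to 9.
task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
    ],
    "test": [
        {"input": [[0, 3], [3, 0]]},  # the expected "output" is hidden at evaluation time
    ],
}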
Core Knowledge Priors: The Bedrock of Fair Comparison
To create a fair and meaningful comparison between human and artificial intelligence, ARC-AGI is meticulously designed to test fluid intelligence—the ability to reason, adapt, and solve novel problems—rather than crystallized intelligence, which relies on accumulated, domain-specific knowledge and cultural learning.
This distinction is critical. A benchmark that required knowledge of historical facts or the English language would unfairly favor systems (and humans) with specific pre-training, turning the test into a measure of prior exposure rather than innate reasoning ability.
ARC-AGI circumvents this by designing tasks that are solvable using only a minimal set of Core Knowledge Priors. These are fundamental, universally shared cognitive building blocks that are either innate or acquired very early in development.
Key Concept: Core Knowledge Priors
🔵 Objectness
The ability to perceive a scene in terms of discrete objects with properties like cohesion (objects move as wholes) and persistence (objects don't randomly appear or disappear).
📐 Basic Topology & Geometry
Intuitive understanding of connectivity, symmetry, inside/outside relationships, and distance.
🔢 Elementary Number Sense
Simple counting and basic integer arithmetic.
🎯 Goal-Directedness
The notion that actions are taken to achieve goals.
By restricting the required knowledge to these universally accessible primitives, ARC-AGI isolates the capacity for generalization and ensures that success reflects a system's intrinsic ability to learn, reason, and adapt. The public training set is explicitly curated to expose a test-taker to all the Core Knowledge priors needed to solve the evaluation tasks, effectively serving as a "tutorial" for the conceptual language of the ARC universe.
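To make the Objectness prior concrete: almost every solver starts by segmenting a grid into connected components of same-colored cells. Below is a minimal sketch of such an extractor using a breadth-first flood fill; the name find_objects is our own illustrative choice, not an official API:

# Minimal object extraction via flood fill (4-connectivity).
# 'find_objects' is an illustrative name, not part of any official ARC library.
from collections import deque
import numpy as np

def find_objects(grid, background=0):
    """Return a list of objects, each a set of (row, col) cells sharing one color."""
    g = np.array(grid)
    visited = np.zeros(g.shape, dtype=bool)
    objects = []
    for r in range(g.shape[0]):
        for c in range(g.shape[1]):
            if visited[r, c] or g[r, c] == background:
                continue
            color, cells, queue = g[r, c], set(), deque([(r, c)])
            visited[r, c] = True
            while queue:
                cr, cc = queue.popleft()
                cells.add((cr, cc))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < g.shape[0] and 0 <= nc < g.shape[1]
                            and not visited[nr, nc] and g[nr, nc] == color):
                        visited[nr, nc] = True
                        queue.append((nr, nc))
            objects.append(cells)
    return objects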
The Evolution to ARC-AGI-2: Raising the Bar
The introduction of ARC-AGI-2 in 2025 marks a critical evolution, driven by the progress and observed failure modes of AI systems on the original dataset. While powerful AI systems could achieve high scores on ARC-AGI-1, often through brute-force search, ARC-AGI-2 was designed to be less susceptible to these methods. Furthermore, it was specifically created to probe known weaknesses in modern AI reasoning systems.
New Conceptual Hurdles in ARC-AGI-2
Based on the failures of frontier AI models, ARC-AGI-2 introduces tasks that test for new, more complex reasoning abilities:
- Symbolic Interpretation: Tasks where visual symbols must be interpreted as having semantic meaning beyond their shape, such as a shape representing an action.
- Compositional Reasoning: Tasks that require discovering and applying multiple, interacting rules simultaneously.
- Contextual Rule Application: Tasks where the correct rule to apply depends on the specific context within the grid, moving beyond superficial global patterns.
Feature | ARC-AGI-1 | ARC-AGI-2 | Rationale for Change |
---|---|---|---|
Launch Year | 2019 | 2025 | To address limitations and challenge modern AI systems. |
Primary Target | Deep Learning (Memorization) | Frontier AI Reasoning Systems | To stay ahead of AI progress and target new, complex reasoning failures. |
Brute-Force Susceptibility | High | Low (by design) | To ensure scores reflect intelligent adaptation, not just computational power. |
Key AI Challenges | Generalization, basic abstraction. | Symbolic Interpretation, Compositional Reasoning, Contextual Rule Application. | To probe specific, observed weaknesses in state-of-the-art reasoning systems. |
Frontier AI Performance | High (e.g., ~75% for o3-preview) | Very Low (e.g., <5% for o3-preview) | To create a wider "signal bandwidth" to differentiate AI capabilities. |
The ARC-AGI Ecosystem: Datasets and Evaluation
Successfully navigating the ARC-AGI challenge requires a firm grasp of its practical ecosystem, which includes a structured set of datasets, specific evaluation protocols, and a vibrant community with essential resources.
Navigating the ARC Datasets
The ARC-AGI data is partitioned into several distinct sets, each with a specific purpose. Using them correctly is crucial for both development and fair evaluation.
Dataset Name | Number of Tasks | Purpose | Access | Key Considerations |
---|---|---|---|---|
Public Training | ~1,000 | Training algorithms, learning Core Knowledge priors. | Public | Contains easier, "curriculum-style" tasks. Use freely for development. |
Public Evaluation | 120 | Final local evaluation of an algorithm. | Public | Do not use for iterative development. Treat as a one-shot evaluation. |
Semi-Private Evaluation | 120 | Powers the public leaderboard on arcprize.org. | Private (Kaggle) | Used to test both open and closed-source models. |
Private Evaluation | 120 | Official ranking for the Kaggle prize competition. | Private (Kaggle) | The ultimate test of generalization. No internet access allowed. |
Understanding the Rules of the Game
Competition Rules
- Evaluation Metric (pass@k): The official scoring metric is pass@k, which measures the percentage of tasks solved within k attempts. For the ARC Prize, k=2 (see the sketch after this list).
- Kaggle Environment: All prize-eligible submissions must run within a standardized Kaggle Notebook environment with no internet access and strict runtime/hardware limits.
- Open Source Requirement: To be eligible for prize money, teams must open-source their complete, reproducible solution under a permissive license.
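At the task level, pass@2 simply means a prediction counts if either of two submitted attempts exactly matches the hidden output grid. A minimal sketch (our own illustration, not the official Kaggle scorer):

# Illustrative pass@2 scoring, not the official Kaggle scorer.
def output_solved(attempts, expected):
    """A test output is scored correct if any of the (at most 2) attempts matches exactly."""
    return any(attempt == expected for attempt in attempts)

def pass_at_2(results):
    """results: list of (attempts, expected_grid) pairs; returns the fraction solved."""
    solved = sum(output_solved(attempts, expected) for attempts, expected in results)
    return solved / len(results)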
Part II: Core Methodologies
The Evolution of ARC-AGI Approaches
Program Synthesis
Infer explicit rules from examples
Neural Networks
Guide search with learned intuition
Test-Time Adaptation
Adapt dynamically to each task
Program Synthesis and Domain-Specific Languages (DSLs)
One of the most natural and historically significant approaches to ARC is program synthesis. This paradigm directly tackles the core of the challenge: inferring a general rule from examples. Program Synthesis, also known as Inductive Programming, is the task of automatically generating a computer program that meets a given high-level specification. In the context of ARC, the specification is the set of demonstration pairs.
The goal is to find a program P that correctly transforms each training input grid into its corresponding output grid. If such a program is found, it is assumed to represent the underlying rule of the task and can then be applied to the test input grid to generate a solution.
This approach is fundamentally inductive: it first infers a general, abstract rule (the program) from specific examples, and only then executes that rule to produce a specific prediction. This contrasts with transductive methods, which predict the output directly from the examples without necessarily forming an explicit, reusable program. Program synthesis was the dominant strategy in the early days of ARC, with the 2020 Kaggle competition winner employing these techniques.
The Power of DSLs
The primary challenge in program synthesis is the vastness of the search space. A Domain-Specific Language (DSL) is essential. A DSL is a small, specialized programming language designed for ARC, consisting of a curated set of functions, or primitives, that perform common grid operations. A good DSL must be expressive enough to solve tasks while simple enough to keep the search space manageable.
Common DSL Primitives: rotate_grid, find_objects, mirror_object, count_colors, draw_line, shift_object
DSL Program Example
# Simplified representation of the solver for task 5521c0d9 from Hodel's arc-dsl
def solve_5521c0d9(I):
    # 1. Extract all non-background objects from the input grid 'I'.
    objs = dsl.objects(I, univalued=True, diagonal=False, without_background=True)
    # 2. Merge all extracted objects into a single 'foreground' object.
    foreground = dsl.merge(objs)
    # 3. Create a new grid by removing the foreground, leaving only the background.
    empty_grid = dsl.cover(I, foreground)
    # 4. Create a function 'offset_getter' that calculates an upward shift vector
    #    equal to an object's height. This is done by composing three functions:
    #    height -> invert -> toivec (get height, negate it, convert to vector).
    offset_getter = dsl.chain(dsl.toivec, dsl.invert, dsl.height)
    # 5. Create a function 'shifter' that takes an object and moves it.
    #    The 'fork' primitive applies the 'shift' operation, using the object
    #    itself as the first argument and the result of 'offset_getter(object)'
    #    as the second argument.
    shifter = dsl.fork(dsl.shift, dsl.identity, offset_getter)
    # 6. Apply the 'shifter' function to every object in 'objs'
    #    and merge the results into a single object of shifted shapes.
    shifted = dsl.mapply(shifter, objs)
    # 7. Paint the final 'shifted' object onto the 'empty_grid'.
    O = dsl.paint(empty_grid, shifted)
    return O
Combined primitives in action
This example beautifully illustrates the paradigm. The solution is not a monolithic neural network but an interpretable, multi-step program. Each line applies a well-defined primitive from the DSL. The program first deconstructs the input grid into objects (dsl.objects), then computes a transformation for them (the shifter function), applies this transformation (dsl.mapply), and finally reconstructs the output grid (dsl.paint). A program synthesis system would need to find this specific sequence of seven function calls out of a vast number of possibilities.
The Power of Search: From Brute Force to Neurally-Guided
Once a DSL is defined, the core problem becomes one of search. The system must find the correct sequence of DSL primitives that solves the task.
The Combinatorial Explosion
Even with a constrained DSL of 100 primitives, the number of possible programs grows exponentially with length: there are 100 one-step programs, 100² = 10,000 two-step programs, and already 100⁴ = 100,000,000 programs of just four steps. Exhaustive enumeration quickly becomes infeasible.
Beyond Brute Force: Intelligent Search Strategies
Modern ARC solvers employ sophisticated search algorithms to navigate the vast program space efficiently:
🎯 Classic & Modern Search
- Monte Carlo Tree Search (MCTS): Balances exploration and exploitation to find promising program paths.
- Adaptive Branching MCTS (AB-MCTS): An advanced variant from Sakana AI that adaptively decides whether to search deeper (refine) or wider (explore).
- Beam Search & Heuristic Search: Methods that use rules of thumb or maintain multiple candidate programs to guide search toward likely solutions.
🧠 Neural Guidance
- GridCoder: Uses a Transformer to predict the most likely sequence of DSL primitives, guiding the search probabilistically.
- Execution-Guided Search: A neural network learns a distance metric between grids to evaluate which intermediate step is "closest" to the goal.
- Learning Program Space (LPS): The main GridCoder approach where a model predicts the final program directly.
The most significant advance is using neural networks to guide the search process. This moves from a model of "search as enumeration" to "search as learned intuition," which is far more efficient and mirrors human problem-solving.
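A minimal sketch of the idea: assume a hypothetical trained model exposed as score_primitives(task, partial_program), returning a probability for each DSL primitive, and a verify(task, program) check against the training pairs (both are placeholders, not real APIs). A best-first search then expands the partial programs the model finds most promising:

# Best-first program search guided by a (hypothetical) neural policy.
# 'score_primitives' and 'verify' are placeholders, not real library calls.
import heapq
import math

def guided_search(task, score_primitives, verify, max_expansions=10_000):
    # Each heap entry: (negative log-probability, program as a tuple of primitive names).
    frontier = [(0.0, ())]
    for _ in range(max_expansions):
        if not frontier:
            break
        neg_logp, program = heapq.heappop(frontier)
        if program and verify(task, program):  # program solves all training pairs
            return list(program)
        for name, p in score_primitives(task, program).items():
            if p > 0:
                # Lower negative log-probability = more promising = explored first.
                heapq.heappush(frontier, (neg_logp - math.log(p), program + (name,)))
    return None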
Test-Time Adaptation: The Modern Paradigm
While program synthesis was dominant early on, the most significant breakthroughs in 2024 came from Test-Time Adaptation (TTA). This strategy, where a model dynamically adapts itself at the moment of inference using the task's own demonstration examples, was a necessary component of every top-performing solution.
TTA is a broad category that includes two main sub-strategies: Test-Time Scaling (TTS), which allocates more compute without changing model weights, and Test-Time Training (TTT), which temporarily fine-tunes the model.
Test-Time Scaling (TTS)
TTS refers to improving performance by allocating more computational resources at inference time, without changing the model's weights. This can range from simple techniques like repeated sampling to more complex search procedures like chain-of-thought synthesis or Sakana AI's advanced Adaptive Branching Monte Carlo Tree Search (AB-MCTS).
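The simplest form of TTS is repeated sampling with a majority vote: draw many candidate outputs from a stochastic solver and submit the most frequent one. A minimal sketch, where sample_prediction stands in for any stochastic solver such as an LLM sampled at nonzero temperature (it is a placeholder, not a real API):

# Repeated sampling with majority voting, the simplest test-time scaling strategy.
# 'sample_prediction' is a placeholder for any stochastic solver, not a real API.
from collections import Counter

def majority_vote(task, sample_prediction, n_samples=32):
    votes = Counter()
    for _ in range(n_samples):
        grid = sample_prediction(task)       # one candidate output grid (list of lists)
        votes[tuple(map(tuple, grid))] += 1  # tuples are hashable, lists are not
    (best_grid, _count), = votes.most_common(1)
    return [list(row) for row in best_grid]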
Test-Time Training (TTT)
TTT is a powerful technique where a model's parameters are temporarily updated via gradient descent at inference time. The model is briefly fine-tuned on the few demonstration pairs of the specific task it is trying to solve. This was pioneered for ARC by researchers at MIT and became the basis for several top-scoring 2024 solutions.
The Test-Time Training (TTT) Workflow
Get Task
Receive a novel ARC task with a few train/test examples.
Augment Data
Expand the small demo set using symmetries (rotations, flips, color swaps) to create a temporary training set (sketched in code after these steps).
Train LoRA
Rapidly fine-tune a small LoRA adapter on the augmented data, leaving the base model frozen for efficiency.
Infer & Ensemble
Predict the output. Often done under multiple augmentations and combined via majority vote for robustness.
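A minimal sketch of the augmentation step (step 2), using only the symmetries named above; the LoRA fine-tuning itself depends on your model stack and is omitted here. Note that whether a given symmetry preserves a task's rule varies by task, which is why each transformation is applied identically to inputs and outputs:

# Augmenting a task's demonstration pairs with symmetries (step 2 of the workflow).
import numpy as np

def augment_pairs(train_pairs, n_color_permutations=4):
    augmented = []
    for pair in train_pairs:
        x, y = np.array(pair['input']), np.array(pair['output'])
        # 8 dihedral symmetries: 4 rotations, each with and without a horizontal flip.
        base = []
        for k in range(4):
            base.append((np.rot90(x, k), np.rot90(y, k)))
            base.append((np.fliplr(np.rot90(x, k)), np.fliplr(np.rot90(y, k))))
        variants = list(base)
        # Random color relabelings (0 stays background), applied identically to
        # input and output so the underlying transformation is preserved.
        for _ in range(n_color_permutations):
            perm = np.arange(10)
            perm[1:] = np.random.permutation(perm[1:])
            variants.extend((perm[vx], perm[vy]) for vx, vy in base)
        augmented.extend({'input': vx.tolist(), 'output': vy.tolist()}
                         for vx, vy in variants)
    return augmented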
The Duality of Induction and Transduction
A fundamental duality in problem-solving strategies has become apparent in ARC research, formalized in the prize-winning paper by Li et al. This duality mirrors the concepts of System 1 and System 2 thinking, a popular model in cognitive science used to describe the two paths of human reasoning. Understanding this is key to building a top-tier solver, as the best solutions are ensembles that combine both approaches.
🔍 Induction (Program Synthesis)
The System 2, deliberate reasoning path. The goal is to first infer a latent, explicit program or function f that fully explains the transformation in the training examples. This program f is then applied to the test input x_test to get the prediction y_test = f(x_test). The classic DSL-based search methods described earlier are inductive.
Excels at: Tasks requiring precision, multi-step logic, compositionality, and explicit computation.
⚡ Transduction (Direct Prediction)
The System 1, intuitive path. The goal is to directly predict the test output y_test by considering the training examples (x_train, y_train) and the test input x_test all at once, without necessarily creating an explicit, intermediate program. The LLM-based Test-Time Training approaches described earlier are primarily transductive.
Excels at: Tasks relying on "fuzzy" perception, pattern completion, and holistic transformations.
The very best ARC solutions are ensembles that combine both inductive and transductive methods, mirroring the dual-process models of human cognition.
Part III: Practical Guide
Your First ARC Solver: A Step-by-Step Tutorial
This section provides a hands-on tutorial for building a simple, yet complete, ARC solver in Python. We'll use a minimal DSL and simple brute-force search to demonstrate the core logic of the inductive programming paradigm.
Building Your First ARC Solver
What We'll Build
- A minimal Domain-Specific Language (DSL)
- A brute-force search algorithm
- Grid visualization tools
- A complete end-to-end solver
Learning Objectives
- Understand the program synthesis approach
- Learn how DSLs constrain search space
- Implement search and verification logic
- Create submission-ready output
💡 Pro Tip: This tutorial demonstrates the core concepts. Real competitive solvers use much larger DSLs, smarter search algorithms, and neural guidance!
1. Setting Up Your Environment
First, let's prepare your development environment with the necessary data and libraries.
# Clone the ARC-AGI repository for the data
git clone https://github.com/fchollet/ARC-AGI.git
# Install necessary libraries
pip install numpy matplotlib
# Create project structure
mkdir arc_solver
cd arc_solver
Project Structure
arc_solver/
├── ARC-AGI/ # The cloned repository
├── solver.py # Our main solver script
└── visualize.py # A utility for plotting grids
2. Loading and Visualizing Data
Visualizing the tasks is essential for understanding and debugging. Create a file visualize.py:
# In visualize.py
import matplotlib.pyplot as plt
from matplotlib import colors
import numpy as np

# Define the 10 official ARC colors
ARC_COLORMAP = colors.ListedColormap([
    '#000000', '#0074D9', '#FF4136', '#2ECC40', '#FFDC00',
    '#AAAAAA', '#F012BE', '#FF851B', '#7FDBFF', '#870C25'
])

def plot_grid(ax, grid, title=""):
    """Plots a single ARC grid with the official colormap."""
    norm = colors.Normalize(vmin=0, vmax=9)
    ax.imshow(np.array(grid), cmap=ARC_COLORMAP, norm=norm)
    # Place minor ticks on the cell boundaries, then draw gridlines only on them;
    # gridlines on the (integer) major ticks would cut through cell centers.
    ax.set_xticks(np.arange(-0.5, len(grid[0]), 1), minor=True)
    ax.set_yticks(np.arange(-0.5, len(grid), 1), minor=True)
    ax.grid(True, which='minor', color='white', linewidth=0.5)
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    ax.set_title(title)

def plot_task(task):
    """Plots all training and test pairs for a given ARC task."""
    num_train = len(task['train'])
    num_test = len(task['test'])
    num_total = num_train + num_test
    fig, axs = plt.subplots(2, num_total, figsize=(3 * num_total, 6))
    for i, pair in enumerate(task['train']):
        plot_grid(axs[0, i], pair['input'], f"Train {i} Input")
        plot_grid(axs[1, i], pair['output'], f"Train {i} Output")
    for i, pair in enumerate(task['test']):
        plot_grid(axs[0, num_train + i], pair['input'], f"Test {i} Input")
        if 'output' in pair:
            plot_grid(axs[1, num_train + i], pair['output'], f"Test {i} Output")
        else:
            axs[1, num_train + i].axis('off')
            axs[1, num_train + i].set_title(f"Test {i} Output (Predict)")
    plt.tight_layout()
    plt.show()
You can now load and view a task in your main script, solver.py:
# In solver.py
import json
from visualize import plot_task

def load_task(task_path):
    with open(task_path, 'r') as f:
        return json.load(f)

# Example usage
task_file = 'ARC-AGI/data/training/007bbfb7.json'
task = load_task(task_file)
# plot_task(task)  # Uncomment to visualize
3. Building a Simple DSL Solver
Now, let's build the core solver logic.
Step 1: Define a Minimal DSL
Create a few simple functions that operate on grids (represented as NumPy arrays).
# In solver.py
import numpy as np

def dsl_rotate_90(grid):
    return np.rot90(grid, 1)

def dsl_flip_horizontal(grid):
    return np.fliplr(grid)

def dsl_flip_vertical(grid):
    return np.flipud(grid)

# Our DSL is a dictionary mapping function names to functions
DSL = {
    'rotate_90': dsl_rotate_90,
    'flip_h': dsl_flip_horizontal,
    'flip_v': dsl_flip_vertical,
}
Step 2 & 3: Implement a Search and Verification Loop
We'll use a simple brute-force search that tries all sequences of DSL functions up to a certain length.
# In solver.py
from itertools import product

def apply_program(grid, program):
    """Applies a sequence of DSL functions to a grid."""
    current_grid = np.array(grid)
    for func_name in program:
        current_grid = DSL[func_name](current_grid)
    return current_grid.tolist()

def find_program(task, max_depth=3):
    """Searches for a program that solves the task."""
    train_pairs = task['train']
    # Generate all possible programs up to max_depth
    for depth in range(1, max_depth + 1):
        for program_tuple in product(DSL.keys(), repeat=depth):
            program = list(program_tuple)
            is_solution = True
            # Verify the program against all training pairs
            for pair in train_pairs:
                input_grid = pair['input']
                expected_output = pair['output']
                predicted_output = apply_program(input_grid, program)
                if predicted_output != expected_output:
                    is_solution = False
                    break
            if is_solution:
                print(f"Found solution program: {program}")
                return program
    print("No solution found.")
    return None
4. Step 4 & 5: Apply to Test Input and Format for Submission
Finally, if a program is found, apply it to the test inputs and prepare the submission.json file.
# In solver.py
def solve_task(task):
    """Finds a program and applies it to test inputs."""
    program = find_program(task)
    if program is None:
        return []  # No solution found: return empty predictions
    predictions = []
    for pair in task['test']:
        test_input = pair['input']
        predicted_output = apply_program(test_input, program)
        predictions.append(predicted_output)
    return predictions

def main():
    # NOTE: 007bbfb7 is a fractal/tiling task, not solvable by our tiny DSL;
    # swap in a task whose rule is a pure rotation or flip to see a solution.
    task_file = 'ARC-AGI/data/training/007bbfb7.json'
    task_id = task_file.split('/')[-1].replace('.json', '')
    task = load_task(task_file)
    predictions = solve_task(task)
    # Format for submission (simplified for one task)
    submission = {}
    if predictions:
        # ARC Prize allows two attempts per test input; here we submit the same one twice
        submission[task_id] = [{'attempt_1': pred, 'attempt_2': pred} for pred in predictions]
    with open('submission.json', 'w') as f:
        json.dump(submission, f, indent=4)
    print("submission.json created.")

if __name__ == '__main__':
    main()
📝 Summary
This complete script, while only able to solve very simple tasks, demonstrates the end-to-end workflow: load data, define a language, search for a program that explains the data, apply the program to new inputs, and format the output. It provides a solid foundation upon which to build more complex DSLs, more intelligent search algorithms, and eventually integrate neural components.
Building a Robust Validation Pipeline
Before writing a single line of solver code, the most important step for any serious competitor is to build a robust local validation pipeline. The goal is to create a setup that allows you to reliably estimate your performance on the hidden private test set without deceiving yourself.
The Golden Rule of Validation
Your solver should be developed only on the public training set. The public evaluation set must be treated as a one-shot, holdout set for final validation.
- Mimic Kaggle: Your pipeline should strictly separate the public training and public evaluation datasets. Your algorithm should never see the evaluation tasks during its development phase.
- Avoid Data Leakage: Repeatedly modifying your algorithm based on its score on the evaluation set, or manually inspecting those tasks to guide development, constitutes data leakage. This will lead to an inflated, unreliable local score that will not translate to the private leaderboard.
- One-Shot Execution: To get your best estimate of true performance, run your final, trained solver on the entire public evaluation set in a single execution. If your solution is computationally expensive, you can build confidence by testing on a random sample of tasks, holding out the rest for a final validation run before a full execution.
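For the final one-shot run, here is a minimal harness sketch that reuses load_task and solve_task from the tutorial in Part III (the directory path assumes the repository layout cloned earlier; this is illustrative, not the official scorer):

# Minimal local evaluation harness (illustrative; not the official scorer).
import glob

def evaluate(task_dir='ARC-AGI/data/evaluation'):
    solved, total = 0, 0
    for task_path in sorted(glob.glob(f'{task_dir}/*.json')):
        task = load_task(task_path)
        predictions = solve_task(task) or []
        for i, pair in enumerate(task['test']):
            total += 1
            # Public evaluation JSON files include ground-truth outputs.
            if i < len(predictions) and predictions[i] == pair['output']:
                solved += 1
    print(f"Solved {solved}/{total} test outputs ({100 * solved / total:.1f}%)")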
Deconstructing the Champions: Analysis of Winning Solutions
To move from a simple solver to a competitive one, it is essential to study the strategies of those who have reached the top of the leaderboards. The open-source nature of the ARC Prize provides an unprecedented opportunity to deconstruct the winning solutions from the 2024 competition.
Rank (Category) | Team / Lead Author | Score | Core Approach | Key Innovation(s) |
---|---|---|---|---|
1st (Kaggle) | the ARChitects | 53.5% | LLM-based TTT (Transduction) | Custom DFS sampling, "Product of Experts" scoring. |
2nd (Kaggle) | G. Barbadillo | 40.0% | Ensemble (Induction + Transduction) | Hybrid solver combining DSL search and LLM prediction. |
2nd (Paper) | Akyürek et al. | 61.9% (public) | LLM-based TTT (Transduction) | Foundational method for TTT on ARC using LoRA. |
Runner-Up (Paper) | Simon Ouellette | N/A | Neurally-Guided Program Synthesis (Induction) | GridCoder: using a Transformer to guide DSL search. |
Charting Your Course: Strategies for ARC Prize 2025
Armed with an understanding of ARC's philosophy, methodologies, and winning strategies, you can now chart a course for tackling the ARC Prize 2025. Success will require a combination of solid engineering, strategic thinking, and novel ideas.
Choosing Your Path: Hybridization and Specialization
The results from 2024 send a clear message: no single approach currently solves all ARC tasks. The state-of-the-art is a hybrid. A highly effective strategy for a new competitor would be:
Build a Strong Transductive Baseline
Start by implementing a robust Test-Time Training (TTT) solver based on the work of the ARChitects and Akyürek et al.
Develop a Specialized Inductive Solver
Concurrently, build or adapt a DSL-based program synthesis solver. This could be based on Michael Hodel's DSL or a neurally-guided approach like GridCoder.
Create an Ensemble
Design a meta-solver that runs both your transductive and inductive systems on each task and develops heuristics to choose which solution to submit (a minimal sketch follows).
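In the sketch below, inductive_solve and transductive_solve are placeholders for your own two systems (not real APIs), and apply_program is the function from the tutorial above. One natural heuristic: a program that reproduces every training pair is verified evidence, so it earns attempt 1, while the transductive guess fills attempt 2:

# Illustrative ensemble meta-solver; 'inductive_solve' and 'transductive_solve'
# are placeholders for your own inductive and transductive systems.
def ensemble_solve(task):
    program = inductive_solve(task)     # a program verified on all training pairs, or None
    guesses = transductive_solve(task)  # one predicted grid per test input
    attempts = []
    for i, pair in enumerate(task['test']):
        if program is not None:
            attempts.append({'attempt_1': apply_program(pair['input'], program),
                             'attempt_2': guesses[i]})
        else:
            attempts.append({'attempt_1': guesses[i], 'attempt_2': guesses[i]})
    return attempts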
The Frontier: Where Do New Ideas Come From?
Simply re-implementing 2024 solutions is unlikely to win the Grand Prize on ARC-AGI-2. Your strategic focus should be on creating novel solutions for the known weaknesses of today's systems—the very challenges ARC-AGI-2 was built to test:
🔍 Symbolic Interpretation
Understanding that pixels can represent an action or concept, rather than just being a pattern to transform.
🧩 Compositional Reasoning
Discovering and applying multiple, interacting rules simultaneously, especially when those rules interact with each other.
🎯 Contextual Rule Application
Recognizing the context that determines which rule to apply, moving beyond superficial global patterns.
🚀 Ready to Start Your ARC Journey?
Everything you need to begin competing in the ARC Prize 2025 and contributing to the future of artificial general intelligence.
The Path Forward
The Abstraction and Reasoning Corpus is far more than a conventional AI benchmark. It is a challenge, a philosophy, and a compass for the field of AGI research. It posits that true intelligence lies not in accumulated skill but in the efficient acquisition of new skills in the face of novelty.
By engaging with the open-source code of past champions, participating in the vibrant research community, and focusing on the unsolved frontiers, a dedicated researcher has all the tools necessary not only to compete in the ARC Prize, and perhaps even claim the Grand Prize, but also to contribute meaningfully to the collective, open pursuit of Artificial General Intelligence.