New review paper argues code is how AI agents think and act, not just what they produce

A new review paper reframes how AI agents operate, arguing that code is not merely something they produce but the medium through which they think and act. The study synthesizes recent research to position code generation and execution as the core cognitive process of modern AI systems, shifting the focus from output to method.

This perspective challenges the conventional view of AI as a tool that produces text, images, or decisions. Instead, it frames code as the internal language of reasoning, planning, and action.

The Core Argument: Code as Cognition

The paper contends that AI agents “think” by writing and executing code internally, even when the final output is natural language. This process mirrors human cognition, in which intermediate steps and logical structures underpin decisions.

Key findings from the review include:

  • Code enables structured reasoning by breaking complex tasks into verifiable, sequential steps. Agents generate intermediate code to simulate scenarios, test hypotheses, and correct errors before producing a final response.
  • Action and thought are unified through code execution. An AI agent that writes a Python script to calculate a result is not just producing output; it is performing a computational act equivalent to human problem-solving.
  • Code acts as a transparency layer, letting researchers audit and debug agent behavior. Unlike opaque neural network weights, code traces provide an interpretable record of how an agent arrived at a conclusion.
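
The three points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's method: the `candidates` list is a hypothetical stand-in for successive code drafts an agent might write, and `solve_with_code`, `run_candidate`, and the `check` callback are names invented for this example. Each draft is executed, errors are captured rather than hidden, and every attempt is recorded in a trace that a person could audit afterwards.

```python
def run_candidate(source, env):
    """Execute one code draft in its own namespace; return (succeeded, message)."""
    try:
        exec(source, env)
        return True, "ok"
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"


def solve_with_code(candidates, check):
    """Try each draft in order, keeping an auditable trace of every attempt.

    `candidates` is a hypothetical stand-in for code an agent would generate;
    `check` is an assumed verifier that accepts or rejects a result.
    """
    trace = []
    for attempt, source in enumerate(candidates, start=1):
        env = {}
        ok, message = run_candidate(source, env)
        result = env.get("result")
        passed = ok and check(result)
        trace.append({
            "attempt": attempt,
            "source": source,
            "message": message,
            "result": result,
            "passed": passed,
        })
        if passed:
            return result, trace
    return None, trace


# Three drafts for "sum the integers from 1 to 100": two flawed, one correct.
candidates = [
    "result = sum(range(1, 101)) / 0",  # raises ZeroDivisionError
    "result = sum(range(1, 100))",      # off by one
    "result = sum(range(1, 101))",      # correct
]
answer, trace = solve_with_code(candidates, check=lambda r: r == 100 * 101 // 2)

for step in trace:
    print(step["attempt"], step["message"], step["result"], step["passed"])
```

The trace, not the final answer, is the part that corresponds to the transparency point: it shows which drafts failed, why, and which one was finally accepted. A real agent would run drafts in a sandbox rather than with a bare `exec`.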

Implications for AI Development

The review has immediate consequences for how researchers design and evaluate AI agents. If code is the medium of thought, then improving code generation quality directly improves agent intelligence.

The paper argues that focusing on code rather than natural language output could unlock more reliable, verifiable, and controllable AI systems.

This shift would require new benchmarks that measure an agent’s ability to generate correct, efficient, and self-correcting code, rather than simply matching human-written answers.

Why This Matters Now

The argument arrives as AI agents increasingly perform autonomous tasks like software development, data analysis, and web navigation. These systems already rely on code generation tools such as large language models fine-tuned for programming.

The paper warns that ignoring the centrality of code risks misunderstanding both the capabilities and the limitations of current AI.

Background and Context

The review paper synthesizes dozens of recent studies from top AI conferences and labs. It builds on the observation that many advanced AI models (including GPT-4, Claude, and Gemini) show dramatic performance gains when allowed to generate and execute code during reasoning.

Researchers have noted that agents that write and run code outperform those that reason purely in natural language on tasks requiring arithmetic, logic, or multi-step planning.
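
A small illustration of why executed code can help on arithmetic tasks: the interpreter carries out each step exactly, so nothing has to be estimated along the way. The compound-interest task and the `final_balance` function below are made up for this example and do not come from the paper.

```python
from fractions import Fraction


def final_balance(principal, annual_rate, years):
    """Compound the balance once per year using exact rational arithmetic."""
    balance = principal
    for _ in range(years):
        balance += balance * annual_rate
    return balance


# 1000 at 5% per year for 3 years, computed step by step without rounding.
balance = final_balance(Fraction(1000), Fraction(5, 100), 3)
print(balance, float(balance))
```

Each loop iteration is one explicit, checkable step of the multi-step plan, which is the property the reviewed studies associate with code-based reasoning.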

The authors propose a new framework: instead of treating code as an output modality, AI systems should be conceived as “code-first” agents. This means training, evaluation, and architecture decisions should prioritize code fluency.

Critical Reception and Open Questions

Not all experts agree. Some argue that focusing on code may overfit AI to programming tasks while neglecting broader cognitive abilities like common sense or empathy. Others question whether code generation is truly “thinking” or merely a sophisticated simulation.

The review acknowledges these debates but maintains that even if code is not the only mode of thought, it is the most tractable for current AI research.
