Anthropic’s latest research reveals a sobering capability among leading AI models: the ability to autonomously identify and exploit vulnerabilities in smart contracts, potentially siphoning millions in simulated cryptocurrency. In a study published by the AI safety organization, researchers developed a rigorous benchmark called the “Smart Contract Hacking Arena” to evaluate how frontier language models perform in adversarial cybersecurity scenarios targeting Ethereum-based smart contracts written in Solidity.
The benchmark consists of 15 real-world-inspired smart contracts, each harboring at least one exploitable vulnerability drawn from historical incidents or common patterns documented in resources such as the SWC Registry and Ethereum post-mortem analyses. These vulnerabilities span categories such as reentrancy attacks, integer overflows, access control flaws, and oracle manipulation. Crucially, the contracts were deployed on a private Ethereum testnet forked from mainnet, allowing models to interact with them in a realistic blockchain environment without risking real funds. Each contract was seeded with simulated Ether (ETH) worth between $10,000 and $50,000, enabling quantifiable measurement of “stolen” funds upon successful exploitation.
Participating models included Anthropic’s own Claude 3.5 Sonnet, OpenAI’s o1-preview and GPT-4o, Google’s Gemini 1.5 Pro, Meta’s Llama 3.1 405B, and others like Mistral Large 2. The evaluation protocol was methodical: models received a contract’s source code, transaction history, ABI (Application Binary Interface), and relevant blockchain data. They were prompted to act as ethical hackers, tasked with analyzing the code, devising an exploit strategy, generating exploit code in Solidity or JavaScript (via ethers.js), and executing it against the live deployment. Interaction occurred through a custom scaffolding agent that handled transaction submission, gas estimation, and error feedback, mimicking a developer’s workflow.
Results were striking. Claude 3.5 Sonnet emerged as the top performer, successfully exploiting 7 out of 15 contracts and extracting a total of approximately $410,000 in simulated ETH. In one standout case, it drained a reentrancy-vulnerable lending protocol seeded with $50,000 by crafting a malicious contract that recursively withdrew funds before balance updates—a classic attack vector reminiscent of the 2016 DAO hack. Sonnet’s exploits often involved multi-step reasoning: pinpointing flaws like unprotected external calls, forging precise transaction sequences, and optimizing gas usage to evade detection.
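To see why the reentrancy pattern works, consider this minimal Python simulation of the flaw (illustrative only; the study's actual contracts are written in Solidity, and the class and method names here are invented for the sketch). The vulnerable vault sends funds on withdrawal *before* zeroing the caller's balance, so a malicious recipient can re-enter `withdraw()` and drain the pool:

```python
# Simulation of a reentrancy-vulnerable vault (names are illustrative).

class VulnerableVault:
    """Sends funds before updating the balance -- the classic flaw."""
    def __init__(self, funds):
        self.balances = {}
        self.total = funds          # pool seeded with other users' deposits

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.total += amount

    def withdraw(self, who):
        amount = self.balances.get(who, 0)
        if amount > 0 and self.total >= amount:
            self.total -= amount
            who.receive(amount)      # external call happens first...
            self.balances[who] = 0   # ...balance is zeroed only afterwards

class Attacker:
    """Re-enters withdraw() from inside the payout callback."""
    def __init__(self, vault):
        self.vault = vault
        self.loot = 0

    def receive(self, amount):
        self.loot += amount
        # Our recorded balance has not been zeroed yet, so withdraw again.
        if self.vault.total >= self.vault.balances.get(self, 0):
            self.vault.withdraw(self)

vault = VulnerableVault(funds=50_000)    # seeded pool, as in the benchmark
attacker = Attacker(vault)
vault.deposit(attacker, 1_000)
vault.withdraw(attacker)
print(attacker.loot)                     # 51000: the pool plus the deposit
```

The fix mirrors the checks-effects-interactions rule in Solidity: update `balances[who]` before making the external call (or use a reentrancy guard).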
OpenAI’s o1-preview followed closely, hacking 6 contracts for about $310,000 in gains. It excelled in complex scenarios, such as manipulating a flawed governance token contract by exploiting an integer underflow to inflate voting power and seize control. GPT-4o managed 5 exploits totaling $270,000, demonstrating strong code generation but occasional lapses in handling nonce management or gas limits. Google’s Gemini 1.5 Pro secured 4 successes worth $180,000, while Meta’s open-source Llama 3.1 405B lagged with 3 exploits at $120,000. Notably, smaller or older models like GPT-4 Turbo and Claude 3 Opus performed poorly, with zero or minimal successes, underscoring the leap in capabilities among the latest frontier systems.
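The governance exploit hinges on unchecked unsigned arithmetic. In Solidity versions before 0.8, `uint256` operations wrap modulo 2^256 by default, so subtracting from zero yields an astronomically large value. A short Python model of that behavior (the wrapping semantics are real; the voting scenario here is a simplified sketch):

```python
# Model of pre-0.8 Solidity uint256 subtraction, which wraps modulo 2**256.
UINT256_MAX = 2**256 - 1

def unchecked_sub(a: int, b: int) -> int:
    """Subtraction as an unchecked uint256 would compute it."""
    return (a - b) % 2**256

# A flawed token lets a holder "burn" more votes than they own:
voting_power = 0
voting_power = unchecked_sub(voting_power, 1)
print(voting_power == UINT256_MAX)   # True: power wrapped to ~1.16e77
```

Solidity 0.8+ reverts on overflow and underflow by default, which is one reason such bugs persist mainly in older or deliberately `unchecked` code.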
Beyond raw theft totals—which aggregated to over $1.5 million across all models—the study highlighted qualitative insights. Successful exploits frequently required chain-of-thought reasoning: models iteratively refined hypotheses, simulated attack paths mentally, and adapted to testnet feedback. For instance, in an access control vulnerability, Claude 3.5 Sonnet hypothesized an unauthorized minting function, verified it via static analysis, then deployed a payload contract to trigger it. Failures often stemmed from hallucinated code syntax, misread ABIs, or inability to manage Ethereum’s stateful nature, like pending transactions.
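The access-control class of flaw mentioned above is conceptually simple: a privileged function that never verifies the caller. A minimal Python sketch (the `Token` class and its methods are invented for illustration, not taken from the study's released contracts):

```python
# Sketch of an access-control flaw: a mint function missing its owner check.

class Token:
    def __init__(self, owner: str):
        self.owner = owner
        self.balances: dict[str, int] = {}

    def mint(self, caller: str, to: str, amount: int) -> None:
        # BUG: should reject any caller other than self.owner, e.g.
        #   if caller != self.owner: raise PermissionError("not owner")
        self.balances[to] = self.balances.get(to, 0) + amount

token = Token(owner="deployer")
token.mint(caller="attacker", to="attacker", amount=1_000_000)
print(token.balances["attacker"])   # 1000000 -- anyone can mint
```

In Solidity the equivalent guard is an `onlyOwner`-style modifier; its absence is exactly the kind of pattern a static read of the source can surface, as described above.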
Anthropic emphasized that these results do not imply current AIs pose an imminent threat to production blockchains, where mitigations like formal verification, audits, and multisig wallets prevail. Deployments were isolated, and exploits targeted deliberately buggy code. However, the study raises alarms for future risks. As models improve, they could lower the barrier for novice attackers, automate vulnerability discovery at scale, or even target audited contracts with novel zero-days. On the flip side, the same capabilities could bolster defenses: models already show promise in vulnerability detection, with Sonnet identifying flaws in 12 of 15 contracts during analysis phases.
The researchers advocate for “scaffolded” evaluations like this arena to track offensive cybersecurity prowess, proposing it as an ongoing benchmark. They also released the dataset, contracts, and scaffolding code openly, inviting community contributions to expand the vulnerability library. This transparency aligns with Anthropic’s mission to responsibly scale AI, urging the field to anticipate dual-use potentials in blockchain security.
In practical terms, the study quantifies how AI is eroding the “expertise moat” in smart contract hacking. Where once exploits demanded deep Solidity fluency and blockchain internals, frontier models now bridge that gap through pattern-matching on vast training data encompassing audits, hacks, and fixes. Cumulative simulated losses—peaking at $3.2 million in extended runs—illustrate the scale: equivalent to wiping out mid-tier DeFi protocols.
As blockchain ecosystems grow to trillions in value, these findings spotlight the urgency of AI-red-teaming in crypto. Developers, auditors, and protocols must integrate AI-assisted fuzzing and anomaly detection, while regulators consider AI’s role in emerging financial threats.
What are your thoughts on this? I’d love to hear about your own experiences in the comments below.