Sakana AI's Fugu orchestrates multiple LLMs to match Anthropic's Fable and Mythos benchmarks

amu · June 22, 2026, 8:24am

Sakana AI’s “FUGU” Method Matches Top AI Models by Merging Multiple LLMs

A new technique called “FUGU” from Japanese AI lab Sakana AI allows multiple smaller language models to work together, achieving performance equal to or better than leading single models like Anthropic’s “Fable” and “Mythos.”

The system uses a “fusion of experts” approach. Instead of training one massive model, FUGU routes each query to several specialized models and merges their outputs. The result is a single system that outperforms its individual parts.

Key Takeaway: FUGU Proves “More is More” for AI

FUGU demonstrates that combining several smaller, specialized AI models can beat the performance of a single, monolithic, expensive model. This is a shift from the industry’s focus on “bigger is better” and opens a path toward more modular, cost-effective AI systems.

How FUGU Works: A “Fusion of Experts”

FUGU is not a single model. It is a method for combining multiple models.

Orchestration Layer: A central “router” model analyzes the incoming prompt, such as a question or coding problem.
Expert Assignment: The router decides which “expert” model, or combination of models, is best suited to answer the query.
Output Merging: The system merges the outputs from the selected experts into a single, coherent response.
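
The three steps above can be sketched in a few lines of Python. This is only an illustration of the general route-then-merge pattern, not Sakana AI's implementation: the `Orchestrator` class, the keyword-based router, the label-and-concatenate merge, and the three toy experts are all assumptions made up for this example. A real system would presumably use a trained router model and a far more careful merging step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical expert type: any callable that maps a prompt to an answer.
Expert = Callable[[str], str]


@dataclass
class Orchestrator:
    experts: Dict[str, Expert]
    keywords: Dict[str, List[str]]  # toy stand-in for a trained router model
    default: str                    # expert used when no keyword matches

    def route(self, prompt: str) -> List[str]:
        """Orchestration + expert assignment: pick experts by keyword match."""
        text = prompt.lower()
        chosen = [name for name, words in self.keywords.items()
                  if any(word in text for word in words)]
        return chosen or [self.default]

    def merge(self, outputs: Dict[str, str]) -> str:
        """Output merging (toy version): join the labelled expert answers."""
        return "\n".join(f"[{name}] {text}" for name, text in outputs.items())

    def answer(self, prompt: str) -> str:
        selected = self.route(prompt)
        outputs = {name: self.experts[name](prompt) for name in selected}
        return self.merge(outputs)


# Toy experts with canned answers; real experts would be language models.
def python_expert(prompt: str) -> str:
    return "Use parameterized queries via sqlite3."


def security_expert(prompt: str) -> str:
    return "Never build SQL by string concatenation."


def general_expert(prompt: str) -> str:
    return "No specialist matched; giving a general answer."


orchestrator = Orchestrator(
    experts={"python": python_expert,
             "security": security_expert,
             "general": general_expert},
    keywords={"python": ["python"], "security": ["injection", "secure"]},
    default="general",
)

print(orchestrator.answer("How do I avoid SQL injection in Python?"))
```

In this sketch a prompt that mentions both Python and injection is sent to two experts, while a prompt that matches no keyword falls back to the general expert.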

The system was tested against Anthropic’s proprietary “Fable” and “Mythos” benchmarks. These are not standard public benchmarks, but they are considered highly challenging.

Key Results: Matching and Surpassing Top-Tier Models

Sakana AI’s FUGU method achieved a 100% “match rate” on the “Fable” benchmark. This means FUGU produced answers that were as good as those from the top-performing single model.
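
The article does not say how the “match rate” is calculated. One plausible reading, shown here purely as an assumption, is the fraction of benchmark tasks on which the combined system scores at least as well as the baseline model. The function name `match_rate`, the `tolerance` parameter, and the idea of per-task numeric scores are all hypothetical.

```python
from typing import Sequence


def match_rate(system_scores: Sequence[float],
               baseline_scores: Sequence[float],
               tolerance: float = 0.0) -> float:
    """Fraction of tasks where the system scores at least as well as the baseline.

    Assumed definition for illustration only; the source does not give one.
    """
    if len(system_scores) != len(baseline_scores):
        raise ValueError("both score lists must cover the same tasks")
    if not system_scores:
        raise ValueError("need at least one task")
    matched = sum(1 for system, baseline in zip(system_scores, baseline_scores)
                  if system >= baseline - tolerance)
    return matched / len(system_scores)
```

Under this reading, a 100% match rate would mean the system was never worse than the baseline on any task, which is a weaker claim than beating it.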

When tested against the “Mythos” benchmark, FUGU outperformed Anthropic’s own single-model implementations. The combined expert system produced better results than any individual model achieved on its own.

Why This Matters for the Future of AI

This approach challenges the current paradigm of building ever-larger AI models.

Cost Reduction: The cost of training a massive model is astronomical. FUGU uses existing, smaller models.
Modularity: You can swap out “experts” for specific tasks (e.g., law, medicine, coding) without retraining the entire system.
Performance Ceiling: The research suggests there is a limit to what a single model can do. FUGU breaks that ceiling by combining specialized intelligence.
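
The modularity point can be illustrated with a small registry in which each domain maps to a replaceable expert. This is a minimal sketch under stated assumptions: `ExpertRegistry`, its methods, and the toy experts are hypothetical names invented here, and nothing in the source describes how FUGU actually stores or swaps its experts.

```python
from typing import Callable, Dict

# Hypothetical expert type: any callable that maps a prompt to an answer.
Expert = Callable[[str], str]


class ExpertRegistry:
    """Maps a domain name to the expert currently responsible for it."""

    def __init__(self) -> None:
        self._experts: Dict[str, Expert] = {}

    def register(self, domain: str, expert: Expert) -> None:
        # Registering an existing domain replaces only that domain's expert.
        self._experts[domain] = expert

    def ask(self, domain: str, prompt: str) -> str:
        if domain not in self._experts:
            raise KeyError(f"no expert registered for domain {domain!r}")
        return self._experts[domain](prompt)


registry = ExpertRegistry()
registry.register("law", lambda prompt: "law-v1: " + prompt)
registry.register("coding", lambda prompt: "coding-v1: " + prompt)

# Swap in a newer law expert; the coding expert is left untouched.
registry.register("law", lambda prompt: "law-v2: " + prompt)

print(registry.ask("law", "Is this clause enforceable?"))
print(registry.ask("coding", "Reverse a list."))
```

Only the entry for the swapped domain changes, which is the property the article describes: one expert can be replaced without retraining or touching the others.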

“This is not just about making a faster AI, but a smarter one. By using many specialized minds, FUGU solves problems no single mind can tackle,” the research suggests.

The Practical Impact: Domain-Specific Superperformance

FUGU’s strength is its ability to handle specialized domains. For example:

For Coding: A model expert in Python can be merged with an expert in security.
For Law: An expert in contract law can be fused with an expert in case precedent.
For Creativity: A “creative writer” expert can be blended with a “fact-checker” expert.

This allows a single FUGU-powered system to act as a “generalist” while still possessing “expert-level” knowledge in specific areas.

Limitations and Next Steps

The FUGU method is currently a research project. The primary limitation is the orchestration layer itself, which must be trained to accurately “route” each prompt to the correct expert. A misrouted prompt could produce a poor response.
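
One simple way to quantify the misrouting risk is to check a router against prompts whose correct expert is already known. The sketch below is an assumption about how such a check could look, not something described in the source: `routing_report`, `toy_router`, and the four labelled prompts are all made up for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical router type: maps a prompt to the name of one expert.
Router = Callable[[str], str]


def routing_report(router: Router,
                   labelled_prompts: List[Tuple[str, str]]
                   ) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Return routing accuracy and the list of (prompt, expected, chosen) misses."""
    if not labelled_prompts:
        raise ValueError("need at least one labelled prompt")
    misrouted = []
    for prompt, expected in labelled_prompts:
        chosen = router(prompt)
        if chosen != expected:
            misrouted.append((prompt, expected, chosen))
    correct = len(labelled_prompts) - len(misrouted)
    return correct / len(labelled_prompts), misrouted


def toy_router(prompt: str) -> str:
    """Keyword router standing in for a trained orchestration model."""
    text = prompt.lower()
    if "contract" in text:
        return "law"
    if "python" in text:
        return "coding"
    return "general"


# Made-up evaluation prompts, each paired with the expert it should reach.
examples = [
    ("Review this contract clause", "law"),
    ("Fix this Python loop", "coding"),
    ("Is this Python script licensed correctly?", "law"),
    ("Summarize this news story", "general"),
]

accuracy, misses = routing_report(toy_router, examples)
print(f"routing accuracy: {accuracy:.2f}")
for prompt, expected, chosen in misses:
    print(f"misrouted: {prompt!r} expected {expected}, got {chosen}")
```

The third prompt is a licensing question that mentions Python, so the keyword router sends it to the coding expert. That is the kind of error a trained router would need to avoid.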


What are your thoughts on this? I’d love to hear about your own experiences in the comments below.