Google’s Gemini SQL2 Dominates Text-to-SQL Benchmarks

Google Research’s Gemini SQL2 now leads text-to-SQL benchmarks by a wide margin. On the challenging Spider development set, the model reached a state-of-the-art accuracy of 86.2%, more than 10 percentage points above the previous best result of 75.8%.

Gemini SQL2 is a specialized variant of Google’s Gemini family. It is fine-tuned to convert natural language questions into executable SQL queries. The model outperforms both general-purpose large language models and prior dedicated text-to-SQL systems.

How Gemini SQL2 Works

The model builds on Gemini’s multimodal architecture. It uses a two-stage process: first generating a preliminary SQL query, then refining it through self-correction and execution feedback.
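The article does not say how the refinement stage is implemented. As a rough illustration of the general generate-then-refine pattern, the Python sketch below drafts a query, executes it, and feeds any database error back to the generator for another attempt. The `generate` callable, `toy_generate`, and the `employees` table are hypothetical stand-ins for the model and a real database; none of this reflects Google’s actual code.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical generator interface: (question, previous_sql, error) -> sql.
# On the first call previous_sql and error are None.
Generator = Callable[[str, Optional[str], Optional[str]], str]


def generate_with_feedback(question: str, conn: sqlite3.Connection,
                           generate: Generator, max_rounds: int = 3) -> Optional[str]:
    """Stage 1 drafts a query; stage 2 executes it and, on failure, passes the
    database error back to the generator for a corrected attempt."""
    sql: Optional[str] = None
    error: Optional[str] = None
    for _ in range(max_rounds):
        sql = generate(question, sql, error)
        try:
            conn.execute(sql).fetchall()  # execution feedback
            return sql  # the query runs, so accept it
        except sqlite3.Error as exc:
            error = str(exc)  # e.g. "no such column: nme"
    return None  # no executable query within the round budget


def toy_generate(question: str, prev_sql: Optional[str], error: Optional[str]) -> str:
    """Hypothetical stand-in for the model: its first draft misspells a column,
    and it 'self-corrects' once it sees the resulting error message."""
    if prev_sql is not None and error is not None and "no such column" in error:
        return prev_sql.replace("nme", "name")
    return "SELECT nme FROM employees WHERE salary > 50000"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ada", 90000), ("Bob", 40000)])

print(generate_with_feedback("Who earns more than 50,000?", conn, toy_generate))
```

Capping the loop with `max_rounds` keeps a generator that never converges from retrying indefinitely; returning `None` leaves it to the caller to decide what to do when no executable query is found.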

Key innovations include a novel data augmentation strategy: the team synthesized diverse question-SQL pairs from existing databases, greatly expanding the training data without any manual annotation.
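The article does not describe the exact augmentation method, so the following is only a minimal template-based sketch of the idea: read real table and column names from a database schema, fill question/SQL templates with them, and keep only the pairs whose SQL executes. The `TEMPLATES` list and the `synthesize_pairs` function are hypothetical; a pipeline at Google’s scale would presumably need far more varied phrasing than a handful of fixed templates.

```python
import sqlite3

# Hypothetical (question, SQL) templates, for illustration only.
TEMPLATES = [
    ("How many rows are in {table}?", "SELECT COUNT(*) FROM {table}"),
    ("List every {column} in {table}.", "SELECT {column} FROM {table}"),
    ("What is the largest {column} in {table}?", "SELECT MAX({column}) FROM {table}"),
]


def synthesize_pairs(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Fill each template with real table and column names from the schema,
    keeping only the pairs whose SQL executes successfully."""
    pairs = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        for q_tpl, sql_tpl in TEMPLATES:
            # Templates without a {column} slot are instantiated once per table.
            slots = columns if "{column}" in sql_tpl else [None]
            for column in slots:
                question = q_tpl.format(table=table, column=column)
                sql = sql_tpl.format(table=table, column=column)
                try:
                    conn.execute(sql).fetchall()
                except sqlite3.Error:
                    continue  # drop pairs that fail to execute
                pairs.append((question, sql))
    return pairs


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")

for question, sql in synthesize_pairs(conn):
    print(f"{question}  ->  {sql}")
```

Because the names come from the schema itself, every generated pair is grounded in a real table, which is the property that lets this kind of synthesis scale without hand labeling.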

Execution-guided decoding further boosts accuracy. The model runs candidate queries against a database and discards any that raise errors or return implausible results. This filter eliminates many common hallucination failures, such as queries that reference nonexistent tables or columns.
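A minimal sketch of execution-guided filtering, using Python’s built-in `sqlite3` module. The article does not define an “unrealistic output”; here `looks_plausible` simply assumes that an empty result or an all-NULL result is implausible. The function names, the `employees` table, and the candidate queries are hypothetical.

```python
import sqlite3


def looks_plausible(rows: list[tuple]) -> bool:
    """Assumed heuristic: an empty result, or one made entirely of NULLs,
    is treated as implausible."""
    if not rows:
        return False
    return any(value is not None for row in rows for value in row)


def execution_guided_filter(candidates: list[str], conn: sqlite3.Connection) -> list[str]:
    """Run each candidate query and keep only those that execute without
    error and return a plausible result."""
    survivors = []
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # e.g. a hallucinated table or column name
        if looks_plausible(rows):
            survivors.append(sql)
    return survivors


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ada", 90000), ("Bob", 40000)])

candidates = [
    "SELECT name FROM staff WHERE salary > 50000",       # no such table
    "SELECT name FROM employees WHERE salary > 500000",  # empty result
    "SELECT name FROM employees WHERE salary > 50000",   # plausible
]
print(execution_guided_filter(candidates, conn))
```

How the model chooses among several surviving candidates is not stated in the article; this sketch stops at the filtering step.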

Benchmark Results

Benchmark | Gemini SQL2 | Previous SOTA | Improvement (percentage points)
Spider Dev | 86.2% | 75.8% | +10.4
Spider Test | 84.3% | 74.1% | +10.2
BIRD Dev | 71.4% | 60.5% | +10.9
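The improvement figures are absolute differences in percentage points. Measured relative to the previous state of the art, the gains are larger, as this short Python calculation using the table’s own numbers shows. `gains` is a helper introduced only for this illustration.

```python
def gains(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    points = new - old
    return points, 100 * points / old


# Accuracy figures (percent) copied from the benchmark table.
RESULTS = {
    "Spider Dev": (86.2, 75.8),
    "Spider Test": (84.3, 74.1),
    "BIRD Dev": (71.4, 60.5),
}

for bench, (new, old) in RESULTS.items():
    points, relative = gains(new, old)
    print(f"{bench}: +{points:.1f} points ({relative:.1f}% relative)")
```

For Spider Dev, a 10.4-point gain over 75.8% is roughly a 13.7% relative improvement; for BIRD Dev, the relative gain is about 18%.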

The model also scored highest on the BIRD benchmark, which tests real-world database complexity. It handled nested queries, joins, and date/time operations more reliably than any prior system.

“This is not just an incremental improvement,” one researcher noted. “It’s a structural shift in what text-to-SQL models can achieve.”

Why This Matters for Developers and Analysts

Natural language interfaces to databases remain a holy grail for business intelligence. Reliable text-to-SQL could let non-technical users query complex data without SQL knowledge.

Limitations remain: the model still struggles with highly ambiguous questions and with very large schemas. Even so, the size of the improvement suggests that production-ready quality may be within reach.

Enterprise adoption will require further testing on proprietary databases. Google has not announced a public API or release timeline for Gemini SQL2.

The Bottom Line

Gemini SQL2 resets expectations for the entire text-to-SQL field. Its double-digit lead over prior systems signals that specialized, fine-tuned models can outperform general-purpose LLMs on narrow tasks.

The research paper provides full details on training data and architecture. Google plans to open-source the dataset but has not committed to releasing the model weights.


