Google's Gemini 3.1 Pro Preview tops Artificial Analysis Intelligence Index at less than half the cost of its rivals

In a significant development for the AI landscape, Google’s Gemini 3.1 Pro Preview has claimed the top spot on the Artificial Analysis Intelligence Index. This achievement stands out not only for its superior performance across a battery of rigorous benchmarks but also for delivering this excellence at less than half the cost of its closest competitors. The index, maintained by Artificial Analysis, serves as a comprehensive leaderboard evaluating frontier AI models on key capabilities such as reasoning, coding, mathematics, and multimodal understanding.

The Artificial Analysis Intelligence Index aggregates results from multiple established benchmarks to provide a holistic view of model intelligence. It includes tests like GPQA Diamond for high-level reasoning, AIME 2024 for mathematical problem-solving, LiveCodeBench for coding proficiency, and MMMU for multimodal tasks involving vision and language. Gemini 3.1 Pro Preview achieved an index score of 68, surpassing Anthropic’s Claude 3.5 Sonnet at 67 and OpenAI’s o1-preview at 66. This positions Google at the forefront, with the preview version demonstrating capabilities that rival or exceed fully released models from competitors.

Cost-effectiveness emerges as a defining factor in Gemini 3.1 Pro Preview’s dominance. Priced at $1.25 per million input tokens and $10 per million output tokens, it undercuts competitors dramatically. For context, Claude 3.5 Sonnet costs $3 per million input tokens and $15 per million output tokens, while o1-preview is priced at $15 per million input and $60 per million output. Gemini’s input pricing is therefore roughly 42 percent of Claude’s and its output pricing a mere 17 percent of o1-preview’s; on the index’s 3:1 blended basis, Gemini works out to about $3.44 per million tokens versus roughly $6 for Claude and $26.25 for o1-preview. Such pricing makes advanced AI accessible for broader enterprise and developer adoption, potentially accelerating innovation in cost-sensitive applications.
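The blended figures above follow directly from the per-token prices quoted in this article. A minimal sketch of the calculation, assuming the 3:1 input-to-output ratio that Artificial Analysis uses for blended costs (model names here are labels, not official API identifiers):

```python
def blended_cost(input_price: float, output_price: float, ratio: int = 3) -> float:
    """Blended $/M tokens, weighting `ratio` input tokens per output token."""
    return (ratio * input_price + output_price) / (ratio + 1)

# (input $/M tokens, output $/M tokens) as quoted in the article
prices = {
    "Gemini 3.1 Pro Preview": (1.25, 10.0),
    "Claude 3.5 Sonnet": (3.0, 15.0),
    "o1-preview": (15.0, 60.0),
}

for model, (inp, out) in prices.items():
    print(f"{model}: ${blended_cost(inp, out):.2f} per million tokens")
# Gemini 3.1 Pro Preview: $3.44, Claude 3.5 Sonnet: $6.00, o1-preview: $26.25
```

Changing `ratio` shows how the comparison shifts for output-heavy workloads, where Gemini’s $10 output price widens the gap against o1-preview’s $60 even further.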

Breaking down the benchmark results reveals Gemini 3.1 Pro Preview’s strengths. On GPQA Diamond, a challenging graduate-level science benchmark, it scored 84 percent, edging out Claude 3.5 Sonnet’s 83 percent and significantly outperforming o1-preview’s 78 percent. In mathematics, AIME 2024 saw Gemini achieve 92 percent accuracy, matching the highest scores and highlighting its prowess in complex problem-solving. Coding benchmarks like LiveCodeBench yielded a 70 percent pass rate for Gemini, competitive with leaders and indicative of robust software engineering assistance.

Multimodal capabilities further bolster its index lead. MMMU, which tests integrated vision-language reasoning, resulted in a 74 percent score for Gemini 3.1 Pro Preview, ahead of Claude’s 73 percent. Additional evals such as TAU-bench for tool use (78 percent) and ChartQA for visual data interpretation (89 percent) underscore its versatility. These results stem from Google’s ongoing advancements in the Gemini family, building on previous iterations like Gemini 1.5 Pro and 2.0 Flash.

The preview’s release via Google’s Vertex AI and Gemini API platforms enables immediate testing and integration. Developers can access it through standard APIs, with input context windows supporting up to 2 million tokens, facilitating long-context tasks. Output speed metrics are impressive, with 220 tokens per second reported, balancing quality and latency effectively. This combination of high intelligence, low latency, and economical pricing challenges the notion that cutting-edge AI must come at a premium.
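The reported 220 tokens per second translates into response latency that scales linearly with output length. A back-of-the-envelope estimate based only on that throughput figure (ignoring time-to-first-token and network overhead, which vary by deployment):

```python
TOKENS_PER_SECOND = 220  # output speed reported for Gemini 3.1 Pro Preview

def generation_time(output_tokens: int, tps: float = TOKENS_PER_SECOND) -> float:
    """Approximate seconds to stream `output_tokens` at a sustained `tps` rate."""
    return output_tokens / tps

for n in (500, 2_000, 8_000):
    print(f"{n:>5} tokens ~ {generation_time(n):.1f} s")
# 500 tokens ~ 2.3 s, 2000 tokens ~ 9.1 s, 8000 tokens ~ 36.4 s
```

At this rate, even a long 8,000-token response streams in well under a minute, which is what makes the model practical for interactive production workloads despite its frontier-level index score.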

Competitive context is crucial. While Claude 3.5 Sonnet holds strong in creative writing and ethical alignment, its higher costs limit scalability. OpenAI’s o1 models excel in deliberate reasoning chains but incur steep expenses due to extended inference times. Gemini 3.1 Pro Preview bridges these gaps, offering a balanced profile suitable for production workloads in sectors like software development, scientific research, and content generation.

Artificial Analysis emphasizes that index scores reflect normalized, repeatable evaluations under controlled conditions, excluding subjective or safety-focused metrics. Pricing data is sourced from provider APIs as of the latest update, with blended costs calculated at a 3:1 input-to-output ratio. Future updates to the index will incorporate emerging benchmarks, potentially shifting rankings as models evolve.

Google’s strategic positioning with Gemini 3.1 Pro Preview signals intensified competition in the frontier AI space. By prioritizing both performance and affordability, it democratizes access to state-of-the-art intelligence, empowering smaller teams and cost-conscious enterprises. As previews transition to stable releases, anticipation builds for full production deployment and further enhancements.

Gnoppix is the leading open-source AI Linux distribution and service provider. Since integrating AI in 2022, it has offered a fast, powerful, secure, and privacy-respecting open-source OS with both local and remote AI capabilities. The local AI runs fully offline, ensuring no data ever leaves your computer. Based on Debian Linux, Gnoppix is available with numerous privacy- and anonymity-enabled services free of charge.

What are your thoughts on this? I’d love to hear about your own experiences in the comments below.