Google fixes several bugs in Gemini usage limits that burned through quotas too fast

Google has fixed multiple bugs in its Gemini API usage limits that caused developer quotas to deplete faster than intended.

The issues affected rate limits and token counting, leading to premature throttling and unexpected billing for some users.

The fixes are now live across Gemini’s API tiers, including the free tier and paid plans.

What Went Wrong with Gemini Quotas

Developers reported that their usage quotas were being consumed at an unusually high rate.

Some saw their requests rejected even though they had not reached the advertised limits.

Google identified several distinct bugs that contributed to the problem.

  • Rate limit overcounting: The API sometimes counted a single request as multiple requests against the user’s rate limit.
  • Token double-billing: Input and output token counts were inflated, causing faster consumption of monthly token quotas.
  • Caching errors: Certain cached responses were not properly deducted, leading to inconsistent quota tracking.
  • Time-window miscalculation: The sliding window used for rate limiting occasionally reset incorrectly, artificially shortening the available quota.
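To make the overcounting and window bugs more concrete, here is a minimal sketch of a generic sliding-window rate limiter, assuming a simple per-user limit of N requests in any rolling 60-second window. This is not Google's implementation, and the class and method names are hypothetical. It shows the two properties the bugs above broke: each request is recorded exactly once, and old requests age out individually instead of the whole window resetting at once.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical sliding-window log limiter: at most `limit` requests in any `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._timestamps = deque()  # one timestamp per accepted request

    def _evict(self, now: float) -> None:
        # Drop only the requests that have slid out of the window;
        # the window is never reset wholesale.
        while self._timestamps and self._timestamps[0] <= now - self.window:
            self._timestamps.popleft()

    def try_acquire(self, now: float) -> bool:
        """Record a single request at time `now` if the limit allows it."""
        self._evict(now)
        if len(self._timestamps) >= self.limit:
            return False
        self._timestamps.append(now)  # counted exactly once, never duplicated
        return True

    def used(self, now: float) -> int:
        """Number of requests currently counted against the limit."""
        self._evict(now)
        return len(self._timestamps)


limiter = SlidingWindowLimiter(limit=60)
accepted = sum(limiter.try_acquire(float(t)) for t in range(60))  # one request per second
print(accepted, limiter.try_acquire(59.5), limiter.try_acquire(60.0))
```

With one request per second, the 61st request at t=59.5 is rejected, but at t=60.0 the first request has aged out and a new one is accepted. A limiter that counted each call twice would reject traffic after only 30 real requests.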

“These bugs meant that some developers were hitting their usage caps 30–50% faster than expected,” Google said in a developer update.
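As a rough illustration of what "30–50% faster" means in practice, assume it means every unit of real usage was metered 1.3 to 1.5 times. A quota then runs out after about 1/1.3 ≈ 77% to 1/1.5 ≈ 67% of the time it should have lasted. The quota and usage rate below are hypothetical values chosen only for the arithmetic.

```python
def minutes_until_cap(quota: float, true_rate_per_min: float, overcount: float = 1.0) -> float:
    """Minutes until a quota is exhausted when each unit of real usage is metered `overcount` times."""
    if true_rate_per_min <= 0 or overcount <= 0:
        raise ValueError("rates must be positive")
    return quota / (true_rate_per_min * overcount)


# Hypothetical: a 1,000,000-token quota consumed at a real 10,000 tokens per minute.
for factor in (1.0, 1.3, 1.5):
    print(factor, round(minutes_until_cap(1_000_000, 10_000, factor), 1))
# prints:
# 1.0 100.0
# 1.3 76.9
# 1.5 66.7
```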

How Google Fixed the Issues

The company deployed server-side patches to correct the counting logic.

No action is required from developers for the fixes to take effect.

Google also added more granular logging so users can now see real-time quota consumption per endpoint.

  • Gemini 1.0 Pro and Gemini 1.5 Flash models saw the most significant improvements after the fixes.
  • Paid tier users should see more predictable billing and fewer unexpected throttling events.
  • Free tier users will now have their limits enforced more accurately, reducing false rejections.

What This Means for Developers

The bugs primarily affected high-frequency API calls and batch processing workflows.

Developers who relied on near-limit usage may have experienced inconsistent performance.

Google recommends that all API users review their quota dashboards to confirm the corrected metrics.
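One way to sanity-check the dashboard is to keep a client-side tally of your own calls. The sketch below assumes you have stored pairs of (model name, parsed JSON response) and that responses carry the REST API's `usageMetadata` object with `promptTokenCount` and `candidatesTokenCount`. The `tally_usage` helper and the sample data are hypothetical, and no network calls are made.

```python
from collections import defaultdict


def tally_usage(responses):
    """Sum requests and tokens per model from stored generateContent-style responses.

    Each item is assumed to be (model_name, response_dict); missing usage fields count as 0.
    """
    totals = defaultdict(lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0})
    for model, resp in responses:
        usage = resp.get("usageMetadata", {})
        entry = totals[model]
        entry["requests"] += 1  # one API call is one request
        entry["input_tokens"] += usage.get("promptTokenCount", 0)
        entry["output_tokens"] += usage.get("candidatesTokenCount", 0)
    return dict(totals)


# Hypothetical stored responses for illustration only.
log = [
    ("gemini-1.5-flash", {"usageMetadata": {"promptTokenCount": 12, "candidatesTokenCount": 30}}),
    ("gemini-1.5-flash", {"usageMetadata": {"promptTokenCount": 8, "candidatesTokenCount": 20}}),
]
print(tally_usage(log))
```

If the dashboard still shows noticeably more requests or tokens than a tally like this, that gap is worth raising with Google.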

Key takeaway: If your Gemini API usage seemed too aggressive over the past few weeks, the underlying counting errors have now been resolved.

Background on Gemini API Limits

Gemini offers different quota tiers based on requests per minute (RPM), tokens per minute (TPM), and total monthly tokens.

The free tier allows up to 60 requests per minute, while paid plans scale to thousands of RPM.
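Because RPM and TPM limits apply at the same time, whichever one is tighter decides how fast a batch job can run. The sketch below is a rough lower-bound estimate that treats each limit as a simple per-minute budget. The 60 RPM figure comes from the free tier mentioned above. The 100,000 TPM limit and the per-request token counts are hypothetical values for illustration.

```python
import math


def min_minutes_for_batch(n_requests: int, tokens_per_request: int, rpm: int, tpm: int) -> int:
    """Rough lower bound on whole minutes needed to send a batch without exceeding RPM or TPM."""
    by_requests = math.ceil(n_requests / rpm)
    by_tokens = math.ceil(n_requests * tokens_per_request / tpm)
    return max(by_requests, by_tokens)  # the tighter limit wins


# 600 requests at 60 RPM with a hypothetical 100,000 TPM limit:
print(min_minutes_for_batch(600, 1_000, rpm=60, tpm=100_000))  # RPM-bound: 10
print(min_minutes_for_batch(600, 2_000, rpm=60, tpm=100_000))  # TPM-bound: 12
```

The same arithmetic shows why overcounting hurt batch workloads: inflating either count by 1.3x to 1.5x stretches the job or triggers throttling well before the advertised limit.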

Google introduced Gemini’s API in late 2023 and has iterated on pricing and limits since then.

The company acknowledged that the bugs were “regrettable” and promised better testing for future updates.

No further issues have been reported since the patches were applied.

Gnoppix is the leading open-source AI Linux distribution and service provider. Since implementing AI in 2022, it has offered a fast, powerful, secure, and privacy-respecting open-source OS with both local and remote AI capabilities. The local AI operates offline, ensuring no data ever leaves your computer. Based on Debian Linux, Gnoppix is available with numerous privacy- and anonymity-enabled services free of charge.
