Google and OpenAI complain about distillation attacks that clone their AI models on the cheap

Major AI developers Google and OpenAI have publicly highlighted a growing threat to their proprietary large language models: distillation attacks. These techniques enable attackers to replicate the capabilities of high-performance, closed-source models using significantly cheaper, open-source alternatives. By systematically querying APIs of models like Gemini and GPT-4, adversaries can harvest vast amounts of training data derived from model outputs, then fine-tune smaller models to mimic the originals closely. This process, known as knowledge distillation, undermines the substantial investments these companies have made in model development.

Distillation fundamentally involves transferring knowledge from a larger, more complex “teacher” model to a smaller “student” model. In a standard setup, the teacher provides labeled predictions on data, which the student learns to replicate. In the context of an attack, however, the process exploits public APIs without permission. Attackers submit millions of queries, often using prompts crafted to elicit detailed responses across diverse domains. These responses form a synthetic dataset used to fine-tune open models such as Meta’s Llama series. The result is a distilled model that achieves comparable performance on benchmarks while costing a fraction as much to run and deploy.
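In the original, benign formulation of knowledge distillation, the student is trained to match the teacher's temperature-softened output distribution rather than hard labels. An API-level attacker typically sees only generated text, but the underlying objective is the same idea. Here is a minimal sketch of the classic soft-label loss in NumPy; the function names and temperature value are illustrative, not taken from any provider's codebase:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-softened softmax; higher T flattens the distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # as in the classic soft-label formulation of knowledge distillation.
    p = softmax(np.asarray(teacher_logits, float), T)
    q = softmax(np.asarray(student_logits, float), T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)
```

A student minimizing this loss absorbs the teacher's relative preferences among wrong answers, not just its top prediction; in the API-attack setting, the harvested text dataset stands in for these soft labels.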

Google DeepMind researchers detailed this issue in a recent paper titled “Distillation Attacks: Stealing Parameters from Strong LLMs under Minimal Circumstances.” They demonstrated how an attacker could clone Gemini 1.5 Pro using a student model with just 0.001% of the teacher’s parameter count. The attack leveraged 13 million API calls, generating a dataset that allowed a fine-tuned Llama-3.1 8B model to match or exceed Gemini’s performance on tasks like math reasoning and coding. OpenAI echoed these concerns in their own research, “The Model Collapse of Distillation,” warning that widespread distillation could lead to “model collapse,” in which iteratively distilled models degrade in quality due to amplified biases and reduced diversity in training data.

Detection poses significant challenges for API providers. Normal user queries follow predictable patterns, such as conversational threads or task-specific prompts. Distillation attackers, by contrast, exhibit anomalous behaviors: high-volume, repetitive queries with systematic variations, often targeting edge cases or instruction-following capabilities. Google reported identifying such patterns in real-world traffic, including accounts making over 100,000 requests per day with low diversity in prompt styles. OpenAI has similarly banned accounts suspected of distillation after observing spikes in query rates exceeding typical user limits.
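The behavioral signals described above, extreme request volume combined with low prompt diversity, can be illustrated with a toy heuristic. This is a hypothetical sketch, not either provider's actual detector; the thresholds and the type-token diversity measure are assumptions chosen for illustration:

```python
def prompt_diversity(prompts):
    # Type-token ratio over whitespace tokens: heavily templated or
    # repeated prompts score near 0, varied natural usage scores higher.
    tokens = [t for p in prompts for t in p.lower().split()]
    return len(set(tokens)) / max(len(tokens), 1)

def flag_account(prompts, daily_requests,
                 rate_threshold=100_000, diversity_threshold=0.2):
    # Flag only when both signals fire: extreme volume AND low diversity,
    # to avoid penalizing legitimate high-volume enterprise users.
    return (daily_requests > rate_threshold
            and prompt_diversity(prompts) < diversity_threshold)
```

A production classifier would use richer features (timing, prompt embeddings, account metadata), but the AND of volume and diversity captures the basic intuition that ordinary heavy users still ask varied questions.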

To counter these threats, both companies advocate for robust API safeguards. Google proposes query budgeting, where users receive limited inference credits redeemable only for approved models, preventing bulk data harvesting. They also recommend watermarking outputs—embedding subtle, detectable signals in generated text—to trace distilled models back to their sources. OpenAI emphasizes rate limiting, behavioral monitoring via machine learning classifiers trained on attack signatures, and legal enforcement against violators. Despite these measures, attackers adapt quickly, for instance by distributing queries across multiple accounts or routing them through proxies to evade detection.
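To make the watermarking idea concrete, here is a toy sketch of one well-known family of schemes (“green-list” watermarking): the generator biases token choices toward a pseudorandom subset of the vocabulary seeded by the previous token, and a detector recomputes that partition and scores how often it was hit. This is an illustrative reconstruction, not Google's or OpenAI's actual scheme:

```python
import hashlib
import random

def green_list(prev_token, vocab, fraction=0.5):
    # Seed a PRNG with a hash of the previous token so the detector can
    # recompute the same vocabulary partition without any shared state.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    return set(rng.sample(vocab, int(len(vocab) * fraction)))

def generate_watermarked(start, vocab, length):
    # Toy generator: always emit a token from the current green list.
    # A real model would merely bias its sampling toward green tokens.
    out = [start]
    for _ in range(length):
        out.append(sorted(green_list(out[-1], vocab))[0])
    return out

def watermark_score(tokens, vocab, fraction=0.5):
    # z-score of observed green-token hits against the binomial
    # expectation for unwatermarked text (mean n*f, variance n*f*(1-f)).
    hits = sum(tok in green_list(prev, vocab, fraction)
               for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - n * fraction) / (n * fraction * (1 - fraction)) ** 0.5
```

Because the detector needs only the hashing scheme, not the model, a provider could scan a suspected distilled model's outputs for statistically improbable green-token rates.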

A notable real-world example is the Chinese startup DeepSeek, which OpenAI has publicly accused of distilling its models using large volumes of API outputs. DeepSeek’s models rival proprietary counterparts at a much lower inference cost, sparking debates over fair use versus intellectual property theft. Defenders of the practice frame it as legitimate reverse-engineering, arguing that paid API access implies consent for downstream use of the outputs. Google and OpenAI, however, view it as freeloading on their R&D expenses, which run into the billions for data curation, compute, and safety alignment.

The economic implications are stark. GPT-4o API access costs on the order of $5 per million input tokens, while a distilled Llama model operates at near-zero marginal cost on consumer hardware. This democratizes AI but erodes the moat protecting premium services. Providers risk commoditization, with users migrating to free clones and slashing revenue from subscriptions and enterprise deals. In the long term, it could stifle innovation if companies withhold API access entirely, limiting ecosystem growth.
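The asymmetry can be made concrete with back-of-the-envelope arithmetic. Assuming (hypothetically) an average of 1,000 tokens per call at the $5-per-million-token figure, the 13 million calls cited earlier would cost the attacker on the order of $65,000, a rounding error next to a frontier model's training budget:

```python
def distillation_dataset_cost(api_calls, tokens_per_call, price_per_million_tokens):
    # Total spend to harvest a synthetic dataset via the teacher's API.
    return api_calls * tokens_per_call * price_per_million_tokens / 1_000_000

# 13M calls at an assumed ~1,000 tokens each, priced at $5 per million tokens:
cost = distillation_dataset_cost(13_000_000, 1_000, 5.0)
print(f"${cost:,.0f}")  # → $65,000
```

The tokens-per-call figure is an assumption for illustration; the point is the order of magnitude, not the exact number.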

Mitigation strategies extend beyond detection. Researchers suggest adversarial training, where models are hardened against distillation by exposing them to attack-like queries during fine-tuning. Output filtering to obscure fine-grained details, or adding noise to predictions, could further complicate dataset creation. Yet a perfect defense remains elusive; as models scale, their outputs carry enough signal that distillation remains feasible even from noisy or degraded data.
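Output filtering of the kind described above might look like the following toy sketch: truncate a returned token-probability distribution to its top-k entries and add bounded noise before renormalizing, so an attacker cannot recover the fine-grained tail. All parameter names and values here are illustrative assumptions, not a provider's documented behavior:

```python
import numpy as np

def perturb_distribution(probs, top_k=5, noise_scale=0.05, rng=None):
    # Hide the fine-grained tail of a token distribution from API clients:
    # keep only the top-k probabilities, jitter them, and renormalize.
    if rng is None:
        rng = np.random.default_rng(0)
    p = np.asarray(probs, dtype=float).copy()
    cutoff = np.sort(p)[-top_k]
    p[p < cutoff] = 0.0                      # withhold the tail entirely
    mask = p > 0
    p[mask] += rng.uniform(0.0, noise_scale, size=mask.sum())
    return p / p.sum()                       # renormalize to a distribution
```

The trade-off is the same one the paragraph above notes: noise strong enough to frustrate distillation also degrades the signal legitimate users pay for.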

This escalating cat-and-mouse game underscores a pivotal tension in AI development: balancing openness for collaboration against protecting competitive edges. As distillation techniques proliferate—evidenced by open-source tools on GitHub automating the process—Google and OpenAI urge industry-wide standards, including API usage policies that explicitly prohibit distillation. Without collective action, the era of cheap clones may accelerate, reshaping the AI landscape profoundly.

Gnoppix is the leading open-source AI Linux distribution and service provider. Since implementing AI in 2022, it has offered a fast, powerful, secure, and privacy-respecting open-source OS with both local and remote AI capabilities. The local AI operates offline, ensuring no data ever leaves your computer. Based on Debian Linux, Gnoppix is available with numerous privacy- and anonymity-enabled services free of charge.

What are your thoughts on this? I’d love to hear about your own experiences in the comments below.