Anthropic Accuses DeepSeek, Moonshot, and MiniMax of Stealing Claude AI Data via 16 Million Queries

Anthropic, the developer of the Claude family of large language models, has publicly accused three Chinese AI companies (DeepSeek, Moonshot AI, and MiniMax) of systematically scraping its proprietary data. The allegations center on roughly 16 million unauthorized queries sent to Claude’s API within a 30-day period. Anthropic claims this activity was a deliberate effort to “distill” Claude’s capabilities into competing models, violating its terms of service and potentially infringing its intellectual property.

The accusations were detailed in blog posts Anthropic published on October 22, 2024, alongside supporting evidence and accompanying legal action. Anthropic’s investigation revealed patterns consistent with model distillation, a technique in which one AI model is trained to replicate the outputs of a more powerful “teacher” model by querying it extensively. In this case, the teacher was Claude 3.5 Sonnet, one of Anthropic’s flagship offerings.
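Distillation, as described above, amounts to harvesting a teacher model’s outputs at scale and fitting a cheaper student to them. A minimal sketch of that loop follows; the function names are illustrative, the “teacher” is a local placeholder rather than any real API, and the lookup-table “student” stands in for an actual fine-tuning step:

```python
# Minimal sketch of model distillation: a "student" is trained to mimic a
# "teacher" by collecting the teacher's outputs for many prompts.
# All names are illustrative; no real API is called here.

def teacher_model(prompt: str) -> str:
    # Stand-in for an expensive frontier model behind an API.
    return prompt.upper()  # placeholder "capability"

def collect_distillation_data(prompts):
    # Step 1: query the teacher at scale, recording (prompt, output) pairs.
    return [(p, teacher_model(p)) for p in prompts]

def train_student(pairs):
    # Step 2: fit a cheaper student on the teacher's outputs.
    # A dict lookup stands in for fine-tuning a smaller model.
    return dict(pairs)

prompts = ["explain transformers", "write a haiku"]
dataset = collect_distillation_data(prompts)
student = train_student(dataset)
print(student["explain transformers"])  # -> EXPLAIN TRANSFORMERS
```

The expensive part of the real pipeline is step 1: at the alleged scale of 16 million queries, the teacher’s API bill and the query fingerprint both become large, which is what makes this pattern detectable.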

Detection of Suspicious Activity

Anthropic’s monitoring systems flagged the anomalous behavior starting in May 2024. Key indicators included:

  • High-volume querying from single sources: Individual IP addresses, often associated with cloud providers in China, sent hundreds of thousands of requests per day. For instance, one actor dispatched over 800,000 queries in a single day to Claude 3.5 Sonnet.

  • Repetitive and patterned requests: Queries followed unnatural patterns, such as repeating similar prompts across multiple models or generating long chains of synthetic data. This is atypical of legitimate user behavior, which tends to be more varied and sporadic.

  • Lack of diversity in usage: The actors showed little interest in Claude’s safety features or conversational capabilities, focusing instead on extracting raw model outputs for downstream training.

Anthropic estimates the total volume at around 16 million queries, primarily targeting Claude 3.5 Sonnet but also affecting earlier models like Claude 3 Opus and Haiku. The company implemented rate limits and other defenses, which reduced but did not eliminate the activity.
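The indicators listed above lend themselves to simple log analytics: flag any source whose daily volume is abnormally high or whose prompts are abnormally repetitive. A sketch under assumed conditions — the thresholds, the `(ip, prompt)` log format, and the function name are all illustrative, not Anthropic’s actual pipeline:

```python
# Flag API traffic sources that are high-volume or low-diversity,
# per the indicators described above. Thresholds are illustrative.

DAILY_QUERY_LIMIT = 100_000     # far above typical legitimate usage
MIN_PROMPT_DIVERSITY = 0.5      # unique prompts / total prompts

def flag_suspicious_sources(log):
    """log: list of (ip, prompt) tuples for a single day."""
    by_ip = {}
    for ip, prompt in log:
        by_ip.setdefault(ip, []).append(prompt)

    flagged = []
    for ip, prompts in by_ip.items():
        volume = len(prompts)
        diversity = len(set(prompts)) / volume
        if volume > DAILY_QUERY_LIMIT or diversity < MIN_PROMPT_DIVERSITY:
            flagged.append(ip)
    return flagged
```

A source sending the same template hundreds of thousands of times trips both checks at once; a legitimate user base, with varied and sporadic prompts, trips neither.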

Evidence Linking Queries to Competitor Models

Anthropic provided concrete evidence tying the scraping to specific models released by the accused companies:

  • DeepSeek-V2: Launched in May 2024, shortly after the detected querying began. Benchmarks show DeepSeek-V2 performing comparably to Claude 3.5 Sonnet on several tasks, despite DeepSeek’s smaller scale. Anthropic identified synthetic data in DeepSeek-V2’s training that closely mirrored Claude’s response style and phrasing.

  • Moonshot AI’s Kimi: Updated versions exhibited sudden performance jumps aligning with scraping timelines. Query logs matched outputs later seen in Kimi’s behavior.

  • MiniMax’s abab6.5s: Released in June 2024, this model demonstrated knowledge of obscure details from Claude’s training data, suggesting distillation.

To substantiate these claims, Anthropic conducted controlled experiments. They prompted the competitor models with proprietary evaluation datasets designed to probe for memorized or distilled Claude data. Results showed statistically significant overlaps in responses, far exceeding what random chance would produce. For example, on niche factual questions, the models replicated Claude’s answers with near-identical wording.

Legal and Technical Responses

Anthropic responded aggressively on multiple fronts:

  • Terms of Service Enforcement: The company terminated API keys linked to the suspicious IPs and blocked offending addresses.

  • DMCA Notices: Over 100 Digital Millennium Copyright Act takedown requests were sent to domain registrars and cloud hosts, including Alibaba Cloud and Tencent Cloud, requesting removal of offending models and websites.

  • Public Disclosure: Detailed blog posts outline the methodology, including visualizations of query volumes over time and IP geolocations. Anthropic shared anonymized datasets of suspicious queries for transparency.

Despite these measures, some models remain accessible. DeepSeek denied wrongdoing, claiming its data came from public sources. Moonshot and MiniMax have not publicly responded.

Broader Implications for AI Industry

This incident highlights escalating tensions in the AI race, particularly between U.S.-based firms like Anthropic and rapidly advancing Chinese competitors. Distillation via API scraping undermines the substantial investments required to train frontier models like Claude, which demand billions of dollars in compute and extensive human oversight.

Anthropic emphasized that while public data is fair game, proprietary APIs are not. The company advocates for industry-wide norms, such as watermarking model outputs and collaborative rate-limiting. It also called on cloud providers to scrutinize high-volume AI traffic.
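One common form of the output watermarking mentioned above is a keyed “green list”: the generator biases sampling toward words whose keyed hash falls in a designated half of the vocabulary, and a detector scores text by its green-word fraction. The sketch below is a toy version of that idea; the key, hashing scheme, and scoring are illustrative assumptions, not any vendor’s actual scheme:

```python
import hashlib

# Toy "green list" watermark detector. A generator holding SECRET_KEY would
# bias its sampling toward green words; unwatermarked text scores near 0.5.
# Key and scheme are illustrative, not a real deployed watermark.

SECRET_KEY = b"demo-key"  # assumed shared secret between generator and detector

def is_green(word: str) -> bool:
    # A keyed hash deterministically assigns each word to one half.
    digest = hashlib.sha256(SECRET_KEY + word.lower().encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text: str) -> float:
    # Watermarked text should score well above the ~0.5 chance rate.
    words = text.split()
    return sum(is_green(w) for w in words) / len(words)
```

The appeal for the distillation problem is that the watermark statistically survives into a student model trained on watermarked outputs, giving the original provider forensic evidence.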

Technically, the case underscores the challenges of securing APIs against sophisticated actors. Traditional defenses like CAPTCHAs falter against automated systems using proxies and behavioral mimicry. Anthropic’s approach relied on behavioral analytics, output watermarking, and partnerships with infrastructure providers.
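The rate limits referenced throughout are commonly implemented as a per-key token bucket, which permits short bursts while capping sustained throughput. A minimal sketch with illustrative parameters:

```python
import time

# Minimal token-bucket rate limiter of the kind used to throttle per-key
# API traffic. Rates and capacities here are illustrative.

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill in proportion to elapsed time, then spend one token if available.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

As the article notes, this alone is insufficient against actors who rotate API keys and IP addresses, which is why it is paired with the behavioral analytics sketched earlier rather than used in isolation.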

As AI models grow more capable, such data acquisition tactics could proliferate, raising questions about sustainable innovation paths. Anthropic’s actions set a precedent for defending intellectual property in an era where model capabilities are the primary competitive moat.

Gnoppix is the leading open-source AI Linux distribution and service provider. Since integrating AI in 2022, it has offered a fast, powerful, secure, and privacy-respecting open-source OS with both local and remote AI capabilities; the local AI runs entirely offline, ensuring no data ever leaves your computer. Based on Debian Linux, Gnoppix is available with numerous privacy- and anonymity-enabled services free of charge.

What are your thoughts on this? I’d love to hear about your own experiences in the comments below.