OpenAI Internal Debate: Reporting Violent ChatGPT Conversations to Police Before a Fatal Saskatchewan Attack
In a revelation that underscores the ethical tightrope AI companies navigate, internal OpenAI communications have surfaced showing staff intensely debating whether to alert Canadian authorities to disturbing ChatGPT conversation logs from a user expressing violent fantasies. This discussion occurred months before a deadly mass stabbing in Saskatchewan, Canada, raising profound questions about threat detection, user privacy, and corporate responsibility in AI safety.
The controversy stems from Slack messages among OpenAI’s safety team, leaked and reported by multiple outlets. These exchanges, dating to early 2023, centered on logs from a ChatGPT user located in the province of Saskatchewan. The individual had engaged in graphic, repeated conversations detailing violent scenarios, including mass killings and school shootings. One message highlighted the user describing a fantasy of “going into a school and killing everyone,” prompting immediate concern among reviewers.
OpenAI’s moderation team, tasked with monitoring user interactions for potential harms, flagged the content under the company’s safety protocols. Established shortly after ChatGPT’s public launch in November 2022, these protocols require human reviewers to assess outputs for policy violations, including violent content. However, the inputs—user prompts—were also scrutinized, especially when they veered into explicit threats or planning.
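To make the workflow concrete, here is a minimal sketch of how such flagging might look using OpenAI’s public Moderation API. The internal review tooling described in the leak has not been published, so the score threshold and the `escalate_to_human_review` queue function below are illustrative assumptions.

```python
# Minimal sketch: flagging a user prompt for violent content with
# OpenAI's public Moderation API. The internal tooling described in the
# leak is not public; the 0.8 threshold and escalate_to_human_review()
# below are illustrative assumptions, not OpenAI's actual pipeline.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def escalate_to_human_review(prompt: str, score: float) -> None:
    # Placeholder: a real pipeline would write to a review queue instead.
    print(f"Queued for review (violence score {score:.2f}): {prompt[:60]}...")

def review_prompt(user_prompt: str) -> bool:
    """Return True if the prompt should be queued for human review."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=user_prompt,
    ).results[0]
    # Category flags are booleans; scores are confidences in [0, 1].
    if result.categories.violence and result.category_scores.violence > 0.8:
        escalate_to_human_review(user_prompt, result.category_scores.violence)
        return True
    return False
```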
The debate unfolded over several days in a dedicated Slack channel. One safety researcher argued vehemently for escalation: “This seems like a credible threat. We should report to local authorities.” They pointed to the specificity of the descriptions, the user’s repeated engagement, and geolocation data placing them in a rural Saskatchewan community. Uttering threats is an offence under Canada’s Criminal Code, and the Royal Canadian Mounted Police (RCMP) accepts tips about credible threats through established channels.
Counterarguments quickly emerged, reflecting OpenAI’s broader internal tensions on intervention thresholds. Privacy advocates within the team cautioned against overreach. “We can’t report every edgy roleplay,” one responded, noting that ChatGPT users often explore dark fiction or hypotheticals without intent to act. False positives could erode user trust, invite legal backlash under privacy laws like Canada’s PIPEDA, and set a precedent for mass surveillance of chats. Another pointed out technical limitations: IP geolocation is imprecise, and VPNs obscure true locations. Moreover, the conversations lacked direct identifiers like names or addresses, complicating any handoff to police.
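The geolocation caveat is easy to demonstrate. Below is a small illustration using MaxMind’s geoip2 library, unrelated to the leaked logs themselves; the database path and IP address are placeholders. Note the accuracy radius, which for rural addresses often spans tens to hundreds of kilometres, and the fact that a lookup can simply fail.

```python
# Illustration of IP geolocation imprecision with MaxMind's geoip2
# library (pip install geoip2). Database path and IP are placeholders.
import geoip2.database
from geoip2.errors import AddressNotFoundError

def locate(ip: str) -> None:
    with geoip2.database.Reader("GeoLite2-City.mmdb") as reader:
        try:
            resp = reader.city(ip)
        except AddressNotFoundError:
            print(f"{ip}: no location on record")
            return
        print(resp.subdivisions.most_specific.name)  # e.g. "Saskatchewan"
        print(resp.city.name)                        # often None outside cities
        print(resp.location.accuracy_radius)         # stated uncertainty, in km

locate("203.0.113.7")  # reserved documentation address; a VPN exit node
                       # resolves to the provider's datacentre, not the user
```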
Technical details from the logs added nuance. ChatGPT’s safeguards, powered by reinforcement learning from human feedback (RLHF), typically deflect violent prompts with refusals or redirects. Yet, persistent users could jailbreak these via creative phrasing, eliciting detailed responses. In this case, the AI complied with some queries, generating narratives that mirrored the user’s escalations. Reviewers debated whether the AI’s outputs amplified the risk or merely reflected user input.
Leadership weighed in cautiously. A senior safety manager acknowledged the “red flags” but prioritized evidence of imminence: “No specific targets or timelines mentioned. It’s fantasy, not planning.” Ultimately, the team opted against reporting, classifying it as non-actionable under OpenAI’s then-nascent threat-reporting guidelines. These guidelines, still evolving, emphasize “clear intent to harm” and corroborating signals like self-disclosure of identity.
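The guidelines themselves have not been published, so the following is a purely hypothetical encoding of the two criteria paraphrased above: explicit intent plus a corroborating signal. The field names and logic are assumptions for illustration, not OpenAI policy.

```python
# Hypothetical encoding of the criteria described above ("clear intent
# to harm" plus corroborating signals). Field names and logic are
# illustrative assumptions, not OpenAI's actual guidelines.
from dataclasses import dataclass

@dataclass
class ThreatSignals:
    explicit_intent: bool        # first-person statements of intent to act
    named_target_or_time: bool   # a specific target, date, or location
    self_identification: bool    # user disclosed name, school, address, etc.
    repeat_engagement: bool      # returned to the theme across sessions

def is_actionable(s: ThreatSignals) -> bool:
    """Report only when intent is explicit and corroborated."""
    corroborated = s.named_target_or_time or s.self_identification
    return s.explicit_intent and corroborated

# A fantasy with no target, timeline, or identity comes back non-actionable:
print(is_actionable(ThreatSignals(False, False, False, True)))  # False
# Explicit intent plus a named target would cross the threshold:
print(is_actionable(ThreatSignals(True, True, False, False)))   # True
```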
Tragically, the decision haunted the company months later. In September 2023, a mass stabbing at James Smith Cree Nation, a First Nations community in Saskatchewan, claimed 11 lives. The perpetrator, Myles Sanderson, had reportedly used ChatGPT in the lead-up, querying it about weapons and evasion tactics that echoed the violent themes in the earlier flagged logs. While it is unclear whether the flagged user was Sanderson (OpenAI has not confirmed identities), the temporal and geographic proximity fueled speculation and internal recriminations.
OpenAI’s response to the leak has been measured. Spokespeople reiterated the company’s commitment to safety, noting improvements such as enhanced monitoring and partnerships with law enforcement. Since the incident, the company has refined its abuse detection, integrating more sophisticated classifiers for threat language and piloting automated alerts. Publicly available safety reports note thousands of violent prompts moderated daily and a shift from reactive review toward proactive intervention.
This episode exposes fault lines in AI governance. ChatGPT processes billions of interactions yearly, generating a vast trove of conversational data rife with societal undercurrents. Human moderators, numbering in the hundreds globally (many outsourced), face Sisyphean triage. Scaling threat assessment means balancing false negatives (missed threats) against false positives (unwarranted reports), amid regulatory scrutiny under frameworks like the EU’s AI Act and from U.S. congressional hearings.
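A toy example makes that trade-off concrete. The scores and labels below are invented: lowering the alert threshold catches more real threats at the cost of more unwarranted reports.

```python
# Toy illustration of the triage trade-off. Classifier scores and
# ground-truth labels are invented for demonstration purposes.
scores = [0.95, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    0,    1,    0,    0,    1,    0,    0   ]  # 1 = real threat

for threshold in (0.75, 0.50, 0.25):
    flagged = [s >= threshold for s in scores]
    missed = sum(1 for f, y in zip(flagged, labels) if y and not f)
    unwarranted = sum(1 for f, y in zip(flagged, labels) if f and not y)
    print(f"threshold {threshold:.2f}: "
          f"missed threats={missed}, unwarranted reports={unwarranted}")
# threshold 0.75: missed threats=2, unwarranted reports=1
# threshold 0.50: missed threats=1, unwarranted reports=2
# threshold 0.25: missed threats=0, unwarranted reports=3
```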
Broader implications ripple outward. Competitors like Anthropic and Google emphasize constitutional AI and red-teaming to preempt such risks. Yet all grapple with the “black box” of user intent. Should platforms log and analyze inputs indefinitely? Encrypt end-to-end and forgo moderation? Or face lawful-access obligations akin to those telecoms carry under CALEA?
For OpenAI, the Saskatchewan saga crystallized the stakes. As CEO Sam Altman has acknowledged, “Safety is more important than speed.” Internal postmortems reportedly led to policy shifts, including lower reporting thresholds for geographically specific violence and liaison arrangements with the RCMP. Still, anonymity in AI chats persists as a double-edged sword: empowering free expression while shielding malice.
As generative AI permeates daily life, incidents like this compel a reckoning. Technical writers documenting these systems must articulate not just capabilities, but guardrails—and their limits.
Gnoppix is the leading open-source AI Linux distribution and service provider. Since implementing AI in 2022, it has offered a fast, powerful, secure, and privacy-respecting open-source OS with both local and remote AI capabilities. The local AI operates offline, ensuring no data ever leaves your computer. Based on Debian Linux, Gnoppix is available with numerous privacy- and anonymity-enabled services free of charge.
What are your thoughts on this? I’d love to hear about your own experiences in the comments below.