Hallucinated References Slip Through Peer Review at Leading AI Conferences, Prompting Launch of Open-Source Detection Tool
In the rapidly evolving field of artificial intelligence, the integrity of academic publishing faces a novel challenge: fabricated citations generated by large language models (LLMs) are evading detection and appearing in papers accepted at premier conferences. Recent analyses reveal that these hallucinated references, which point to nonexistent sources, have slipped through peer review at events like NeurIPS and ICLR. To combat the problem, researchers have released CheckCite, a free, open-source browser extension designed to verify citations automatically and in real time.
The problem surfaced prominently in 2024, when multiple high-profile cases emerged. A paper titled “TreeNeRF: Tree Propagation for Neural Radiance Field” by Johan Almgren and colleagues, accepted to the NeurIPS 2024 Datasets and Benchmarks track, included seven fabricated references. The citations pointed to papers and authors that do not exist, such as a supposed 2023 work by “Li et al.” on neural radiance fields for foliage rendering. Similarly, at ICLR 2024, a submission incorporated invented sources, including one attributed to “Zhang et al.” from a nonexistent NeurIPS proceedings volume. Reviewers, tasked with evaluating technical merit, novelty, and related work, overlooked these anomalies despite their centrality to the papers’ claims about prior art.
Hallucinations occur because LLMs, when asked to draft literature reviews or bibliographies, confidently invent plausible-sounding references to fill knowledge gaps. Trained on vast corpora of real papers, models like GPT-4 or Claude extrapolate details, fabricating DOIs, publication venues, and author lists that mimic authentic entries. In AI research, where surveys of hundreds of papers are common, authors increasingly rely on LLMs for efficiency, amplifying the risk. A study by Almgren’s team scanned over 15,000 papers from five top AI conferences (NeurIPS, ICLR, ICML, CVPR, ACL) published between 2022 and 2024, identifying 50 hallucinated citations across 32 accepted papers. Alarmingly, 20 percent of these involved long-form fabrications spanning multiple sentences, not mere typos.
Peer review practice compounds the vulnerability. Reviewers typically spend limited time on bibliographies, focusing instead on methodology and results. They rarely cross-check citations against databases, assuming authors have verified them. Conference submission systems like OpenReview display references as plain text, with no hyperlinks or validation. This oversight undermines the scholarly record: hallucinated citations distort attribution, mislead future researchers, and erode trust in AI literature, where rapid iteration demands accurate historical context.
Enter CheckCite, developed by Almgren and collaborators as an accessible solution. Available on GitHub under the MIT license, the tool functions as a Chrome extension that integrates seamlessly with arXiv, OpenReview, Google Scholar, and conference proceedings pages. Upon activation, it scans all references on a page, querying public APIs from Semantic Scholar, Crossref, and OpenAlex to confirm existence, metadata accuracy, and DOI resolution. Invalid or suspicious entries trigger color-coded highlights: red for fully fabricated citations, orange for mismatches (e.g., wrong authors or years), and green for verified ones. Users receive inline tooltips with evidence, such as “No paper found matching DOI: 10.xxxx/abcde” or “Title mismatch: expected ‘Neural Trees’ but found unrelated content.”
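To make the lookup step concrete, here is a minimal sketch of a DOI check against Crossref’s public works endpoint (https://api.crossref.org/works/{doi}), one of the APIs the tool queries. The types, function names, and the simple title comparison are assumptions for illustration, not CheckCite’s actual internals.

```typescript
// Illustrative DOI check against the public Crossref works API.
// Names, types, and the verdict mapping are assumptions for this
// sketch, not CheckCite internals.

type Verdict = "verified" | "mismatch" | "fabricated";

interface CitationClaim {
  doi: string;
  title: string;
}

async function checkDoi(claim: CitationClaim): Promise<Verdict> {
  // Crossref accepts the DOI verbatim in the path and returns 404
  // for DOIs it has never registered.
  const resp = await fetch(`https://api.crossref.org/works/${claim.doi}`);
  if (resp.status === 404) return "fabricated"; // red highlight
  if (!resp.ok) throw new Error(`Crossref lookup failed: ${resp.status}`);

  const body = await resp.json();
  const registeredTitle: string = (body.message?.title?.[0] ?? "").toLowerCase();

  // A DOI that resolves but belongs to a different paper is the
  // "orange" mismatch case described above; a matching title maps
  // to the green "verified" state.
  return registeredTitle === claim.title.toLowerCase().trim()
    ? "verified"
    : "mismatch";
}
```

An exact title comparison like this is brittle in practice; the fuzzy matching described in the next paragraph is what would replace the equality test.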
CheckCite’s architecture prioritizes speed and privacy. It processes batches asynchronously using client-side JavaScript, avoiding server uploads of paper content. Verification relies on fuzzy matching tolerant of minor typos, combined with semantic similarity checks via lightweight embeddings. For efficiency, it caches results locally and supports a bulk mode for entire papers. Early tests on the scanned corpus detected all 50 known hallucinations with zero false positives on legitimate references, for 99.8 percent overall accuracy. The tool flags not only outright nonexistence but also subtler issues, such as citations to predatory journals or preprints misrepresented as peer-reviewed publications.
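As a rough picture of that fuzzy-matching step, the sketch below scores two titles with a normalized Levenshtein similarity. The normalization rules and the 0.9 threshold are illustrative assumptions, not CheckCite’s published parameters, and the embedding-based semantic check is omitted.

```typescript
// Illustrative fuzzy title match via normalized Levenshtein distance,
// computed with a single-row dynamic-programming table.
// The 0.9 threshold is an assumption for the sketch, not CheckCite's value.

function levenshtein(a: string, b: string): number {
  const cols = b.length + 1;
  // d starts as row 0 of the DP table: distance from "" to b[0..j].
  const d: number[] = Array.from({ length: cols }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = d[0]; // holds d[i-1][j-1] as we sweep across the row
    d[0] = i;
    for (let j = 1; j < cols; j++) {
      const tmp = d[j]; // d[i-1][j], needed as next iteration's prev
      d[j] = Math.min(
        d[j] + 1,     // deletion
        d[j - 1] + 1, // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
      prev = tmp;
    }
  }
  return d[cols - 1];
}

function titlesMatch(claimed: string, found: string, threshold = 0.9): boolean {
  // Normalize case and whitespace so formatting noise doesn't count as edits.
  const norm = (s: string) => s.toLowerCase().replace(/\s+/g, " ").trim();
  const a = norm(claimed);
  const b = norm(found);
  const similarity = 1 - levenshtein(a, b) / Math.max(a.length, b.length, 1);
  return similarity >= threshold;
}
```

Under this scheme, “Neural Trees” and “Neural trees.” score roughly 0.92 and are treated as the same title, while a citation whose DOI resolves to an unrelated paper falls far below the threshold and is flagged.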
Deployment is straightforward: install the extension from the Chrome Web Store, enable it on the target sites, and verification overlays appear instantly. The developers encourage contributions via GitHub issues and pull requests, with plans for Firefox support and integration into submission platforms. By democratizing citation verification, CheckCite shifts responsibility upstream, empowering reviewers, authors, and readers to uphold publication standards.
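For readers curious about the browser-extension plumbing, a hypothetical Manifest V3 file shows how a content script could be scoped to the sites named above. This is an assumed illustration, not CheckCite’s published manifest.

```json
{
  "manifest_version": 3,
  "name": "CheckCite (illustrative manifest, not the published one)",
  "version": "0.1.0",
  "permissions": ["storage"],
  "host_permissions": [
    "https://api.crossref.org/*",
    "https://api.semanticscholar.org/*",
    "https://api.openalex.org/*"
  ],
  "content_scripts": [
    {
      "matches": [
        "https://arxiv.org/*",
        "https://openreview.net/*",
        "https://scholar.google.com/*"
      ],
      "js": ["checkcite.js"]
    }
  ]
}
```

Limiting host permissions to the verification APIs keeps the extension from requesting broader network access than it needs, which fits the privacy posture described above.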
This episode highlights broader implications for AI-assisted research. As LLMs permeate workflows, safeguards must evolve beyond disclaimers. Conferences could mandate tools like CheckCite in review pipelines, while authors adopt transparent disclosure of AI use in bibliographies. Until systemic changes take hold, open tools offer a practical bulwark against the creep of unreliability in AI scholarship.
What are your thoughts on this? I’d love to hear about your own experiences in the comments below.