Over 100 Fabricated Citations Bypass Peer Review at Prestigious AI Conference
In a striking demonstration of vulnerabilities in academic peer review, researchers have shown how a paper containing more than 100 entirely fabricated citations evaded detection and was accepted at the International Conference on Learning Representations (ICLR) 2024, one of the premier venues in artificial intelligence research.
The experiment, detailed in a follow-up disclosure, was conducted by a team including Igor Melnyk of the University of Minnesota and colleagues. They crafted a submission titled “Visual Planning with Large Language Models in the Wild,” a seemingly legitimate paper on vision-language models for robotic manipulation tasks. To test the robustness of peer review, the authors deliberately incorporated 110 hallucinated references: fictitious citations generated entirely by OpenAI’s GPT-3.5 Turbo model. These bogus entries mimicked real academic papers, complete with plausible titles, authors, venues, and years, such as “Enhancing Visual Servoing with Deep Reinforcement Learning,” purportedly from NeurIPS 2022, or “Zero-Shot Visual Manipulation via CLIP,” from ICRA 2021.
The process began with prompting GPT-3.5 to produce realistic-looking references relevant to the paper’s topic. The model produced entries that appeared convincing at first glance, blending invented works with occasional nods to genuine research. The researchers then integrated these into the bibliography, ensuring the in-text citations aligned seamlessly with the fabricated list. Submitted under the double-blind review process typical for ICLR, the paper was evaluated by domain experts. Reviewers praised aspects of the work, including its novelty and empirical results, and ultimately recommended acceptance to the poster track, a competitive category requiring scores above the acceptance threshold.
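To make the described workflow concrete, here is a minimal sketch of what such a generation step might look like using the OpenAI Python client. The prompt wording, model parameters, and line-based parsing are illustrative assumptions for this article; they are not the authors’ actual prompt or pipeline, which the team disclosed separately.

```python
# Hypothetical sketch of the reference-generation step described above.
# The prompt text and parsing are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

prompt = (
    "Generate 110 realistic-looking academic references on vision-language "
    "models for robotic manipulation, dated 2018-2023, from venues such as "
    "NeurIPS, ICML, CVPR, and ICLR. Format each as a plain-text entry on its own line."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)

# Split the model's plain-text output into one candidate reference per line.
references = [
    line.strip()
    for line in response.choices[0].message.content.splitlines()
    if line.strip()
]
print(f"Generated {len(references)} candidate references")
```

The point of the sketch is how little effort the attack requires: a single prompt yields a bibliography-sized list of plausible-sounding entries ready to paste into a submission.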
ICLR 2024, held virtually and in person in Vienna, Austria, from May 7-11, accepted around 2,700 papers out of over 8,000 submissions, yielding an acceptance rate of approximately 34 percent. The conference’s peer review relies on the OpenReview platform for managing submissions, reviewer assignments, and feedback. Each paper receives multiple reviews from qualified experts, followed by author rebuttals and area chair oversight. Despite this multi-layered scrutiny, none of the six reviewers or area chairs flagged the anomalous citations. One reviewer even commended the “extensive related work” section, unaware of its artificial nature.
Upon acceptance notification in late January 2024, the researchers promptly informed ICLR organizers via email, withdrawing the paper and revealing the ruse. In their disclosure blog post, they shared the full prompt used for citation generation: a detailed instruction set directing GPT-3.5 to create 110 diverse, topical references spanning 2018-2023 from top conferences like NeurIPS, ICML, CVPR, and ICLR. The output included DOIs, page numbers, and author lists that superficially resembled authentic formats, though deeper inspection would reveal non-existent sources.
ICLR general chairs Joelle Pineau and Yann LeCun, along with program chairs, acknowledged the incident publicly. In a statement on OpenReview, they noted: “We take this matter seriously and are investigating the circumstances surrounding this submission. Peer review is not infallible, and incidents like this highlight areas for improvement.” They emphasized that the conference employs tools like the Toronto Paper Matching System for reviewer assignment and iThenticate for plagiarism checks, but acknowledged limitations in detecting fabricated bibliographies. No automatic verification of citation validity occurs during review, as reviewers are expected to assess relevance and novelty manually.
This episode underscores broader challenges in AI conference peer review amid explosive growth in submissions. ICLR 2024 saw a 30 percent increase in submissions over the prior year, straining reviewer pools. Reviewers, often overburdened academics, may skim bibliographies rather than verify each entry exhaustively. The prevalence of large language models exacerbates the risk, as they excel at generating plausible text, including references that pass casual inspection.
The researchers’ stunt echoes prior critiques, such as the 2018 “Grumpy” fake paper accepted to multiple venues or the 2022 NeurIPS experiment with gibberish-laden submissions. This case, however, stands out for its scale (110 fakes rather than isolated instances) and for succeeding in a top-tier double-blind review. Melnyk’s team argued in their post-mortem that while the paper’s technical contributions were real (derived from prior work), the undetected fakes call the trustworthiness of the process into question. They proposed enhancements such as citation validation tools, random bibliography audits, and the integration of services akin to Google Scholar APIs during review.
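To make those proposals concrete, here is a minimal sketch of what a random bibliography audit might look like, using the public Crossref REST API to search for each sampled title. The matching heuristic, sample size, and the choice of Crossref are assumptions for illustration; this is not a tool ICLR uses or one the researchers published.

```python
# Minimal sketch of a random bibliography audit: sample a few cited titles and
# look each one up via the public Crossref API. The matching heuristic and
# sample size are illustrative assumptions, not an official review tool.
import random
import requests

def title_found_on_crossref(title: str) -> bool:
    """Return True if Crossref returns a closely matching record for the title."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 3},
        timeout=10,
    )
    resp.raise_for_status()
    for item in resp.json()["message"]["items"]:
        candidate = " ".join(item.get("title", [])).lower()
        # Crude check: the cited title should appear (nearly) verbatim in a hit.
        if title.lower() in candidate or candidate in title.lower():
            return True
    return False

def audit_bibliography(titles: list[str], sample_size: int = 10) -> list[str]:
    """Spot-check a random sample of cited titles; return the suspicious ones."""
    sample = random.sample(titles, min(sample_size, len(titles)))
    return [t for t in sample if not title_found_on_crossref(t)]

# Example: a title from the article's fabricated list would likely fail the check.
suspects = audit_bibliography(["Enhancing Visual Servoing with Deep Reinforcement Learning"])
print("Flag for manual review:", suspects)
```

A check this crude would still surface most wholly invented entries for a human to inspect, which is the spirit of the audits the team proposed; real titles that Crossref has not indexed would need a fallback lookup or manual confirmation.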
Conference organizers have since outlined interim measures: enhanced reviewer guidelines stressing bibliography scrutiny, potential integration of reference-checking bots, and post-acceptance audits for high-risk papers. Pineau highlighted in correspondence that ICLR continually evolves its Toronto Paper Matching and review platforms to counter emerging threats from generative AI.
The incident reverberates through the AI community, prompting debates on peer review’s scalability. As AI papers proliferate - fueled by industry labs like OpenAI, Google DeepMind, and Meta AI submitting en masse - maintaining quality demands innovation. Tools like SciScore or automated cross-referencing could mitigate such lapses, but implementation lags behind threats.
For researchers, the takeaway is caution: even rigorous processes harbor blind spots. Fabricated citations not only erode credibility but could propagate errors if undetected post-publication. This ICLR case serves as a wake-up call, urging the field to fortify peer review against AI-augmented deception while preserving its role as a cornerstone of scientific validation.
Gnoppix is the leading open-source AI Linux distribution and service provider. Since implementing AI in 2022, it has offered a fast, powerful, secure, and privacy-respecting open-source OS with both local and remote AI capabilities. The local AI operates offline, ensuring no data ever leaves your computer. Based on Debian Linux, Gnoppix is available with numerous privacy- and anonymity-enabled services free of charge.
What are your thoughts on this? I’d love to hear about your own experiences in the comments below.