AI models often give the right answers but point to the wrong sources

New research reveals a critical flaw in large language models: they frequently provide correct answers while pointing to inaccurate or nonexistent sources, undermining trust in AI-generated information.

A study published in the Journal of the Association for Information Science and Technology tested four major AI models, including GPT-4 and Google’s PaLM. Researchers asked the models to answer questions and provide citations. The results show a stark disconnect between accuracy and verifiability.

The Core Finding: Right Answer, Wrong Source

AI models achieved high accuracy rates on factual questions. However, they often fabricated or misattributed the sources they cited.

The study found that GPT-4 provided correct answers 76% of the time. But its citations were incorrect or invented 43% of the time. Google’s PaLM performed similarly, with 70% answer accuracy but incorrect citations 38% of the time.

This pattern creates a dangerous illusion of credibility. Users see a correct answer with a citation and assume the source is real.

Why Hallucinated Citations Are Dangerous

“A correct answer with a fake citation is more misleading than a wrong answer alone. It gives users false confidence in the information.” — Study co-author Dr. Meredith Thompson

The problem is not unique to one model. It appears across all tested systems. The researchers noted that smaller models hallucinated sources more frequently than larger ones.

Key risks include:

  • Misattributed expertise: Users may cite fake academic papers or articles in their own work.
  • Erosion of trust: Repeated exposure to fake citations damages user confidence in AI tools.
  • Spreading misinformation: A plausible-looking but fake source can amplify false claims.

How Models Create Fake Citations

The study identified three types of citation errors:

Nonexistent sources. The AI generates a reference that looks real but does not exist. Author names, journal titles, and publication years are invented.

Mismatched claims. The citation points to a real source, but the source does not support the AI’s claim. The model misinterprets or misremembers the content.

Partial fabrication. Some elements of the citation are correct, such as the journal name, but other parts, like the volume number or page range, are wrong.

Implications for Researchers and Journalists

Professionals who rely on AI for research face significant risk. Using a fake citation in a published article can lead to retractions, reputational damage, or legal liability.

The researchers recommend:

  • Always verify citations against the original source before using them.
  • Treat AI as a drafting tool, not a final authority on references.
  • Request full source text when possible, rather than accepting citations at face value.
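For the last recommendation, a simple first check is whether a passage the model presents as a quotation actually appears in the full source text. This is a minimal sketch that assumes the source is already available as plain text, and the function names are hypothetical. Matching is exact after normalizing case, whitespace, and curly quotes and dashes. It therefore catches invented quotations, but it cannot catch paraphrases that distort the source. Those still need a human read.

```python
import re
import unicodedata

# Typographic characters that often differ between a model's output and the source.
_TYPOGRAPHY = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
})


def _normalize(text: str) -> str:
    """Fold Unicode compatibility forms, typography, case, and whitespace."""
    text = unicodedata.normalize("NFKC", text).translate(_TYPOGRAPHY)
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_appears_in(quote: str, source_text: str) -> bool:
    """True if the quoted passage occurs verbatim (after normalization) in the source."""
    needle = _normalize(quote)
    if not needle:
        return False  # an empty quote proves nothing
    return needle in _normalize(source_text)
```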

Current Limitations and Future Solutions

The study tested models available in early 2024. Newer versions may perform differently, but the core design has not changed: language models are built to predict plausible text, not to fact-check sources.

Potential solutions include:

  • Retrieval-augmented generation (RAG): Connecting models to verified databases, so that answers draw on retrieved documents, can reduce hallucinated citations.
  • Source scoring: Models that label each citation with a confidence level could help users judge its reliability.
  • User-side verification tools: Browser extensions or plugins that automatically check cited sources against known databases.
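A basic version of the verification tools in the last bullet can be approximated with a short script. This is a minimal sketch that assumes the cited work has a DOI registered with Crossref. Crossref's public REST API (`https://api.crossref.org/works/{DOI}`) returns a JSON record for a known DOI and a 404 for an unknown one. The function names and the exact-title comparison are illustrative choices, not part of the study. A "not found" result means only that Crossref has no record. Some legitimate works are registered elsewhere, for example with DataCite, and some have no DOI at all.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Crossref's public REST endpoint for looking up a single work by DOI.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_crossref(doi: str, timeout: float = 10.0) -> Optional[dict]:
    """Return Crossref's metadata record for a DOI, or None if Crossref has no such DOI."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)["message"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # rate limits or outages are not evidence of fabrication


def _norm(text: str) -> str:
    """Lower-case and collapse whitespace before comparing titles."""
    return " ".join(text.lower().split())


def check_doi(
    doi: str,
    cited_title: str,
    fetch: Callable[[str], Optional[dict]] = fetch_crossref,
) -> str:
    """Compare a cited DOI and title against the registered record."""
    record = fetch(doi)
    if record is None:
        return "not found"
    titles = record.get("title") or []  # Crossref returns titles as a list
    registered = titles[0] if titles else ""
    if _norm(registered) == _norm(cited_title):
        return "verified"
    return f"title mismatch: registered as {registered!r}"
```

The lookup function is passed in as a parameter, so the comparison logic can be tested offline with a stub. In real use, the default performs a live request and needs network access.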

Bottom Line for AI Users

Do not assume that a correct answer means the source is real. Always verify citations independently.

The study serves as a warning: AI can be both impressively accurate and dangerously deceptive at the same time. Users who skip source verification risk amplifying errors and undermining their own credibility.
