Data2Story turns a CSV file into a verified interactive news article using seven AI agents

Data2Story converts CSV files into interactive news articles using seven specialized AI agents, with built-in verification to ensure accuracy. The system automatically analyzes raw data, generates a narrative, and creates visualizations, all while cross-checking facts. It targets journalists and data analysts who need to quickly produce reliable, engaging content from spreadsheets.

The seven agents work in sequence. Agent one (Data Parser) extracts and structures the CSV content. Agent two (Context Researcher) gathers background information from the web. Agent three (Story Writer) drafts the article in journalistic style. Agent four (Fact Checker) verifies every claim against the source data and external sources. Agent five (Visual Designer) creates charts, graphs, and interactive elements. Agent six (Editor) polishes language and flow. Agent seven (Publisher) formats the final output for web or print.
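The sequential hand-off described above can be illustrated with a minimal sketch. The article does not publish Data2Story's actual interfaces, so `State`, `make_agent`, and `run_pipeline` are hypothetical names, and each agent here is a stub that only records that it ran.

```python
from typing import Callable

State = dict  # hypothetical shared state handed from one agent to the next

AGENT_ORDER = [
    "Data Parser", "Context Researcher", "Story Writer", "Fact Checker",
    "Visual Designer", "Editor", "Publisher",
]

def make_agent(name: str) -> Callable[[State], State]:
    """Build a placeholder agent (assumption: real agents would call a model or tool)."""
    def agent(state: State) -> State:
        # Return a new state with this agent's name appended to the list of steps.
        return {**state, "steps": state.get("steps", []) + [name]}
    return agent

def run_pipeline(csv_path: str) -> State:
    """Run the seven stages strictly in order, each consuming the previous output."""
    state: State = {"csv_path": csv_path}
    for name in AGENT_ORDER:
        state = make_agent(name)(state)
    return state

result = run_pipeline("example.csv")  # the path is illustrative; the stubs never open it
print(result["steps"])
```

A strictly linear chain like this keeps each stage easy to inspect, at the cost of not letting later agents send work back to earlier ones.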

“The fact-checking agent is the key innovation — it doesn’t just trust the data, it actively searches for contradictions and missing context,” the developers stated.

The interactive features allow readers to explore the underlying data themselves. Hovering over charts reveals raw numbers. Clicking on claims links to the original CSV row. This transparency aims to rebuild trust in data-driven journalism.
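One way to support a click-through from a claim to its source row is to store the row index alongside each claim. The sketch below assumes that design; the `Claim` class, the `source_for` helper, and the sample values are invented for illustration and are not taken from Data2Story.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    row_index: int  # zero-based index of the supporting data row (header excluded)
    column: str     # column that holds the value the claim relies on

def load_rows(csv_text: str) -> list[dict]:
    """Parse CSV text into a list of dicts keyed by the header names."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def source_for(claim: Claim, rows: list[dict]) -> dict:
    """Return the raw row a claim points at, so a reader can inspect it."""
    return rows[claim.row_index]

# Made-up sample data, for illustration only.
SAMPLE = "region,value\nA,10\nB,25\n"

rows = load_rows(SAMPLE)
claim = Claim(text="Region B recorded the highest value.", row_index=1, column="value")
print(source_for(claim, rows))
```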

The pipeline reduces a typical news production cycle from hours to minutes. However, the tool is designed for augmentation, not replacement: journalists still set the editorial direction and approve the final output.

The Inverted Pyramid in Practice

Who: Data2Story, developed by a research team at the University of the Basque Country.
What: A multi-agent AI system that turns CSV files into verified, interactive news articles.
When: Released as an open-source prototype in early 2024.
Why: To speed up data journalism while maintaining accuracy and reader engagement.

Behind the Scenes: The Seven Agents

Each agent has a distinct role, but they share a common goal: produce a publishable article with minimal human oversight.

  • Data Parser — Identifies column types, missing values, and statistical outliers. It normalizes dates, currencies, and units.
  • Context Researcher — Searches the web for relevant events, definitions, and expert quotes related to the dataset’s topic.
  • Story Writer — Builds a narrative using the inverted pyramid structure. It selects the most newsworthy angle based on the data’s anomalies or trends.
  • Fact Checker — Compares every numerical claim with the CSV values and external sources. Flags any discrepancy for human review.
  • Visual Designer — Chooses appropriate chart types (bar, line, scatter) and adds interactive tooltips. Generates accessible alt text for each visual.
  • Editor — Improves clarity, corrects grammatical errors, and ensures consistent tone. It also checks for biased language.
  • Publisher — Outputs the article in HTML, Markdown, or plain text. Embeds interactive elements via JavaScript libraries.
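The Fact Checker's comparison of numerical claims against the CSV can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the claim format, the `find_discrepancies` function, and the sample values are hypothetical, and the check against external sources is not covered.

```python
import csv
import io
import math

def find_discrepancies(rows, claims, rel_tol=1e-9):
    """Return the claims whose stated value does not match the CSV cell they cite."""
    flagged = []
    for claim in claims:
        actual = float(rows[claim["row"]][claim["column"]])
        if not math.isclose(actual, claim["value"], rel_tol=rel_tol):
            # Keep the real value next to the claim so a human reviewer can compare.
            flagged.append({**claim, "actual": actual})
    return flagged

# Made-up sample data, for illustration only.
rows = list(csv.DictReader(io.StringIO("region,value\nA,10\nB,25\n")))

claims = [
    {"row": 0, "column": "value", "value": 10.0},  # matches the CSV
    {"row": 1, "column": "value", "value": 52.0},  # transposed digits, should be flagged
]

flagged = find_discrepancies(rows, claims)
print(flagged)
```

Using a relative tolerance rather than exact equality avoids flagging claims that differ from the source only by floating-point rounding.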

The tool also logs all agent actions, creating an audit trail for verification.
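An audit trail of agent actions could take the form of an append-only log with one timestamped entry per action. The sketch below assumes that shape; the `AuditTrail` class, its field names, and the example entries are hypothetical and do not reflect Data2Story's actual log format.

```python
import json
from datetime import datetime, timezone

class AuditTrail:
    """Append-only record of which agent did what, and when."""

    def __init__(self):
        self.entries = []

    def record(self, agent: str, action: str, detail: str = "") -> None:
        self.entries.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "action": action,
            "detail": detail,
        })

    def to_jsonl(self) -> str:
        """Serialize the trail as JSON Lines, one entry per line."""
        return "\n".join(json.dumps(entry) for entry in self.entries)

trail = AuditTrail()
trail.record("Data Parser", "normalized dates", "column=date")
trail.record("Fact Checker", "flagged discrepancy", "row=1 column=value")
print(trail.to_jsonl())
```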

Limitations and Future Directions

The system struggles with messy or incomplete datasets. It cannot yet handle real-time streaming data. The team plans to add support for JSON and SQL databases in the next release.

Privacy remains a concern. The fact-checking agent sends queries to external search engines, potentially exposing sensitive data. An offline mode is under development.

Why This Matters

Data2Story lowers the barrier to data-driven reporting. Small newsrooms without dedicated data journalists can now produce interactive stories. Larger outlets can use it to automate routine coverage, freeing staff for investigative projects.

The open-source nature allows anyone to inspect, modify, or improve the agents. This transparency aligns with journalistic ethics around methodological disclosure.
