DataTalk: All Documents and Data, All at Once, All Verified
2024-2025

Investigative journalism often relies on the ability to mine diverse datasets, in both structured and unstructured forms. In collaboration with Stanford’s Big Local News initiative, the Stanford Department of Computer Science, and Columbia Journalism School, DataTalk: All Documents and Data, All at Once, All Verified aims to develop trustworthy conversational agents that let journalists uncover insights from such hybrid data sources using natural-language queries. Building on a novel programming language, SUQL (Structured and Unstructured Query Language), the project will expand the current research prototype into a full development framework, so that non-AI experts can quickly deploy tools that probe complex datasets and fact-check the results in support of groundbreaking stories.


The Team