
Query Provenance Store: The Accountability Layer for AI Agent Decisions

When an agent makes a decision, you should be able to see exactly what it saw, and whether reality has changed since. Introducing QPS, a companion spec to TMS.


The Problem We Didn't Know We Had

We built Tabular Manifolds to give AI agents a structured, multi-resolution view of operational data. Level 0 for situational awareness. Level 1 for behavioral geometry. Level 2 for raw evidence.

It worked. Agents could reason over pricing data, inventory snapshots, and conversion funnels, all without drowning in tokens or losing signal in noise.

But then someone asked a simple question:

"The agent said prices spiked in July. How do I verify that?"

And we realized we'd built a cognitive interface without an accountability layer.


The Auditability Gap

Here's what was happening:

  1. Agent receives a manifold showing a price anomaly
  2. Agent reasons over the data and makes a recommendation
  3. Human asks: "Show me the evidence"
  4. Agent points to Level 2 telemetry... which was a preview of 3 rows
  5. Human asks: "What about the other 514 rows?"
  6. No answer.

The manifold told the agent what the data showed. But it didn't preserve how to get back to the source. The agent couldn't replay its own evidence.

This matters because:

  • Debugging is impossible without knowing what the agent actually saw
  • Drift happens: the data at decision time might differ from the data at audit time
  • Trust requires verification, not just assertion

Enter the Query Provenance Store

The Query Provenance Store (QPS) is a companion spec to TMS. It's simple in concept:

Every manifold can reference a QPS entry. Every QPS entry contains the exact query that produced the manifold.

┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
│   TMS Manifold  │────────>│   QPS Entry     │────────>│  Data Platform  │
│   (cognitive)   │  ref    │   (provenance)  │  exec   │  (source)       │
└─────────────────┘         └─────────────────┘         └─────────────────┘

The manifold stays clean. It's still a cognitive interface. QPS handles the provenance separately.
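To make the reference concrete, here is a minimal sketch of a manifold that carries only a pointer to its provenance. The class names, the `qps_ref` field, the `qps-001` ID, the shortened query template, and the in-memory store are all illustrative assumptions, not part of the TMS or QPS specs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifold:
    """Cognitive interface: carries a pointer to provenance, never the query."""
    level0_summary: str
    qps_ref: str  # hypothetical field name for the QPS entry ID


@dataclass(frozen=True)
class QPSEntry:
    """Provenance: holds the query that built the manifold."""
    qps_id: str
    dialect: str
    template: str


class InMemoryQPS:
    """Toy store; a real backend would add access control and persistence."""

    def __init__(self):
        self._entries = {}

    def put(self, entry: QPSEntry) -> None:
        # Refuse to overwrite provenance that has already been recorded.
        if entry.qps_id in self._entries:
            raise ValueError(f"QPS entry {entry.qps_id} already exists")
        self._entries[entry.qps_id] = entry

    def resolve(self, manifold: Manifold) -> QPSEntry:
        # Follow the manifold's reference to its provenance record.
        return self._entries[manifold.qps_ref]


store = InMemoryQPS()
store.put(QPSEntry("qps-001", "databricks_sql",
                   "SELECT ts, unit_price FROM silver.price_events"))
manifold = Manifold("Price anomaly flagged for part P-123456", qps_ref="qps-001")
entry = store.resolve(manifold)
```

Because the manifold holds an ID rather than SQL, it can be handed to a reader who has no access to the store.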


What Lives in a QPS Entry

A QPS entry has two immutable parts and one append-only part:

The Query

{
  "query": {
    "dialect": "databricks_sql",
    "template": "SELECT ts, unit_price, po_id, notes FROM silver.price_events WHERE part_id = :part_id AND supplier_id = :supplier_id AND ts >= :start AND ts < :end ORDER BY ts",
    "params": {
      "part_id": "P-123456",
      "supplier_id": "S-789",
      "start": "2025-01-01",
      "end": "2026-01-01"
    }
  }
}

This is the exact query that built the manifold. Parameterized, never interpolated. A human can read it, understand it, run it.
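"Parameterized, never interpolated" means the template and the params travel to the database driver separately. The sketch below replays the query block above against SQLite, which stands in for the real platform only because it accepts the same `:name` placeholders. The attached `silver` schema, the `replay` helper, and the four rows are made up for illustration.

```python
import sqlite3

QUERY = {
    "dialect": "databricks_sql",
    "template": (
        "SELECT ts, unit_price, po_id, notes FROM silver.price_events "
        "WHERE part_id = :part_id AND supplier_id = :supplier_id "
        "AND ts >= :start AND ts < :end ORDER BY ts"
    ),
    "params": {
        "part_id": "P-123456",
        "supplier_id": "S-789",
        "start": "2025-01-01",
        "end": "2026-01-01",
    },
}


def replay(conn, query):
    """Run the stored template with bound parameters (no string formatting)."""
    return conn.execute(query["template"], query["params"]).fetchall()


# Stand-in data platform: in-memory SQLite with a schema named "silver".
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS silver")
conn.execute(
    "CREATE TABLE silver.price_events "
    "(ts TEXT, part_id TEXT, supplier_id TEXT, unit_price REAL, po_id TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO silver.price_events VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2025-07-02", "P-123456", "S-789", 18.2, "PO-2", "spike"),
        ("2025-03-01", "P-123456", "S-789", 10.0, "PO-1", None),
        ("2025-07-02", "P-999999", "S-789", 5.0, "PO-3", None),   # different part
        ("2026-01-01", "P-123456", "S-789", 11.0, "PO-4", None),  # outside the window
    ],
)

rows = replay(conn, QUERY)  # only the two in-window rows for P-123456, ordered by ts
```

The params never touch the SQL string, so the stored template stays readable and the same entry can be replayed with no quoting logic.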

The Generation Record

{
  "generation": {
    "generated_at": "2026-01-14T22:30:00Z",
    "generated_by": "tms_generator_v1.2",
    "row_count": 517,
    "checksum": "sha256:a1b2c3d4..."
  }
}

This captures what the data looked like at generation time. The checksum is key: it lets us detect drift later.
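The essay does not say how the checksum is computed, so the following is only one plausible approach: hash a canonical JSON encoding of each row, in query order. The `rows_checksum` and `generation_record` helpers and the sample rows are assumptions for illustration.

```python
import hashlib
import json


def rows_checksum(rows):
    """SHA-256 over a canonical JSON encoding of each row, in order."""
    digest = hashlib.sha256()
    for row in rows:
        # Compact separators give a stable encoding; default=str covers
        # values such as dates or decimals that JSON cannot encode natively.
        line = json.dumps(list(row), separators=(",", ":"), default=str)
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return "sha256:" + digest.hexdigest()


def generation_record(rows, generated_by, generated_at):
    """Build a generation block shaped like the one shown above."""
    rows = list(rows)
    return {
        "generation": {
            "generated_at": generated_at,
            "generated_by": generated_by,
            "row_count": len(rows),
            "checksum": rows_checksum(rows),
        }
    }


rows = [("2025-03-01", 10.0, "PO-1", None), ("2025-07-02", 18.2, "PO-2", "spike")]
record = generation_record(rows, "tms_generator_v1.2", "2026-01-14T22:30:00Z")
```

A hash like this is order-sensitive, so it relies on the query's ORDER BY being deterministic. Rows that tie on `ts` would need a tiebreaker column to avoid false drift.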

The Execution Log

{
  "executions": [
    {
      "executed_at": "2026-01-15T10:00:00Z",
      "executed_by": "procurement_agent_v2",
      "row_count": 517,
      "checksum": "sha256:a1b2c3d4...",
      "drift_detected": false
    },
    {
      "executed_at": "2026-01-16T14:30:00Z",
      "executed_by": "human_audit",
      "row_count": 523,
      "checksum": "sha256:d4e5f678...",
      "drift_detected": true,
      "drift_note": "6 late-arriving POs"
    }
  ]
}

Every time someone replays the query, we log it. If the row count or checksum differs from the generation record, we flag it as drift.
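A minimal sketch of the append-only log, assuming the entry is held as a plain dict. The `log_execution` helper and its `now` parameter are hypothetical. The checksum strings are the truncated values from the example above, used here as opaque placeholders.

```python
from datetime import datetime, timezone


def log_execution(entry, executed_by, row_count, checksum, drift_note=None, now=None):
    """Append one replay record; earlier records are never modified."""
    generation = entry["generation"]
    drifted = (
        row_count != generation["row_count"]
        or checksum != generation["checksum"]
    )
    stamp = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    record = {
        "executed_at": stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "executed_by": executed_by,
        "row_count": row_count,
        "checksum": checksum,
        "drift_detected": drifted,
    }
    if drift_note:
        record["drift_note"] = drift_note
    entry.setdefault("executions", []).append(record)
    return record


entry = {
    "generation": {"row_count": 517, "checksum": "sha256:a1b2c3d4..."},
    "executions": [],
}
first = log_execution(
    entry, "procurement_agent_v2", 517, "sha256:a1b2c3d4...",
    now=datetime(2026, 1, 15, 10, 0, tzinfo=timezone.utc),
)
second = log_execution(
    entry, "human_audit", 523, "sha256:d4e5f678...",
    drift_note="6 late-arriving POs",
    now=datetime(2026, 1, 16, 14, 30, tzinfo=timezone.utc),
)
```

Each replay is compared against the generation record rather than against the previous replay, so every log line answers the same question: does this still match what the manifold was built from?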


Drift Detection: The Quiet Killer

Here's a scenario that happens more than you'd think:

  1. Monday: Agent analyzes supplier pricing, recommends renegotiation
  2. Tuesday: Finance team reviews the recommendation
  3. Wednesday: Late-arriving invoices hit the silver table
  4. Thursday: Finance runs the same analysis, gets different numbers
  5. Friday: Everyone argues about whose numbers are right

With QPS, the answer is clear:

{
  "drift_detected": true,
  "drift_type": "row_count_increase",
  "drift_delta": {
    "row_count_expected": 517,
    "row_count_actual": 523,
    "rows_added": 6
  }
}

The agent wasn't wrong. The data changed. Now you know.
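A drift report like the one above can be derived from the generation record and a fresh replay. In this sketch, only `row_count_increase` comes from the example. The `describe_drift` helper, the `row_count_decrease` and `content_change` categories, and the `rows_removed` field are hypothetical names, not taken from the spec.

```python
def describe_drift(generation, row_count, checksum):
    """Compare a replay against the generation record and classify any drift."""
    expected = generation["row_count"]
    if row_count == expected and checksum == generation["checksum"]:
        return {"drift_detected": False}

    delta = {"row_count_expected": expected, "row_count_actual": row_count}
    if row_count > expected:
        drift_type = "row_count_increase"
        delta["rows_added"] = row_count - expected
    elif row_count < expected:
        drift_type = "row_count_decrease"   # hypothetical category
        delta["rows_removed"] = expected - row_count
    else:
        # Same number of rows but a different checksum: values changed in place.
        drift_type = "content_change"       # hypothetical category
    return {"drift_detected": True, "drift_type": drift_type, "drift_delta": delta}


generation = {"row_count": 517, "checksum": "sha256:a1b2c3d4..."}
report = describe_drift(generation, 523, "sha256:d4e5f678...")
```

The checksum comparison matters even when the counts match, because an in-place correction to a price changes the evidence without changing the number of rows.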


The Design Decision: Why Not Embed Queries in Manifolds?

We considered putting reconstruction queries directly in TMS manifolds. It would be simpler. One artifact instead of two.

But it creates problems:

  1. Security: Query templates expose schema details. Not everyone who should see a manifold should see the underlying queries.

  2. Coupling: Manifolds become tied to a specific execution environment. A manifold generated from Databricks can't be shared with someone using Snowflake.

  3. Execution logging: Where do you put the execution history? Inside the manifold? Now it's mutable. Outside? You've reinvented QPS.

The separation is intentional:

Layer            Responsibility
---------------  -----------------------------------------
TMS Manifold     What the data shows (cognitive interface)
QPS Entry        How the data was produced (provenance)
Data Platform    Where the data lives (execution)

Manifolds stay portable. Provenance stays auditable. Execution stays flexible.


Integration with MCP

QPS is designed to work with Model Context Protocol tools. The pattern is straightforward:

@mcp_tool
def qps_reconstruct(qps_id: str) -> dict:
    """
    Reconstruct telemetry from a QPS entry.
    
    Returns rows + drift status.
    """
    # 1. Look up QPS entry
    # 2. Execute query
    # 3. Compare checksum to generation
    # 4. Log execution
    # 5. Return rows + drift status

An agent investigating an anomaly can:

  1. Read the manifold's Level 0 summary
  2. Notice a quality flag
  3. Call qps_reconstruct to get the full telemetry
  4. Receive both the data AND whether it's drifted since the manifold was built

That last part is crucial. The agent knows if it's looking at the same reality the manifold described.
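The five steps of `qps_reconstruct` can be sketched end to end under some loud assumptions: an in-memory dict stands in for the QPS store, SQLite stands in for the data platform, the checksum is a SHA-256 over JSON-encoded rows, and the extra `conn` and `executed_by` parameters are ours. MCP tool registration is left out because it depends on the SDK in use.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone

QPS_STORE = {}  # hypothetical in-memory store: qps_id -> QPS entry


def rows_checksum(rows):
    """SHA-256 over a canonical JSON encoding of each row, in order."""
    digest = hashlib.sha256()
    for row in rows:
        line = json.dumps(list(row), separators=(",", ":"), default=str)
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return "sha256:" + digest.hexdigest()


def qps_reconstruct(qps_id, conn, executed_by="agent"):
    """Replay a QPS entry's query and report whether the data has drifted."""
    entry = QPS_STORE[qps_id]                       # 1. look up QPS entry
    query = entry["query"]
    rows = conn.execute(query["template"], query["params"]).fetchall()  # 2. execute
    checksum = rows_checksum(rows)
    generation = entry["generation"]
    drifted = (                                     # 3. compare to generation
        len(rows) != generation["row_count"]
        or checksum != generation["checksum"]
    )
    entry["executions"].append({                    # 4. log execution
        "executed_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "executed_by": executed_by,
        "row_count": len(rows),
        "checksum": checksum,
        "drift_detected": drifted,
    })
    return {"rows": rows, "drift_detected": drifted}  # 5. rows + drift status


# Demo with made-up rows: build the entry, replay, add a late row, replay again.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price_events (ts TEXT, part_id TEXT, unit_price REAL)")
conn.executemany(
    "INSERT INTO price_events VALUES (?, ?, ?)",
    [("2025-03-01", "P-123456", 10.0), ("2025-07-02", "P-123456", 18.2)],
)
query = {
    "template": "SELECT ts, unit_price FROM price_events "
                "WHERE part_id = :part_id ORDER BY ts",
    "params": {"part_id": "P-123456"},
}
built_from = conn.execute(query["template"], query["params"]).fetchall()
QPS_STORE["qps-001"] = {
    "query": query,
    "generation": {"row_count": len(built_from), "checksum": rows_checksum(built_from)},
    "executions": [],
}

before = qps_reconstruct("qps-001", conn)
conn.execute("INSERT INTO price_events VALUES (?, ?, ?)",
             ("2025-07-15", "P-123456", 18.0))  # late-arriving row
after = qps_reconstruct("qps-001", conn, executed_by="human_audit")
```

The first replay matches the generation record. The second returns the extra row together with `drift_detected` set, so the caller knows the evidence is no longer what the manifold summarized.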


What This Enables

With TMS + QPS together, you get:

Traceable Reasoning

Agent says "prices spiked 82% in July" → you can replay the exact query and see the exact rows that led to that conclusion.

Blame Attribution

When an agent makes a bad call, you can now ask:

  • Did it see bad data? (Replay the reconstruction)
  • Did it misinterpret good data? (Compare conclusion to evidence)
  • Did the data change? (Check drift status)

Temporal Debugging

"What did the agent see at 10am Monday?" is now answerable. The QPS entry preserves the generation-time row count and checksum, so you can replay the query and detect whether the current data still matches what the agent saw.

Human-in-the-Loop Audit

Before acting on an agent recommendation, a human can:

  1. Pull the QPS entry
  2. Run the query themselves
  3. Verify the manifold accurately represents the source
  4. Check for drift since generation

The Bigger Picture

We've been thinking about this as three layers:

┌─────────────────────────────────────────────────────────────────┐
│                     Agent Reasoning Layer                       │
│         (COT traces, analysis prompts, tool calls)              │
└─────────────────────────────────────────────────────────────────┘
                              │
                              v
┌─────────────────────────────────────────────────────────────────┐
│                    TMS: Cognitive Interface                     │
│   Level 0: "Here's the situation"                               │
│   Level 1: "Here's the shape"                                   │
│   Level 2: "Here's the evidence"                                │
│   Lineage: "Here's how to verify"  ──────────┐                  │
└─────────────────────────────────────────────────────────────────┘
                                               │
                                               v
┌─────────────────────────────────────────────────────────────────┐
│                    QPS: Provenance Layer                        │
│   Query: "This is exactly how I was built"                      │
│   Generation: "This is what existed when I was built"           │
│   Executions: "This is what happened when someone checked"      │
└─────────────────────────────────────────────────────────────────┘
                              │
                              v
┌─────────────────────────────────────────────────────────────────┐
│                    Data Platform Layer                          │
│              (Databricks, Snowflake, whatever)                  │
└─────────────────────────────────────────────────────────────────┘

Most agent architectures have the reasoning layer. Some have a cognitive interface. Almost none have a provenance layer.

And provenance is what you need when an agent makes a consequential decision and someone asks "why did it do that?"


Try It

The QPS Specification v1.0 is available now. It includes:

  • Full schema for query blocks, generation records, and execution logs
  • Drift detection framework with typed drift categories
  • MCP tool patterns for reconstruction and drift checking
  • Integration patterns with TMS lineage blocks
  • Storage backend considerations

QPS is released under Apache 2.0, same as TMS.


Final Thought

We didn't build QPS because we wanted more specs.

We built it because accountability shouldn't be an afterthought.

When an agent makes a decision, you should be able to see exactly what it saw, and whether reality has changed since.

That's what QPS provides. Not trust through assertion. Trust through verification.

Read the full specification →