Query Provenance Store
A companion spec to the Tabular Manifold Spec (TMS). Where queries go to be remembered.
1. What is QPS?
The Query Provenance Store is a registry that maintains the relationship between manifolds and the queries that generated them.
QPS is:
- Not a query engine. It does not execute queries.
- Not a cache. It does not store results.
- A provenance ledger. It records which query built which manifold, and tracks what happens when those queries are re-executed.
TMS manifolds reference QPS entries. QPS entries contain the actual reconstruction logic.
┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
│  TMS Manifold   │────────>│    QPS Entry    │────────>│  Data Platform  │
│   (cognitive)   │   ref   │  (provenance)   │  exec   │    (source)     │
└─────────────────┘         └─────────────────┘         └─────────────────┘
2. Core principles
2.1 Separation of concerns
| Layer | Responsibility |
|---|---|
| TMS Manifold | What the data shows (cognitive interface) |
| QPS Entry | How the data was produced (provenance) |
| Data Platform | Where the data lives (execution) |
Manifolds stay clean and portable. Provenance stays auditable. Execution stays flexible.
2.2 Immutable generation, mutable execution history
A QPS entry has two parts:
- Generation record. Immutable. Captures exactly what query produced the manifold.
- Execution log. Append-only. Tracks every replay and whether drift occurred.
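The two-part split above can be sketched in code. This is a minimal illustration, not part of the spec: the class names `GenerationRecord` and `QPSEntry` and the method `append_execution` are hypothetical, and only a few generation fields are modeled.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from typing import Any


@dataclass(frozen=True)
class GenerationRecord:
    """Immutable facts captured when the manifold was built (hypothetical class)."""
    generated_at: str
    row_count: int
    checksum: str
    checksum_method: str


@dataclass
class QPSEntry:
    """Hypothetical container pairing a frozen record with an append-only log."""
    qps_id: str
    generation: GenerationRecord
    _executions: list[dict[str, Any]] = field(default_factory=list)

    def append_execution(self, record: dict[str, Any]) -> None:
        # Store a copy so later mutation of the caller's dict cannot alter the log.
        self._executions.append(dict(record))

    @property
    def executions(self) -> tuple[dict[str, Any], ...]:
        # Hand out copies in a tuple: callers can read history but not edit it.
        return tuple(dict(e) for e in self._executions)


entry = QPSEntry(
    qps_id="mfld_abc123",
    generation=GenerationRecord(
        "2026-01-14T22:30:00Z", 517, "sha256:a1b2c3d4e5f6...", "row_content_hash"
    ),
)
entry.append_execution(
    {"executed_at": "2026-01-15T10:00:00Z", "row_count": 517, "drift_detected": False}
)

try:
    entry.generation.row_count = 999  # any write to the generation record fails
    generation_is_immutable = False
except FrozenInstanceError:
    generation_is_immutable = True
```

A frozen dataclass only guards against accidental mutation inside one process. A real store would still need to enforce the same rules at the storage layer.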
2.3 Drift detection as a first-class concept
When a reconstruction query returns different data than at generation time, that is drift. QPS does not prevent drift. It detects and records it.
3. QPS entry schema
3.1 Required fields
| Field | Type | Description |
|---|---|---|
| qps_id | string | Unique identifier (should match manifold_id for 1:1 cases) |
| qps_version | string | Spec version (e.g., "1.0") |
| query | object | The reconstruction query (see §3.2) |
| generation | object | Metadata about when and how the manifold was built (see §3.3) |
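A consumer might check the required fields before trusting an entry. The sketch below assumes entries arrive as parsed JSON (a Python `dict`). The helper name `missing_or_mistyped` is hypothetical, and it checks only presence and top-level type, not the contents of the nested blocks.

```python
# Required fields from the table above, mapped to the Python type that
# parsed JSON would produce for each.
REQUIRED_FIELDS = {
    "qps_id": str,
    "qps_version": str,
    "query": dict,
    "generation": dict,
}


def missing_or_mistyped(entry: dict) -> list:
    """Return a list of human-readable problems; empty means the entry passes."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in entry:
            problems.append(f"missing: {name}")
        elif not isinstance(entry[name], expected):
            problems.append(f"wrong type: {name}")
    return problems


good = {"qps_id": "mfld_abc123", "qps_version": "1.0", "query": {}, "generation": {}}
bad = {"qps_id": "mfld_abc123", "qps_version": 1.0, "query": {}}

good_problems = missing_or_mistyped(good)
bad_problems = missing_or_mistyped(bad)
```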
3.2 Query block
{
  "query": {
    "dialect": "databricks_sql",
    "template": "SELECT ts, unit_price, po_id, notes FROM silver.price_events WHERE part_id = :part_id AND supplier_id = :supplier_id AND ts >= :start AND ts < :end ORDER BY ts",
    "params": {
      "part_id": "P-123456",
      "supplier_id": "S-789",
      "start": "2025-01-01",
      "end": "2026-01-01"
    },
    "param_types": {
      "part_id": "string",
      "supplier_id": "string",
      "start": "date",
      "end": "date"
    }
  }
}
Supported dialects
| Dialect | Description |
|---|---|
| databricks_sql | Databricks SQL warehouse |
| snowflake | Snowflake SQL |
| bigquery | Google BigQuery Standard SQL |
| postgres | PostgreSQL |
| duckdb | DuckDB |
| mcp_tool | Opaque MCP tool call (see §3.2.1) |
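To show how a query block's template, params, and param_types might fit together at execution time, here is a sketch that binds the parameters and runs the template. It makes several assumptions: an in-memory SQLite database stands in for the data platform (SQLite happens to accept the same `:name` placeholders), the helper `bind_params` is hypothetical, the sample rows are invented for illustration, and the placeholder regex is naive (it would misread `::` casts or colons inside string literals).

```python
import re
import sqlite3
from datetime import date

query = {
    "dialect": "databricks_sql",
    "template": "SELECT ts, unit_price, po_id, notes FROM silver.price_events WHERE part_id = :part_id AND supplier_id = :supplier_id AND ts >= :start AND ts < :end ORDER BY ts",
    "params": {"part_id": "P-123456", "supplier_id": "S-789",
               "start": "2025-01-01", "end": "2026-01-01"},
    "param_types": {"part_id": "string", "supplier_id": "string",
                    "start": "date", "end": "date"},
}


def bind_params(query: dict) -> dict:
    """Check that every placeholder has a value and validate declared dates."""
    placeholders = set(re.findall(r":([A-Za-z_]\w*)", query["template"]))
    missing = placeholders - set(query["params"])
    if missing:
        raise ValueError(f"unbound placeholders: {sorted(missing)}")
    bound = {}
    for name in placeholders:
        value = query["params"][name]
        if query.get("param_types", {}).get(name) == "date":
            # fromisoformat raises ValueError on a malformed date string.
            value = date.fromisoformat(value).isoformat()
        bound[name] = value
    return bound


# Stand-in data platform: attach a schema named "silver" so the template
# runs unchanged. The rows below are made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS silver")
conn.execute(
    "CREATE TABLE silver.price_events "
    "(ts TEXT, unit_price REAL, po_id TEXT, notes TEXT, part_id TEXT, supplier_id TEXT)"
)
conn.executemany(
    "INSERT INTO silver.price_events VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2025-03-01T00:00:00Z", 12.5, "PO-1", None, "P-123456", "S-789"),
        ("2026-02-01T00:00:00Z", 13.0, "PO-2", None, "P-123456", "S-789"),  # outside range
        ("2025-03-01T00:00:00Z", 9.0, "PO-3", None, "P-999999", "S-789"),   # other part
    ],
)

# The driver binds values itself; the SQL text is never string-interpolated.
rows = conn.execute(query["template"], bind_params(query)).fetchall()
```

Because the values travel separately from the SQL text, the stored template stays identical across replays and only the params change.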
3.2.1 Opaque tool references
When the query should not be exposed (security, complexity, or abstraction):
{
  "query": {
    "dialect": "mcp_tool",
    "tool_name": "get_price_telemetry",
    "tool_args": {
      "manifold_id": "mfld_abc123"
    }
  }
}
The tool implementation handles actual query execution internally.
3.3 Generation block
{
  "generation": {
    "generated_at": "2026-01-14T22:30:00Z",
    "generated_by": "tms_generator_v1.2",
    "source_dataset": "silver.price_events",
    "source_dataset_version": "v5.2",
    "row_count": 517,
    "checksum": "sha256:a1b2c3d4e5f6...",
    "checksum_method": "row_content_hash"
  }
}
Checksum methods
| Method | Description |
|---|---|
| row_content_hash | SHA256 of sorted, serialized row content |
| row_count_only | Just the count (weak but cheap) |
| column_stats_hash | Hash of min/max/sum per column |
| none | No checksum computed |
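The table above names the checksum methods but does not fix a serialization, so any implementation has to choose one. The sketch below is one possible reading, under stated assumptions: rows are dicts, each row is serialized as JSON with sorted keys, the serialized rows are sorted and joined with newlines, and column statistics cover min and max for all values plus sum for numeric values only. The function names simply mirror the method names.

```python
import hashlib
import json


def row_content_hash(rows: list) -> str:
    """SHA256 over sorted, serialized rows (serialization format is an assumption)."""
    serialized = sorted(json.dumps(row, sort_keys=True, default=str) for row in rows)
    digest = hashlib.sha256("\n".join(serialized).encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


def column_stats_hash(rows: list) -> str:
    """SHA256 over per-column min/max/sum; assumes each column holds one type."""
    stats = {}
    for column in sorted({key for row in rows for key in row}):
        values = [row[column] for row in rows if row.get(column) is not None]
        numeric = [v for v in values
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        stats[column] = {
            "min": min(values, default=None),
            "max": max(values, default=None),
            # Sort before summing so float rounding does not depend on row order.
            "sum": sum(sorted(numeric)) if numeric else None,
        }
    payload = json.dumps(stats, sort_keys=True, default=str)
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Invented sample rows, used only to show order independence.
sample = [
    {"ts": "2025-03-01", "unit_price": 12.5},
    {"ts": "2025-04-01", "unit_price": 13.0},
    {"ts": "2025-05-01", "unit_price": 14.0},
]
same_regardless_of_order = row_content_hash(sample) == row_content_hash(sample[::-1])
```

Sorting the serialized rows means the hash does not depend on the order the platform returns them in, which matters when a replay uses a different execution plan. Whatever serialization is chosen, generation and replay must use the same one, or every comparison will report drift.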
3.4 Optional fields
| Field | Type | Description |
|---|---|---|
| executions | array | Log of reconstruction attempts (see §4) |
| access_control | object | Who can execute this query |
| ttl | object | Retention policy for this entry |
| related_manifolds | array | Other manifolds built from the same query |
| notes | string | Human-readable context |
4. Execution log schema
Each time a reconstruction query is executed, an entry is appended:
{
  "executions": [
    {
      "executed_at": "2026-01-15T10:00:00Z",
      "executed_by": "agent_session_xyz",
      "execution_context": "manifold_drilldown",
      "row_count": 517,
      "checksum": "sha256:a1b2c3d4e5f6...",
      "drift_detected": false,
      "execution_time_ms": 234
    },
    {
      "executed_at": "2026-01-16T14:30:00Z",
      "executed_by": "human_debug_session",
      "execution_context": "manual_audit",
      "row_count": 523,
      "checksum": "sha256:d4e5f67890ab...",
      "drift_detected": true,
      "drift_type": "row_count_increase",
      "drift_delta": {
        "row_count_expected": 517,
        "row_count_actual": 523,
        "rows_added": 6,
        "rows_removed": 0
      },
      "drift_note": "Late-arriving POs from batch reconciliation",
      "execution_time_ms": 287
    }
  ]
}
Drift types
| Type | Description |
|---|---|
| row_count_increase | More rows than at generation |
| row_count_decrease | Fewer rows than at generation |
| content_change | Same row count, different content |
| schema_change | Column structure changed |
| query_failure | Query no longer executes |
5. TMS integration
5.1 Manifold reference to QPS
In a TMS manifold, the lineage block references QPS:
{
  "lineage": {
    "manifold_id": "mfld_abc123",
    "qps_id": "mfld_abc123",
    "reconstruction_available": true
  }
}
Or with explicit tool routing:
{
  "lineage": {
    "manifold_id": "mfld_abc123",
    "qps_id": "mfld_abc123",
    "reconstruction_available": true,
    "reconstruction_method": "mcp_tool",
    "tool_name": "qps_reconstruct",
    "tool_args_template": {
      "qps_id": "mfld_abc123"
    }
  }
}
5.2 MCP tool patterns
@mcp_tool
def qps_reconstruct(qps_id: str, log_execution: bool = True) -> dict:
    """
    Reconstruct telemetry from a QPS entry.

    Args:
        qps_id: The QPS entry identifier
        log_execution: Whether to append to execution log (default: True)

    Returns:
        {
            "rows": [...],
            "drift_detected": bool,
            "drift_summary": {...} | null
        }
    """
    # 1. Look up QPS entry
    # 2. Execute query
    # 3. Compare checksum to generation
    # 4. Log execution if requested
    # 5. Return rows + drift status

@mcp_tool
def qps_check_drift(qps_id: str) -> dict:
    """
    Check if a QPS entry would return different data than at generation.
    Does NOT log an execution (dry run).

    Returns:
        {
            "drift_detected": bool,
            "drift_type": str | null,
            "drift_delta": {...} | null
        }
    """

@mcp_tool
def qps_get_entry(qps_id: str) -> dict:
    """
    Retrieve the full QPS entry including query and execution history.
    For human debugging and audit.
    """
6. Complete example
QPS entry
{
  "qps_id": "mfld_price_P123456_S789_2025",
  "qps_version": "1.0",
  "query": {
    "dialect": "databricks_sql",
    "template": "SELECT ts, unit_price, po_id, notes FROM silver.price_events WHERE part_id = :part_id AND supplier_id = :supplier_id AND ts >= :start AND ts < :end ORDER BY ts",
    "params": {
      "part_id": "P-123456",
      "supplier_id": "S-789",
      "start": "2025-01-01",
      "end": "2026-01-01"
    },
    "param_types": {
      "part_id": "string",
      "supplier_id": "string",
      "start": "date",
      "end": "date"
    }
  },
  "generation": {
    "generated_at": "2026-01-14T22:30:00Z",
    "generated_by": "tms_generator_v1.2",
    "source_dataset": "silver.price_events",
    "source_dataset_version": "v5.2",
    "row_count": 517,
    "checksum": "sha256:a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890",
    "checksum_method": "row_content_hash"
  },
  "executions": [
    {
      "executed_at": "2026-01-15T10:00:00Z",
      "executed_by": "procurement_agent_v2",
      "execution_context": "price_anomaly_investigation",
      "row_count": 517,
      "checksum": "sha256:a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890",
      "drift_detected": false,
      "execution_time_ms": 234
    }
  ],
  "access_control": {
    "allowed_roles": ["procurement_analyst", "agent_service_account"],
    "requires_audit_log": true
  },
  "notes": "Price history for Acme Industrial Supply on stainless steel widgets. July 2025 shows expedite fee anomalies."
}
Corresponding TMS manifold (lineage block only)
{
  "lineage": {
    "manifold_id": "mfld_price_P123456_S789_2025",
    "qps_id": "mfld_price_P123456_S789_2025",
    "computed_at": "2026-01-14T22:30:00Z",
    "computed_by": "tms_generator_v1.2",
    "reconstruction_available": true,
    "reconstruction_method": "mcp_tool",
    "tool_name": "qps_reconstruct",
    "tool_args_template": {
      "qps_id": "mfld_price_P123456_S789_2025"
    }
  }
}
7. Storage considerations
QPS does not mandate a storage backend. Implementations could use:
| Backend | Tradeoffs |
|---|---|
| PostgreSQL or MySQL | ACID, familiar, good for moderate scale |
| Document store (Mongo, Cosmos) | Flexible schema, easy JSON |
| Delta Lake or Iceberg | Co-located with data platform, time travel |
| Git repository | Version control, human-readable, audit trail |
| Embedded in manifold | No external dependency, but loses execution logging |
The key requirements are:
- Generation records are immutable
- Execution logs are append-only
- Queries are retrievable by qps_id
- Checksums can be verified
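As an illustration of how a relational backend could enforce the first three requirements, the sketch below uses SQLite triggers to reject updates to stored entries and any update or delete on the execution log. The table, trigger, and column names are all hypothetical, and an in-memory SQLite database stands in for whichever backend an implementation actually chooses.

```python
import json
import sqlite3

# Hypothetical schema: one row per entry, one row per logged execution.
SCHEMA = """
CREATE TABLE qps_entries (
    qps_id     TEXT PRIMARY KEY,
    query      TEXT NOT NULL,
    generation TEXT NOT NULL
);
CREATE TABLE qps_executions (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    qps_id TEXT NOT NULL REFERENCES qps_entries(qps_id),
    record TEXT NOT NULL
);
CREATE TRIGGER entries_no_update BEFORE UPDATE ON qps_entries
BEGIN SELECT RAISE(ABORT, 'generation records are immutable'); END;
CREATE TRIGGER executions_no_update BEFORE UPDATE ON qps_executions
BEGIN SELECT RAISE(ABORT, 'execution log is append-only'); END;
CREATE TRIGGER executions_no_delete BEFORE DELETE ON qps_executions
BEGIN SELECT RAISE(ABORT, 'execution log is append-only'); END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

with conn:  # commits on success
    conn.execute(
        "INSERT INTO qps_entries VALUES (?, ?, ?)",
        (
            "mfld_abc123",
            json.dumps({"dialect": "mcp_tool", "tool_name": "get_price_telemetry"}),
            json.dumps({"row_count": 517, "checksum": "sha256:a1b2c3d4e5f6..."}),
        ),
    )
    conn.execute(
        "INSERT INTO qps_executions (qps_id, record) VALUES (?, ?)",
        ("mfld_abc123", json.dumps({"row_count": 517, "drift_detected": False})),
    )

# Any attempt to rewrite the generation record is rejected by the trigger.
try:
    conn.execute("UPDATE qps_entries SET generation = '{}' WHERE qps_id = 'mfld_abc123'")
    update_blocked = False
except sqlite3.DatabaseError:
    update_blocked = True

# Retrieval by qps_id returns the stored generation data, including the checksum.
stored = json.loads(
    conn.execute(
        "SELECT generation FROM qps_entries WHERE qps_id = ?", ("mfld_abc123",)
    ).fetchone()[0]
)
```

Enforcing the rules in the storage layer, rather than only in application code, means a buggy or malicious client cannot quietly rewrite history. The fourth requirement, checksum verification, still depends on recomputing the hash from a fresh replay.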
8. Security notes
- Query templates may contain sensitive schema information. Access to QPS entries should be controlled.
- Execution logs reveal access patterns. Consider retention policies.
- Parameterized queries only. Never store interpolated SQL; always store the template plus params.
- The mcp_tool dialect exists precisely for cases where query exposure is unacceptable.
Relationship to TMS versioning
| TMS version | QPS support |
|---|---|
| TMS 1.0 | No QPS reference (inline queries or no reconstruction) |
| TMS 1.1 | Optional QPS reference via lineage block |
| TMS 1.2+ | Recommended QPS reference for all reconstructable manifolds |
QPS is backwards-compatible. Manifolds without QPS references continue to work; they simply cannot be reconstructed via the standard pattern.
Changelog
v1.0 (2026-01-15)
- Initial specification
- Core schema: query block, generation block, execution log
- Drift detection framework
- TMS integration pattern
- MCP tool patterns
License
QPS is released under Apache 2.0, same as TMS.
QPS exists because debuggability should not be an afterthought.
When an agent makes a decision, you should be able to see exactly what it saw, and whether reality has changed since.