Specification v1.1

Tabular Manifold Spec

A cognitive transmission format for AI agents. Feature engineering as an interface contract.

QPS companion spec → Reference examples ↓ Complete example ↓

1. What is TMS?

The Tabular Manifold Spec defines a structured, multi-resolution data format optimized for consumption by AI agents and LLMs.

TMS is:

Not a storage format. Use Parquet or Delta for that.
Not a visualization format. Use dashboards for humans.
A cognitive transmission format. The interface layer between data pipelines and agent reasoning.

TMS manifolds sit alongside dashboards, not instead of them:

                    ┌─────────────────┐
                    │   Data Lake     │
                    │ (Parquet/Delta) │
                    └────────┬────────┘
                             │
              ┌──────────────┴──────────────┐
              │                             │
              v                             v
    ┌─────────────────┐           ┌─────────────────┐
    │   Dashboards    │           │  TMS Manifolds  │
    │  (for humans)   │           │  (for agents)   │
    └─────────────────┘           └─────────────────┘

2. Core principles

2.1 Columnar encoding

Keys appear once. Values are dense arrays aligned to column order.

{
  "format": "columnar_json_v1",
  "schema": {
    "columns": [
      { "name": "period", "type": "string" },
      { "name": "value", "type": "double" }
    ]
  },
  "rows": [
    ["2025-01", 10.5],
    ["2025-02", 11.2]
  ]
}

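Why: Eliminates key repetition. Row-oriented JSON for a 1,000-row table with 10 columns writes 10,000 key strings; columnar encoding writes 10, saving about 9,990 key occurrences and at least as many tokens.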
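The round trip between row-oriented records and the columnar layout can be sketched as follows. This is a minimal illustration, not part of the spec: the function names `to_columnar`, `to_records`, and `key_occurrences_saved` are hypothetical, and inferring column types from the first record is an assumption made for brevity.

```python
def to_columnar(records: list[dict]) -> dict:
    """Convert row-oriented records to the columnar layout (keys stated once)."""
    if not records:
        return {"format": "columnar_json_v1", "schema": {"columns": []}, "rows": []}
    names = list(records[0])
    # Type inference from the first record is an illustrative assumption.
    type_names = {str: "string", float: "double", int: "integer"}
    columns = [
        {"name": name, "type": type_names.get(type(records[0][name]), "string")}
        for name in names
    ]
    rows = [[record[name] for name in names] for record in records]
    return {"format": "columnar_json_v1", "schema": {"columns": columns}, "rows": rows}


def to_records(table: dict) -> list[dict]:
    """Rebuild row-oriented records by zipping column names with each row."""
    names = [column["name"] for column in table["schema"]["columns"]]
    return [dict(zip(names, row)) for row in table["rows"]]


def key_occurrences_saved(n_rows: int, n_cols: int) -> int:
    """Key strings written by row-oriented JSON minus those written by columnar."""
    return n_rows * n_cols - n_cols
```

Calling `to_records(to_columnar(records))` returns the original records, so a consumer that prefers dictionaries can rebuild them from the schema at read time.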

2.2 Progressive disclosure (three levels)

Level	Name	Purpose	Token cost
Level 0	Summary	Instant situational awareness	200 to 500 tokens
Level 1	Geometry	Aggregated structure and trends	500 to 2,000 tokens
Level 2	Telemetry	Raw evidence for forensics	2,000 to 50,000+ tokens

Agents start at Level 0. They drill down only when anomalies or quality flags demand it.

2.3 Self-describing

Every manifold contains enough metadata that an agent can interpret it without external documentation:

Column types and descriptions
Quality flags and reliability scores
Interpretation hints in natural language

3. Manifold envelope schema

{
  "artifact_type": "tabular_manifold",
  "artifact_version": "1.1",
  "manifold_kind": "<canonical_kind>",

  "subject": { },
  "time_window": { },

  "level_0_summary": { },
  "level_1_geometry": { },
  "level_2_telemetry": { },

  "token_budget": { },
  "lineage": { }
}

3.1 Required fields

Field	Type	Description
`artifact_type`	string	Always "tabular_manifold"
`artifact_version`	string	Spec version (e.g., "1.1")
`manifold_kind`	enum	One of the canonical kinds (see §4)
`subject`	object	What this manifold describes
`level_0_summary`	object	Required. The cheap cognitive entry point.

3.2 Optional fields

Field	Type	Description
`time_window`	object	For time-based manifolds
`level_1_geometry`	object	Aggregated data in columnar format
`level_2_telemetry`	object	Raw data in columnar format
`token_budget`	object	Hints for agent token management
`lineage`	object	Provenance metadata
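A producer or consumer may want to check the required fields before trusting an envelope. The sketch below is one possible check under stated assumptions: `validate_envelope` is a hypothetical helper, and it accepts only the five canonical v1.1 kinds, so draft extensions would need to be added to `CANONICAL_KINDS`.

```python
# Required envelope fields and their expected JSON types (from section 3.1).
REQUIRED_FIELDS = {
    "artifact_type": str,
    "artifact_version": str,
    "manifold_kind": str,
    "subject": dict,
    "level_0_summary": dict,
}

# The five canonical kinds defined in section 4.
CANONICAL_KINDS = {
    "timeseries_metric",
    "funnel_conversion",
    "cohort_behavior",
    "inventory_snapshot",
    "anomaly_detection",
}


def validate_envelope(manifold: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in manifold:
            errors.append(f"missing required field: {field}")
        elif not isinstance(manifold[field], expected):
            errors.append(f"{field} must be of type {expected.__name__}")
    if manifold.get("artifact_type") not in (None, "tabular_manifold"):
        errors.append('artifact_type must be "tabular_manifold"')
    kind = manifold.get("manifold_kind")
    if isinstance(kind, str) and kind not in CANONICAL_KINDS:
        errors.append(f"unknown manifold_kind: {kind}")
    return errors
```

Returning a list instead of raising on the first problem lets a generator report every issue in one pass.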

4. Canonical manifold kinds

TMS defines five canonical manifold kinds. Each has a fixed Level 0 schema with optional extensions.

`timeseries_metric`

For any metric observed over time (prices, counts, rates).

Level 0 required fields:

{
  "observation_count": 150,
  "time_coverage": {
    "expected_periods": 12,
    "observed_periods": 10,
    "coverage_ratio": 0.833
  },
  "distribution": {
    "min": 9.85,
    "max": 18.40,
    "mean": 12.62,
    "median": 10.90,
    "stddev": 3.44,
    "cv": 0.273
  },
  "reliability": { },
  "quality_flags": { },
  "interpretation_hints": []
}
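As a rough sketch of how a generator might fill the `time_coverage` and `distribution` fields, the helper below uses Python's standard `statistics` module. The spec does not say whether `stddev` is the population or the sample statistic; this sketch assumes population (`pstdev`), and the function name `summarize_timeseries` is hypothetical.

```python
import statistics


def summarize_timeseries(values: list[float], observed_periods: int,
                         expected_periods: int) -> dict:
    """Compute the numeric Level 0 fields for a timeseries_metric manifold."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation; swap in statistics.stdev
    # if the pipeline's convention is the sample statistic.
    stddev = statistics.pstdev(values)
    return {
        "observation_count": len(values),
        "time_coverage": {
            "expected_periods": expected_periods,
            "observed_periods": observed_periods,
            "coverage_ratio": round(observed_periods / expected_periods, 3),
        },
        "distribution": {
            "min": min(values),
            "max": max(values),
            "mean": round(mean, 2),
            "median": round(statistics.median(values), 2),
            "stddev": round(stddev, 2),
            # cv is the coefficient of variation: stddev divided by mean.
            "cv": round(stddev / mean, 3) if mean else None,
        },
    }
```

The `reliability`, `quality_flags`, and `interpretation_hints` fields are left out here because they depend on thresholds defined in later sections.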

`funnel_conversion`

For sequential stage-based processes (sales funnels, onboarding flows).

Level 0 required fields:

{
  "stage_count": 5,
  "total_entered": 10000,
  "total_converted": 342,
  "overall_conversion_rate": 0.0342,
  "bottleneck_stage": "checkout",
  "bottleneck_drop_rate": 0.67,
  "reliability": { },
  "quality_flags": { },
  "interpretation_hints": []
}
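The funnel fields can be derived from ordered stage counts. The sketch below assumes a definition the spec does not state explicitly: a stage's drop rate is the share of subjects from the previous stage that did not reach it, and the bottleneck is the stage with the largest drop rate. The function name `summarize_funnel` and the stage names in the usage note are hypothetical.

```python
def summarize_funnel(stages: list[tuple[str, int]]) -> dict:
    """Summarize an ordered list of (stage_name, subjects_reaching_stage)."""
    if len(stages) < 2 or stages[0][1] <= 0:
        raise ValueError("need at least two stages and a non-empty first stage")
    entered = stages[0][1]
    converted = stages[-1][1]
    # Assumed definition: drop rate of a stage = 1 - count / previous count.
    drops = [
        (name, 1 - count / prev_count)
        for (_, prev_count), (name, count) in zip(stages, stages[1:])
        if prev_count > 0
    ]
    bottleneck_stage, bottleneck_drop = max(drops, key=lambda item: item[1])
    return {
        "stage_count": len(stages),
        "total_entered": entered,
        "total_converted": converted,
        "overall_conversion_rate": round(converted / entered, 4),
        "bottleneck_stage": bottleneck_stage,
        "bottleneck_drop_rate": round(bottleneck_drop, 2),
    }
```

For example, stage counts of 10000, 4000, 1500, 495, and 342 for visit, product_view, add_to_cart, checkout, and purchase yield an overall conversion rate of 0.0342 with checkout as the bottleneck.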

`cohort_behavior`

For tracking groups over time (user cohorts, customer segments).

Level 0 required fields:

{
  "cohort_count": 12,
  "total_subjects": 5000,
  "observation_periods": 6,
  "retention_summary": {
    "period_1": 0.85,
    "period_3": 0.62,
    "period_6": 0.41
  },
  "reliability": { },
  "quality_flags": { },
  "interpretation_hints": []
}

`inventory_snapshot`

For point-in-time inventory or resource states.

Level 0 required fields:

{
  "snapshot_timestamp": "2025-01-14T00:00:00Z",
  "total_skus": 1500,
  "total_units": 125000,
  "total_value": 2500000.00,
  "stockout_skus": 45,
  "overstock_skus": 120,
  "reliability": { },
  "quality_flags": { },
  "interpretation_hints": []
}

`anomaly_detection`

For systems monitoring and alerting contexts.

Level 0 required fields:

{
  "detection_window": {
    "start": "2025-01-01T00:00:00Z",
    "end": "2025-01-14T00:00:00Z"
  },
  "anomaly_count": 3,
  "severity_distribution": {
    "critical": 1,
    "warning": 2,
    "info": 0
  },
  "top_anomaly": {
    "timestamp": "2025-01-10T14:30:00Z",
    "metric": "cpu_usage",
    "observed": 98.5,
    "expected_range": [20, 60],
    "severity": "critical"
  },
  "reliability": { },
  "quality_flags": { },
  "interpretation_hints": []
}
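A generator for this kind needs to count anomalies by severity and pick a `top_anomaly`. The spec does not define how the top anomaly is chosen, so the sketch below assumes the highest severity wins and ties go to the most recent timestamp (timestamps are assumed to share one ISO 8601 format so that string comparison orders them). `summarize_anomalies` and `SEVERITY_RANK` are hypothetical names.

```python
from collections import Counter

# Assumed ordering of the three severities shown in the example above.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def summarize_anomalies(anomalies: list[dict], start: str, end: str) -> dict:
    """Build the numeric Level 0 fields for an anomaly_detection manifold."""
    counts = Counter(anomaly["severity"] for anomaly in anomalies)
    # Assumption: rank by severity first, then by most recent timestamp.
    top = max(
        anomalies,
        key=lambda a: (SEVERITY_RANK[a["severity"]], a["timestamp"]),
        default=None,
    )
    return {
        "detection_window": {"start": start, "end": end},
        "anomaly_count": len(anomalies),
        "severity_distribution": {
            level: counts.get(level, 0) for level in ("critical", "warning", "info")
        },
        "top_anomaly": top,
    }
```

Emitting a zero for every severity, even when none occurred, keeps the Level 0 shape fixed so agents never need to handle a missing key.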

5. Reliability block (required in Level 0)

Every Level 0 must include a reliability block that quantifies confidence in the summary statistics.

"reliability": {
  "sample_size_class": "sparse|adequate|robust",
  "sample_size_n": 5,
  "sample_size_threshold_adequate": 30,
  "sample_size_threshold_robust": 100,

  "confidence_in_mean": {
    "level": 0.95,
    "margin_of_error": 2.4,
    "interval": [10.22, 15.02]
  },

  "data_quality_score": 0.85,
  "data_quality_notes": "3% of rows had imputed values",

  "staleness": {
    "last_observation": "2025-11-21T00:00:00Z",
    "days_since_last": 54,
    "is_stale": true,
    "stale_threshold_days": 30
  }
}

Sample size classes

Class	Criteria	Implication
`sparse`	n < 30	Summary stats are unstable. Treat with caution.
`adequate`	30 ≤ n < 100	Stats are reasonable but not rock-solid.
`robust`	n ≥ 100	High confidence in summary statistics.
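The class boundaries and the staleness fields translate directly into code. The following is a minimal sketch with hypothetical helper names; it assumes `is_stale` means strictly more than `stale_threshold_days` whole days have elapsed, which the spec does not spell out.

```python
from datetime import datetime


def sample_size_class(n: int, adequate: int = 30, robust: int = 100) -> str:
    """Map a sample size to sparse / adequate / robust using the table above."""
    if n < adequate:
        return "sparse"
    if n < robust:
        return "adequate"
    return "robust"


def _parse_utc(timestamp: str) -> datetime:
    # Accept the trailing "Z" form on Python versions before 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def staleness(last_observation: str, as_of: str, stale_threshold_days: int = 30) -> dict:
    """Build the staleness sub-block relative to an explicit as-of timestamp."""
    days = (_parse_utc(as_of) - _parse_utc(last_observation)).days
    return {
        "last_observation": last_observation,
        "days_since_last": days,
        # Assumption: stale means strictly more than the threshold.
        "is_stale": days > stale_threshold_days,
        "stale_threshold_days": stale_threshold_days,
    }
```

Passing `as_of` explicitly, rather than reading the system clock, keeps the result reproducible when a manifold is regenerated later.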

6. Quality flags (required in Level 0)

Standardized boolean flags that trigger agent attention:

"quality_flags": {
  "low_sample_size": true,
  "missing_periods": true,
  "suspected_outliers": true,
  "data_staleness": true,
  "high_variance": false,
  "imputation_applied": false,
  "schema_drift_detected": false
}

Flag	Trigger condition
`low_sample_size`	reliability.sample_size_class == "sparse"
`missing_periods`	time_coverage.coverage_ratio < 0.8
`suspected_outliers`	Any value > 3σ from mean, or IQR-based detection
`data_staleness`	reliability.staleness.is_stale == true
`high_variance`	distribution.cv > 0.3
`imputation_applied`	Any values were filled or estimated
`schema_drift_detected`	Column types or names changed from baseline
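The trigger conditions in the table can be evaluated mechanically from a Level 0 summary. In the sketch below, `derive_quality_flags` is a hypothetical helper; the three flags that depend on pipeline knowledge (outliers, imputation, schema drift) are passed in as arguments because they cannot be derived from the summary alone.

```python
def derive_quality_flags(summary: dict, *, outliers_found: bool = False,
                         imputation_applied: bool = False,
                         schema_drift_detected: bool = False) -> dict:
    """Evaluate the trigger conditions from the table against a Level 0 summary."""
    reliability = summary.get("reliability", {})
    coverage = summary.get("time_coverage", {}).get("coverage_ratio")
    cv = summary.get("distribution", {}).get("cv")
    return {
        "low_sample_size": reliability.get("sample_size_class") == "sparse",
        "missing_periods": coverage is not None and coverage < 0.8,
        "suspected_outliers": outliers_found,
        "data_staleness": reliability.get("staleness", {}).get("is_stale") is True,
        "high_variance": cv is not None and cv > 0.3,
        "imputation_applied": imputation_applied,
        "schema_drift_detected": schema_drift_detected,
    }
```

Fields that are absent from the summary (for example `time_coverage` on a non-temporal manifold) leave the corresponding flag false instead of raising.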

7. Token budget block

Helps agents decide whether to load deeper levels:

"token_budget": {
  "level_0_tokens_approx": 450,
  "level_1_tokens_approx": 1200,
  "level_2_tokens_approx": 8500,

  "level_2_row_count": 517,
  "level_2_inline_row_limit": 50,
  "level_2_inline_strategy": "preview_outliers",

  "compression_ratios": {
    "level_1_vs_level_2": 7.1,
    "level_0_vs_level_2": 18.9
  },

  "recommended_strategy": "Load Level 0 first. If quality_flags has any true values, load Level 1. Only load Level 2 if investigating specific anomalies."
}
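The ratios in the example appear to be the Level 2 token estimate divided by the estimate for the shallower level (8500 / 1200 ≈ 7.1). Under that reading, a sketch of how to compute them, and of how an agent might check which levels fit a token allowance, looks like this. Both function names and the `max_tokens` parameter are hypothetical.

```python
def compression_ratios(budget: dict) -> dict:
    """Ratio of Level 2 tokens to Level 1 and Level 0 tokens, to one decimal."""
    level_0 = budget["level_0_tokens_approx"]
    level_1 = budget["level_1_tokens_approx"]
    level_2 = budget["level_2_tokens_approx"]
    return {
        "level_1_vs_level_2": round(level_2 / level_1, 1),
        "level_0_vs_level_2": round(level_2 / level_0, 1),
    }


def affordable_levels(budget: dict, max_tokens: int) -> list[int]:
    """Levels that fit cumulatively, shallowest first, within max_tokens."""
    levels, spent = [], 0
    for level in (0, 1, 2):
        cost = budget.get(f"level_{level}_tokens_approx")
        if cost is None or spent + cost > max_tokens:
            break
        levels.append(level)
        spent += cost
    return levels
```

`affordable_levels` only answers whether a level fits; whether it is worth loading still depends on the quality flags and the recommended strategy.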

Level 2 inline strategies

Strategy	Behavior
`preview_outliers`	Inline only rows flagged as outliers or anomalies
`preview_recent`	Inline only the N most recent rows
`preview_sample`	Inline a random sample of N rows
`full_inline`	Inline all rows (use only if row_count ≤ limit)
`none`	No rows inlined. Agent must use retrieval.

8. Level 1 geometry schema

Level 1 uses columnar encoding for aggregated data.

"level_1_geometry": {
  "format": "columnar_json_v1",
  "granularity": "month",

  "schema": {
    "columns": [
      { "name": "period", "type": "string", "description": "Aggregation bucket (YYYY-MM)" },
      { "name": "n", "type": "integer", "description": "Observation count in period" },
      { "name": "min", "type": "double", "description": "Minimum value in period" },
      { "name": "max", "type": "double", "description": "Maximum value in period" },
      { "name": "mean", "type": "double", "description": "Mean value in period" },
      { "name": "median", "type": "double", "description": "Median value in period" },
      { "name": "stddev", "type": "double", "description": "Standard deviation", "nullable": true },
      { "name": "flag", "type": "string", "description": "Optional anomaly flag", "nullable": true }
    ],
    "primary_sort": ["period"]
  },

  "rows": [
    ["2025-01", 15, 9.80, 11.20, 10.45, 10.40, 0.35, null],
    ["2025-02", 12, 10.10, 11.50, 10.80, 10.75, 0.42, null],
    ["2025-07", 3, 17.50, 18.90, 18.20, 18.40, 0.70, "spike_detected"]
  ],

  "missing_periods": ["2025-03", "2025-04", "2025-05", "2025-06"]
}
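One way to produce these rows is to bucket raw observations by month and compute per-bucket statistics. The sketch below is illustrative only: `build_level_1` is a hypothetical helper, it assumes ISO 8601 timestamps (so the first seven characters are the `YYYY-MM` bucket), uses the sample standard deviation (null for a single observation, consistent with the nullable column), and leaves `flag` null because anomaly detection is out of scope here.

```python
import statistics
from collections import defaultdict


def build_level_1(observations: list[tuple[str, float]],
                  expected_periods: list[str]) -> dict:
    """Aggregate (iso_timestamp, value) pairs into monthly Level 1 rows."""
    buckets = defaultdict(list)
    for timestamp, value in observations:
        buckets[timestamp[:7]].append(value)  # "YYYY-MM" prefix
    rows = []
    for period in sorted(buckets):
        values = buckets[period]
        # Assumption: sample stddev, undefined (null) when n == 1.
        stddev = round(statistics.stdev(values), 2) if len(values) > 1 else None
        rows.append([
            period, len(values), min(values), max(values),
            round(statistics.fmean(values), 2),
            round(statistics.median(values), 2),
            stddev,
            None,  # flag: left for a separate anomaly-detection step
        ])
    return {
        "format": "columnar_json_v1",
        "granularity": "month",
        "rows": rows,
        "missing_periods": [p for p in expected_periods if p not in buckets],
    }
```

The `schema` block is omitted from the return value for brevity; a real generator would attach the column definitions shown above so the rows stay self-describing.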

9. Level 2 telemetry schema

Level 2 contains raw observations. For large datasets, use preview plus retrieval.

"level_2_telemetry": {
  "format": "columnar_json_v1",

  "schema": {
    "columns": [
      { "name": "ts", "type": "timestamp", "description": "Observation timestamp" },
      { "name": "value", "type": "double", "description": "Observed value" },
      { "name": "document_id", "type": "string", "description": "Source document reference", "nullable": true },
      { "name": "notes", "type": "string", "description": "Human or ETL notes", "nullable": true }
    ],
    "primary_sort": ["ts"]
  },

  "row_count_total": 517,

  "inline_rows": {
    "strategy": "preview_outliers",
    "rows": [
      ["2025-07-09T00:00:00Z", 18.40, "INV-12002", "Expedite fee applied"],
      ["2025-07-15T00:00:00Z", 17.90, "INV-12015", "Small lot surcharge"]
    ]
  },

  "retrieval": {
    "method": "mcp_tool",
    "tool_name": "get_timeseries_telemetry",
    "tool_args": {
      "manifold_id": "mfld_abc123",
      "level": 2,
      "filters": {}
    },
    "pagination": {
      "default_page_size": 100,
      "max_page_size": 500
    }
  }
}

When to inline vs retrieve

Row count	Recommendation
≤ 50	Full inline ("strategy": "full_inline")
51 to 500	Preview inline plus retrieval available
> 500	Preview inline only. Retrieval required for full data.
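The thresholds in the table can be expressed as a small decision function. This is a sketch under assumptions: `plan_level_2` and the `retrieval` labels it returns (`not_needed`, `available`, `required`) are hypothetical, and the preview strategy defaults to `preview_outliers` purely for illustration.

```python
def plan_level_2(row_count: int, inline_row_limit: int = 50,
                 retrieval_threshold: int = 500,
                 preview_strategy: str = "preview_outliers") -> dict:
    """Pick an inline strategy and retrieval expectation from the row count."""
    if row_count <= inline_row_limit:
        # Small enough to ship every row inside the manifold.
        return {"strategy": "full_inline", "retrieval": "not_needed"}
    if row_count <= retrieval_threshold:
        # Ship a preview; the full data is optionally retrievable.
        return {"strategy": preview_strategy, "retrieval": "available"}
    # Large dataset: the preview is all the agent sees without retrieval.
    return {"strategy": preview_strategy, "retrieval": "required"}
```

With the 517-row dataset from the telemetry example, this returns a preview strategy with retrieval marked as required.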

10. Interpretation hints

Natural language guidance for agents. Always an array of strings:

"interpretation_hints": [
  "Sparse series: only 5 observations across 12 months. Summary statistics are unreliable.",
  "July 2025 shows a price spike (18.40 vs median 10.90). Investigate Level 2 for evidence.",
  "Elevated coefficient of variation (0.27, just under the 0.3 high_variance threshold) suggests inconsistent pricing or mixed product types.",
  "Coverage ratio is 0.42, meaning 58% of expected periods have no data."
]

Guidelines for hint authoring

Lead with the most actionable insight
Reference specific numbers from the manifold
Suggest next steps (e.g., "Investigate Level 2")
Keep each hint to one or two sentences

11. Lineage block

Provenance metadata for auditability:

"lineage": {
  "manifold_id": "mfld_abc123",
  "computed_at": "2026-01-14T22:10:00Z",
  "computed_by": "tms_generator_v1.2",
  "computation_duration_ms": 450,

  "inputs": [
    {
      "dataset_id": "purchase_orders_silver",
      "dataset_version": "v5.2",
      "row_count": 125000,
      "as_of_timestamp": "2026-01-14T00:00:00Z"
    }
  ],

  "filters_applied": [
    { "field": "supplier_id", "operator": "eq", "value": "S-789" },
    { "field": "part_id", "operator": "eq", "value": "P-123456" }
  ],

  "transformations": [
    "Converted unit_price from cents to dollars",
    "Excluded cancelled PO lines",
    "Imputed missing facility codes as 'UNKNOWN'"
  ]
}

12. Complete example

A full timeseries_metric manifold:

{
  "artifact_type": "tabular_manifold",
  "artifact_version": "1.1",
  "manifold_kind": "timeseries_metric",

  "subject": {
    "entity_type": "part_supplier_price",
    "part_id": "P-123456",
    "part_number": "WIDGET-SS-075",
    "part_description": "Widget, stainless steel, 3/4 inch",
    "supplier_id": "S-789",
    "supplier_name": "Acme Industrial Supply",
    "metric_name": "unit_price",
    "currency": "USD",
    "unit_of_measure": "EA"
  },

  "time_window": {
    "start": "2025-01-01",
    "end": "2025-12-31",
    "timezone": "UTC",
    "granularity": "month"
  },

  "level_0_summary": {
    "observation_count": 5,
    "time_coverage": {
      "expected_periods": 12,
      "observed_periods": 4,
      "coverage_ratio": 0.333
    },
    "distribution": {
      "min": 9.85,
      "max": 18.40,
      "mean": 11.99,
      "median": 10.70,
      "stddev": 3.23,
      "cv": 0.269
    },
    "reliability": {
      "sample_size_class": "sparse",
      "sample_size_n": 5
    },
    "quality_flags": {
      "low_sample_size": true,
      "missing_periods": true,
      "suspected_outliers": true,
      "data_staleness": true
    },
    "interpretation_hints": [
      "Sparse data: only 5 observations across 12 months.",
      "One outlier detected: $18.40 in July (72% above median).",
      "Recommended: inspect Level 1 to identify spike month."
    ]
  },

  "level_1_geometry": {
    "format": "columnar_json_v1",
    "granularity": "month",
    "schema": {
      "columns": [
        { "name": "period", "type": "string" },
        { "name": "n", "type": "integer" },
        { "name": "mean", "type": "double" },
        { "name": "flag", "type": "string", "nullable": true }
      ]
    },
    "rows": [
      ["2025-01", 1, 10.10, null],
      ["2025-03", 2, 10.80, null],
      ["2025-07", 1, 18.40, "outlier_spike"],
      ["2025-11", 1, 9.85, null]
    ]
  },

  "level_2_telemetry": {
    "format": "columnar_json_v1",
    "row_count_total": 5,
    "inline_rows": {
      "strategy": "full_inline",
      "rows": [
        ["2025-01-14T00:00:00Z", 10.10, "PO-555"],
        ["2025-03-02T00:00:00Z", 10.70, "PO-612"],
        ["2025-03-28T00:00:00Z", 10.90, "PO-640"],
        ["2025-07-09T00:00:00Z", 18.40, "PO-777"],
        ["2025-11-21T00:00:00Z", 9.85, "PO-901"]
      ]
    }
  },

  "lineage": {
    "manifold_id": "mfld_price_P123456_S789_2025",
    "computed_at": "2026-01-14T22:30:00Z",
    "computed_by": "tms_generator_v1.1"
  }
}

13. MCP integration pattern

TMS manifolds are designed to be returned by MCP tools. Recommended pattern:

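# Assumes the official MCP Python SDK (pip install mcp); other frameworks
# expose an equivalent tool decorator.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("tms-manifolds")  # server name is illustrative


@mcp.tool()
def get_price_manifold(part_id: str, supplier_id: str, year: int) -> dict:
    """
    Returns a TMS manifold for part/supplier price history.

    The manifold includes:
    - Level 0: Summary statistics and quality flags
    - Level 1: Monthly aggregates
    - Level 2: Raw transactions (preview only if >50 rows)

    Start with level_0_summary. Check quality_flags to decide
    whether deeper investigation is needed.
    """
    # ... generate manifold ...
    manifold: dict = {}  # placeholder for the generated TMS envelope
    return manifold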

Tool documentation should instruct the agent to:

Always read level_0_summary first
Check quality_flags for any true values
If flags are set, read level_1_geometry to locate the issue
Only load level_2_telemetry when investigating specific anomalies
Use interpretation_hints as reasoning guidance
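On the consuming side, the reading order above can be sketched as a small planner. This is a hedged illustration, not a prescribed agent implementation: `plan_reading` and its `investigating_anomaly` parameter are hypothetical, and the function only decides which blocks to read, leaving the actual reasoning to the agent.

```python
def plan_reading(manifold: dict, investigating_anomaly: bool = False) -> dict:
    """Decide which manifold levels to read, following the order above."""
    summary = manifold["level_0_summary"]  # always read first
    raised = sorted(
        name for name, value in summary.get("quality_flags", {}).items()
        if value is True
    )
    steps = ["level_0_summary"]
    # Drill into Level 1 only when at least one quality flag is raised.
    if raised and "level_1_geometry" in manifold:
        steps.append("level_1_geometry")
    # Level 2 is reserved for investigating a specific anomaly.
    if investigating_anomaly and "level_2_telemetry" in manifold:
        steps.append("level_2_telemetry")
    return {
        "steps": steps,
        "raised_flags": raised,
        "hints": list(summary.get("interpretation_hints", [])),
    }
```

Returning the raised flags and hints alongside the steps gives the agent the context it needs to explain why it drilled down.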

14. Reference examples

Four polished TMS reference manifolds covering a fictional specialty packaging manufacturer, TechnoFlex Industries. Each is a full L0 + L1 + L2 envelope with rich telemetry, Pareto-truncated rollups, and interpretation hints. Suitable as drop-in templates for procurement, spend-intel, and supply-chain agents.

Note: these examples use manifold_kind: "entity_profile", a draft v1.2 extension to the five canonical kinds above. The envelope, reliability, quality_flags, token_budget, and lineage blocks are v1.1-compatible.

Item profile

TechnoSeal IM-1652 Ionomer Resin

Sole-sourced sealing-layer material, $4.02M annual spend, HHI 1.0. Demonstrates supplier_concentration, low_supplier_diversity flag, and propylene-tracked price discipline.


Supplier profile

Helvian Specialty Polymers GmbH

Multi-commodity strategic vendor, $5.67M annual, 18 SKUs across 6 commodity groups. Shows item_concentration, weighted_avg_item_cv vs global CV, and Pareto-truncated rollups with tail mix.


Commodity profile

Industrial Colorants & Pigments

94.7% concentrated on one supplier, one SKU. Shows high_supplier_concentration and high_item_concentration flags driving an unambiguous Level 0 sourcing decision.


Supplier-category profile

Specialty Chemicals & Additives

$35.6M category, 57 suppliers, 288 SKUs, 35 industries. Demonstrates direct_indirect_mix, industry_rollup, and how a UoM-encoded outlier surfaces in Level 2 telemetry preview.


15. Author manifolds with Claude

create-tms-manifold is a Claude skill that walks Claude through generating a TMS v1.2 entity_profile manifold from a PO-line silver table. Drop it into your ~/.claude/skills/ directory and Claude will use it whenever you ask for a procurement, supplier, commodity, or supplier-category manifold.

The skill covers:

The L0 / L1 / L2 envelope and which sub-blocks are required per entity type
Pareto truncation rules including tie-breaking
Each quality_flag trigger condition and where the flag belongs
HHI math at the rollup grain, including the truncated tail
A validation checklist Claude runs before returning the JSON
Templates for the empty envelope and discipline-rating bands

Skill frontmatter

---
name: create-tms-manifold
description: Build a TMS v1.2 entity_profile manifold from PO-line data. Generates the L0 summary, L1 geometry, L2 telemetry preview, and lineage block in one envelope.
---

Install

mkdir -p ~/.claude/skills/create-tms-manifold
curl -o ~/.claude/skills/create-tms-manifold/SKILL.md \
  https://canonical.agency/skills/create-tms-manifold/SKILL.md

Then prompt Claude with something like “Build a TMS entity_profile manifold for supplier GCP-...” and the skill will guide the work from query construction through validation.


Changelog

v1.1 (2026-01-14)

Added required reliability block to Level 0
Defined five canonical manifold_kind values with fixed Level 0 schemas
Added token_budget block with compression ratios
Specified inline_rows.strategy enum for Level 2
Removed underspecified drilldown_policy DSL (replaced with interpretation_hints and recommended_strategy)
Added outlier_summary to Level 0 for timeseries_metric
Clarified inline vs retrieval thresholds

v1.0 (2026-01-10)

Initial spec

License

TMS is released under Apache 2.0. Use it freely.
