Specification v1.1

Tabular Manifold Spec

A cognitive transmission format for AI agents. Feature engineering as an interface contract.

1. What is TMS?

The Tabular Manifold Spec defines a structured, multi-resolution data format optimized for consumption by AI agents and LLMs.

TMS is:

  • Not a storage format. Use Parquet or Delta for that.
  • Not a visualization format. Use dashboards for humans.
  • A cognitive transmission format. The interface layer between data pipelines and agent reasoning.

TMS manifolds sit alongside dashboards, not instead of them:

                    ┌─────────────────┐
                    │   Data Lake     │
                    │ (Parquet/Delta) │
                    └────────┬────────┘

              ┌──────────────┴──────────────┐
              │                             │
              v                             v
    ┌─────────────────┐           ┌─────────────────┐
    │   Dashboards    │           │  TMS Manifolds  │
    │  (for humans)   │           │  (for agents)   │
    └─────────────────┘           └─────────────────┘

2. Core principles

2.1 Columnar encoding

Keys appear once. Values are dense arrays aligned to column order.

{
  "format": "columnar_json_v1",
  "schema": {
    "columns": [
      { "name": "period", "type": "string" },
      { "name": "value", "type": "double" }
    ]
  },
  "rows": [
    ["2025-01", 10.5],
    ["2025-02", 11.2]
  ]
}

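Why: Eliminates key repetition. Encoded as row objects, a 1,000-row table with 10 columns repeats its keys 10,000 times. The columnar form names each key once, which avoids about 9,990 repeated keys, each of which would otherwise cost one or more tokens.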
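
A minimal sketch of the encoding above, to make the key-count arithmetic concrete. The helper names (to_columnar, from_columnar, repeated_keys_avoided) are illustrative and not part of the spec.

```python
def to_columnar(records, columns):
    """Pack row dicts into the columnar layout.

    `columns` is a list of (name, type) pairs; its order fixes value positions.
    """
    return {
        "format": "columnar_json_v1",
        "schema": {"columns": [{"name": n, "type": t} for n, t in columns]},
        "rows": [[rec.get(n) for n, _ in columns] for rec in records],
    }


def from_columnar(table):
    """Rebuild row dicts by zipping each row with the schema's column names."""
    names = [col["name"] for col in table["schema"]["columns"]]
    return [dict(zip(names, row)) for row in table["rows"]]


def repeated_keys_avoided(n_rows, n_cols):
    """Row objects repeat every key per row; the columnar schema names each once."""
    return n_rows * n_cols - n_cols


records = [
    {"period": "2025-01", "value": 10.5},
    {"period": "2025-02", "value": 11.2},
]
table = to_columnar(records, [("period", "string"), ("value", "double")])

print(table["rows"])                     # [['2025-01', 10.5], ['2025-02', 11.2]]
print(from_columnar(table) == records)   # True
print(repeated_keys_avoided(1000, 10))   # 9990
```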

2.2 Progressive disclosure (three levels)

Level    Name       Purpose                          Token cost
Level 0  Summary    Instant situational awareness    200 to 500 tokens
Level 1  Geometry   Aggregated structure and trends  500 to 2,000 tokens
Level 2  Telemetry  Raw evidence for forensics       2,000 to 50,000+ tokens

Agents start at Level 0. They drill down only when anomalies or quality flags demand it.

2.3 Self-describing

Every manifold contains enough metadata that an agent can interpret it without external documentation:

  • Column types and descriptions
  • Quality flags and reliability scores
  • Interpretation hints in natural language

3. Manifold envelope schema

{
  "artifact_type": "tabular_manifold",
  "artifact_version": "1.1",
  "manifold_kind": "<canonical_kind>",

  "subject": { },
  "time_window": { },

  "level_0_summary": { },
  "level_1_geometry": { },
  "level_2_telemetry": { },

  "token_budget": { },
  "lineage": { }
}

3.1 Required fields

Field             Type    Description
artifact_type     string  Always "tabular_manifold"
artifact_version  string  Spec version (e.g., "1.1")
manifold_kind     enum    One of the canonical kinds (see §4)
subject           object  What this manifold describes
level_0_summary   object  Required. The cheap cognitive entry point.

3.2 Optional fields

Field              Type    Description
time_window        object  For time-based manifolds
level_1_geometry   object  Aggregated data in columnar format
level_2_telemetry  object  Raw data in columnar format
token_budget       object  Hints for agent token management
lineage            object  Provenance metadata
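
One way a consumer might check the required fields before reading further. This is a sketch only: validate_envelope and its messages are hypothetical, and the spec does not define a validator.

```python
REQUIRED_FIELDS = {
    "artifact_type": str,
    "artifact_version": str,
    "manifold_kind": str,
    "subject": dict,
    "level_0_summary": dict,
}

CANONICAL_KINDS = {
    "timeseries_metric",
    "funnel_conversion",
    "cohort_behavior",
    "inventory_snapshot",
    "anomaly_detection",
}


def validate_envelope(manifold):
    """Return a list of problems; an empty list means the envelope passed."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in manifold:
            problems.append(f"missing required field: {field}")
        elif not isinstance(manifold[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    if manifold.get("artifact_type") not in (None, "tabular_manifold"):
        problems.append('artifact_type must be "tabular_manifold"')
    kind = manifold.get("manifold_kind")
    # Draft extension kinds would be reported here; relax this check if you accept them.
    if isinstance(kind, str) and kind not in CANONICAL_KINDS:
        problems.append(f"non-canonical manifold_kind: {kind}")
    return problems


envelope = {
    "artifact_type": "tabular_manifold",
    "artifact_version": "1.1",
    "manifold_kind": "timeseries_metric",
    "subject": {},
    "level_0_summary": {},
}
print(validate_envelope(envelope))  # []
```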

4. Canonical manifold kinds

TMS defines five canonical manifold kinds. Each has a fixed Level 0 schema with optional extensions.

timeseries_metric

For any metric observed over time (prices, counts, rates).

Level 0 required fields:

{
  "observation_count": 150,
  "time_coverage": {
    "expected_periods": 12,
    "observed_periods": 10,
    "coverage_ratio": 0.833
  },
  "distribution": {
    "min": 9.85,
    "max": 18.40,
    "mean": 12.62,
    "median": 10.90,
    "stddev": 3.44,
    "cv": 0.273
  },
  "reliability": { },
  "quality_flags": { },
  "interpretation_hints": []
}
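
The distribution and time_coverage fields can be derived from raw values along these lines. The sketch assumes a sample standard deviation (statistics.stdev) and the rounding shown; the spec does not state either choice, and the function names are illustrative.

```python
import statistics


def summarize_distribution(values):
    """Compute the distribution block; cv is stddev divided by mean."""
    mean = statistics.fmean(values)
    # Assumption: sample standard deviation, which needs at least two values.
    stddev = statistics.stdev(values) if len(values) > 1 else 0.0
    return {
        "min": min(values),
        "max": max(values),
        "mean": round(mean, 2),
        "median": round(statistics.median(values), 2),
        "stddev": round(stddev, 2),
        "cv": round(stddev / mean, 3) if mean else None,
    }


def summarize_coverage(expected_periods, observed_periods):
    """Compute the time_coverage block from period counts."""
    return {
        "expected_periods": expected_periods,
        "observed_periods": observed_periods,
        "coverage_ratio": round(observed_periods / expected_periods, 3),
    }


print(summarize_distribution([10.0, 12.0, 14.0]))
print(summarize_coverage(12, 10))  # coverage_ratio is 0.833, as in the example above
```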

funnel_conversion

For sequential stage-based processes (sales funnels, onboarding flows).

Level 0 required fields:

{
  "stage_count": 5,
  "total_entered": 10000,
  "total_converted": 342,
  "overall_conversion_rate": 0.0342,
  "bottleneck_stage": "checkout",
  "bottleneck_drop_rate": 0.67,
  "reliability": { },
  "quality_flags": { },
  "interpretation_hints": []
}
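
A sketch of how these fields might be computed from per-stage counts. It assumes the bottleneck is the stage with the highest drop rate (1 minus completed over entered) and that every stage has at least one entrant; the stage names and counts are made up to reproduce the figures above.

```python
def summarize_funnel(stages):
    """Build funnel fields from ordered (stage_name, entered, completed) tuples."""
    drop_rates = {name: 1 - completed / entered for name, entered, completed in stages}
    bottleneck = max(drop_rates, key=drop_rates.get)
    total_entered = stages[0][1]      # entrants to the first stage
    total_converted = stages[-1][2]   # completions of the last stage
    return {
        "stage_count": len(stages),
        "total_entered": total_entered,
        "total_converted": total_converted,
        "overall_conversion_rate": round(total_converted / total_entered, 4),
        "bottleneck_stage": bottleneck,
        "bottleneck_drop_rate": round(drop_rates[bottleneck], 2),
    }


# Hypothetical stage data.
stages = [
    ("visit", 10000, 4000),
    ("signup", 4000, 2500),
    ("cart", 2500, 1100),
    ("checkout", 1100, 363),
    ("payment", 363, 342),
]
print(summarize_funnel(stages))
```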

cohort_behavior

For tracking groups over time (user cohorts, customer segments).

Level 0 required fields:

{
  "cohort_count": 12,
  "total_subjects": 5000,
  "observation_periods": 6,
  "retention_summary": {
    "period_1": 0.85,
    "period_3": 0.62,
    "period_6": 0.41
  },
  "reliability": { },
  "quality_flags": { },
  "interpretation_hints": []
}

inventory_snapshot

For point-in-time inventory or resource states.

Level 0 required fields:

{
  "snapshot_timestamp": "2025-01-14T00:00:00Z",
  "total_skus": 1500,
  "total_units": 125000,
  "total_value": 2500000.00,
  "stockout_skus": 45,
  "overstock_skus": 120,
  "reliability": { },
  "quality_flags": { },
  "interpretation_hints": []
}

anomaly_detection

For systems monitoring and alerting contexts.

Level 0 required fields:

{
  "detection_window": {
    "start": "2025-01-01T00:00:00Z",
    "end": "2025-01-14T00:00:00Z"
  },
  "anomaly_count": 3,
  "severity_distribution": {
    "critical": 1,
    "warning": 2,
    "info": 0
  },
  "top_anomaly": {
    "timestamp": "2025-01-10T14:30:00Z",
    "metric": "cpu_usage",
    "observed": 98.5,
    "expected_range": [20, 60],
    "severity": "critical"
  },
  "reliability": { },
  "quality_flags": { },
  "interpretation_hints": []
}

5. Reliability block (required in Level 0)

Every Level 0 must include a reliability block that quantifies confidence in the summary statistics.

"reliability": {
  "sample_size_class": "sparse|adequate|robust",
  "sample_size_n": 5,
  "sample_size_threshold_adequate": 30,
  "sample_size_threshold_robust": 100,

  "confidence_in_mean": {
    "level": 0.95,
    "margin_of_error": 2.4,
    "interval": [10.22, 15.02]
  },

  "data_quality_score": 0.85,
  "data_quality_notes": "3% of rows had imputed values",

  "staleness": {
    "last_observation": "2025-11-21T00:00:00Z",
    "days_since_last": 54,
    "is_stale": true,
    "stale_threshold_days": 30
  }
}

Sample size classes

Class     Criteria      Implication
sparse    n < 30        Summary stats are unstable. Treat with caution.
adequate  30 ≤ n < 100  Stats are reasonable but not rock-solid.
robust    n ≥ 100       High confidence in summary statistics.
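
The class boundaries and the staleness fields reduce to a few comparisons. A sketch, assuming "stale" means strictly more days than the threshold (the spec does not say whether the boundary is inclusive); the function names are illustrative.

```python
from datetime import datetime, timezone


def classify_sample_size(n, adequate=30, robust=100):
    """Map an observation count to sparse / adequate / robust."""
    if n < adequate:
        return "sparse"
    if n < robust:
        return "adequate"
    return "robust"


def staleness(last_observation, as_of, stale_threshold_days=30):
    """Build the staleness block from two timezone-aware datetimes."""
    days = (as_of - last_observation).days
    return {
        "last_observation": last_observation.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "days_since_last": days,
        "is_stale": days > stale_threshold_days,  # assumption: strict comparison
        "stale_threshold_days": stale_threshold_days,
    }


print(classify_sample_size(5))  # sparse
print(staleness(
    datetime(2025, 11, 21, tzinfo=timezone.utc),
    datetime(2026, 1, 14, tzinfo=timezone.utc),
))  # days_since_last is 54, is_stale is True
```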

6. Quality flags (required in Level 0)

Standardized boolean flags that trigger agent attention:

"quality_flags": {
  "low_sample_size": true,
  "missing_periods": true,
  "suspected_outliers": true,
  "data_staleness": true,
  "high_variance": false,
  "imputation_applied": false,
  "schema_drift_detected": false
}

Flag                   Trigger condition
low_sample_size        reliability.sample_size_class == "sparse"
missing_periods        time_coverage.coverage_ratio < 0.8
suspected_outliers     Any value > 3σ from mean, or IQR-based detection
data_staleness         reliability.staleness.is_stale == true
high_variance          distribution.cv > 0.3
imputation_applied     Any values were filled or estimated
schema_drift_detected  Column types or names changed from baseline
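
Four of these triggers can be read straight off the Level 0 summary. The sketch below derives them and takes the other three (which need access to raw data or a baseline schema) as inputs from the generator. derive_quality_flags is a hypothetical helper, not part of the spec.

```python
def derive_quality_flags(level_0, *, suspected_outliers=False,
                         imputation_applied=False, schema_drift_detected=False):
    """Apply the trigger conditions that depend only on Level 0 fields."""
    reliability = level_0.get("reliability", {})
    coverage = level_0.get("time_coverage", {})
    distribution = level_0.get("distribution", {})
    return {
        "low_sample_size": reliability.get("sample_size_class") == "sparse",
        "missing_periods": coverage.get("coverage_ratio", 1.0) < 0.8,
        "suspected_outliers": suspected_outliers,
        "data_staleness": reliability.get("staleness", {}).get("is_stale", False) is True,
        "high_variance": distribution.get("cv", 0.0) > 0.3,
        "imputation_applied": imputation_applied,
        "schema_drift_detected": schema_drift_detected,
    }


level_0 = {
    "time_coverage": {"coverage_ratio": 0.333},
    "distribution": {"cv": 0.27},
    "reliability": {
        "sample_size_class": "sparse",
        "staleness": {"is_stale": True},
    },
}
flags = derive_quality_flags(level_0, suspected_outliers=True)
print(flags)
```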

7. Token budget block

Helps agents decide whether to load deeper levels:

"token_budget": {
  "level_0_tokens_approx": 450,
  "level_1_tokens_approx": 1200,
  "level_2_tokens_approx": 8500,

  "level_2_row_count": 517,
  "level_2_inline_row_limit": 50,
  "level_2_inline_strategy": "preview_outliers",

  "compression_ratios": {
    "level_1_vs_level_2": 7.1,
    "level_0_vs_level_2": 18.9
  },

  "recommended_strategy": "Load Level 0 first. If quality_flags has any true values, load Level 1. Only load Level 2 if investigating specific anomalies."
}
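
The compression ratios in this example are consistent with dividing the Level 2 token estimate by the cheaper level's estimate. A short sketch of that reading (inferred from the numbers above, not stated by the spec):

```python
def compression_ratios(level_0_tokens, level_1_tokens, level_2_tokens):
    """Ratio of Level 2 token cost to each cheaper level, rounded to one decimal."""
    return {
        "level_1_vs_level_2": round(level_2_tokens / level_1_tokens, 1),
        "level_0_vs_level_2": round(level_2_tokens / level_0_tokens, 1),
    }


print(compression_ratios(450, 1200, 8500))
# {'level_1_vs_level_2': 7.1, 'level_0_vs_level_2': 18.9}
```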

Level 2 inline strategies

Strategy          Behavior
preview_outliers  Inline only rows flagged as outliers or anomalies
preview_recent    Inline only the N most recent rows
preview_sample    Inline a random sample of N rows
full_inline       Inline all rows (use only if row_count ≤ level_2_inline_row_limit)
none              No rows inlined. Agent must use retrieval.
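
A generator could apply these strategies roughly as follows. This is a sketch under assumptions: the function name and parameters are hypothetical, timestamps are ISO-8601 strings in one timezone (so string order is chronological), and capping preview_outliers at the limit is a choice made here rather than a rule from the spec.

```python
import random


def select_inline_rows(rows, strategy, limit, *, ts_index=0, is_outlier=None, seed=None):
    """Pick the rows to inline for a given Level 2 strategy."""
    if strategy == "none":
        return []
    if strategy == "full_inline":
        # Allows row_count == limit, matching the "≤ 50" inline threshold.
        if len(rows) > limit:
            raise ValueError("full_inline requires row_count <= limit")
        return list(rows)
    if limit <= 0:
        return []
    if strategy == "preview_recent":
        return sorted(rows, key=lambda row: row[ts_index])[-limit:]
    if strategy == "preview_sample":
        return random.Random(seed).sample(rows, min(limit, len(rows)))
    if strategy == "preview_outliers":
        if is_outlier is None:
            raise ValueError("preview_outliers needs an is_outlier predicate")
        return [row for row in rows if is_outlier(row)][:limit]
    raise ValueError(f"unknown strategy: {strategy}")


rows = [
    ["2025-01-14T00:00:00Z", 10.10],
    ["2025-07-09T00:00:00Z", 18.40],
    ["2025-03-02T00:00:00Z", 10.70],
]
print(select_inline_rows(rows, "preview_recent", 2))
print(select_inline_rows(rows, "preview_outliers", 2, is_outlier=lambda row: row[1] > 15))
```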

8. Level 1 geometry schema

Level 1 uses columnar encoding for aggregated data.

"level_1_geometry": {
  "format": "columnar_json_v1",
  "granularity": "month",

  "schema": {
    "columns": [
      { "name": "period", "type": "string", "description": "Aggregation bucket (YYYY-MM)" },
      { "name": "n", "type": "integer", "description": "Observation count in period" },
      { "name": "min", "type": "double", "description": "Minimum value in period" },
      { "name": "max", "type": "double", "description": "Maximum value in period" },
      { "name": "mean", "type": "double", "description": "Mean value in period" },
      { "name": "median", "type": "double", "description": "Median value in period" },
      { "name": "stddev", "type": "double", "description": "Standard deviation", "nullable": true },
      { "name": "flag", "type": "string", "description": "Optional anomaly flag", "nullable": true }
    ],
    "primary_sort": ["period"]
  },

  "rows": [
    ["2025-01", 15, 9.80, 11.20, 10.45, 10.40, 0.35, null],
    ["2025-02", 12, 10.10, 11.50, 10.80, 10.75, 0.42, null],
    ["2025-07", 3, 17.50, 18.90, 18.20, 18.40, 0.70, "spike_detected"]
  ],

  "missing_periods": ["2025-03", "2025-04", "2025-05", "2025-06"]
}
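
Level 1 rows like these can be produced by bucketing raw observations by month. A sketch, assuming ISO-8601 timestamps, a sample standard deviation (null when a period has a single observation), and no anomaly flagging. The function name and the sample observations are illustrative.

```python
import statistics
from collections import defaultdict


def monthly_geometry(observations, expected_periods):
    """Aggregate (iso_timestamp, value) pairs into Level 1 rows.

    Row order follows the schema above:
    period, n, min, max, mean, median, stddev, flag.
    """
    buckets = defaultdict(list)
    for ts, value in observations:
        buckets[ts[:7]].append(value)  # "YYYY-MM" prefix of the timestamp

    rows = []
    for period in sorted(buckets):
        values = buckets[period]
        # stddev is nullable in the schema; a single point has no sample stddev.
        stddev = round(statistics.stdev(values), 2) if len(values) > 1 else None
        rows.append([
            period, len(values), min(values), max(values),
            round(statistics.fmean(values), 2),
            round(statistics.median(values), 2),
            stddev,
            None,  # flag: anomaly labelling is out of scope for this sketch
        ])

    missing = [p for p in expected_periods if p not in buckets]
    return rows, missing


observations = [
    ("2025-01-14T00:00:00Z", 10.10),
    ("2025-03-02T00:00:00Z", 10.70),
    ("2025-03-28T00:00:00Z", 10.90),
    ("2025-07-09T00:00:00Z", 18.40),
    ("2025-11-21T00:00:00Z", 9.85),
]
expected = [f"2025-{month:02d}" for month in range(1, 13)]
rows, missing = monthly_geometry(observations, expected)
print(rows)
print(missing)
```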

9. Level 2 telemetry schema

Level 2 contains raw observations. For large datasets, use preview plus retrieval.

"level_2_telemetry": {
  "format": "columnar_json_v1",

  "schema": {
    "columns": [
      { "name": "ts", "type": "timestamp", "description": "Observation timestamp" },
      { "name": "value", "type": "double", "description": "Observed value" },
      { "name": "document_id", "type": "string", "description": "Source document reference", "nullable": true },
      { "name": "notes", "type": "string", "description": "Human or ETL notes", "nullable": true }
    ],
    "primary_sort": ["ts"]
  },

  "row_count_total": 517,

  "inline_rows": {
    "strategy": "preview_outliers",
    "rows": [
      ["2025-07-09T00:00:00Z", 18.40, "INV-12002", "Expedite fee applied"],
      ["2025-07-15T00:00:00Z", 17.90, "INV-12015", "Small lot surcharge"]
    ]
  },

  "retrieval": {
    "method": "mcp_tool",
    "tool_name": "get_timeseries_telemetry",
    "tool_args": {
      "manifold_id": "mfld_abc123",
      "level": 2,
      "filters": {}
    },
    "pagination": {
      "default_page_size": 100,
      "max_page_size": 500
    }
  }
}

When to inline vs retrieve

Row count  Recommendation
≤ 50       Full inline ("strategy": "full_inline")
51 to 500  Preview inline plus retrieval available
> 500      Preview inline only. Retrieval required for full data.
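
The thresholds above can be expressed as a small lookup. The return keys ("inline", "retrieval") and their values are hypothetical labels chosen for this sketch.

```python
def inline_recommendation(row_count, full_inline_max=50, retrieval_required_above=500):
    """Translate a Level 2 row count into an inline/retrieval recommendation."""
    if row_count <= full_inline_max:
        return {"inline": "full_inline", "retrieval": "not_needed"}
    if row_count <= retrieval_required_above:
        return {"inline": "preview", "retrieval": "available"}
    return {"inline": "preview", "retrieval": "required"}


for count in (5, 51, 517):
    print(count, inline_recommendation(count))
```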

10. Interpretation hints

Natural language guidance for agents. Always an array of strings:

"interpretation_hints": [
  "Sparse series: only 5 observations across 12 months. Summary statistics are unreliable.",
  "July 2025 shows a price spike (18.40 vs median 10.90). Investigate Level 2 for evidence.",
  "Coefficient of variation (0.27) is close to the 0.3 high-variance threshold, which may indicate inconsistent pricing or mixed product types.",
  "Coverage ratio is 0.42, meaning 58% of expected periods have no data."
]

Guidelines for hint authoring

  • Lead with the most actionable insight
  • Reference specific numbers from the manifold
  • Suggest next steps (e.g., "Investigate Level 2")
  • Keep each hint to one or two sentences

11. Lineage block

Provenance metadata for auditability:

"lineage": {
  "manifold_id": "mfld_abc123",
  "computed_at": "2026-01-14T22:10:00Z",
  "computed_by": "tms_generator_v1.2",
  "computation_duration_ms": 450,

  "inputs": [
    {
      "dataset_id": "purchase_orders_silver",
      "dataset_version": "v5.2",
      "row_count": 125000,
      "as_of_timestamp": "2026-01-14T00:00:00Z"
    }
  ],

  "filters_applied": [
    { "field": "supplier_id", "operator": "eq", "value": "S-789" },
    { "field": "part_id", "operator": "eq", "value": "P-123456" }
  ],

  "transformations": [
    "Converted unit_price from cents to dollars",
    "Excluded cancelled PO lines",
    "Imputed missing facility codes as 'UNKNOWN'"
  ]
}

12. Complete example

A full timeseries_metric manifold:

{
  "artifact_type": "tabular_manifold",
  "artifact_version": "1.1",
  "manifold_kind": "timeseries_metric",

  "subject": {
    "entity_type": "part_supplier_price",
    "part_id": "P-123456",
    "part_number": "WIDGET-SS-075",
    "part_description": "Widget, stainless steel, 3/4 inch",
    "supplier_id": "S-789",
    "supplier_name": "Acme Industrial Supply",
    "metric_name": "unit_price",
    "currency": "USD",
    "unit_of_measure": "EA"
  },

  "time_window": {
    "start": "2025-01-01",
    "end": "2025-12-31",
    "timezone": "UTC",
    "granularity": "month"
  },

  "level_0_summary": {
    "observation_count": 5,
    "time_coverage": {
      "expected_periods": 12,
      "observed_periods": 4,
      "coverage_ratio": 0.333
    },
    "distribution": {
      "min": 9.85,
      "max": 18.40,
      "mean": 11.99,
      "median": 10.70,
      "stddev": 3.44,
      "cv": 0.287
    },
    "reliability": {
      "sample_size_class": "sparse",
      "sample_size_n": 5
    },
    "quality_flags": {
      "low_sample_size": true,
      "missing_periods": true,
      "suspected_outliers": true,
      "data_staleness": true
    },
    "interpretation_hints": [
      "Sparse data: only 5 observations across 12 months.",
      "One outlier detected: $18.40 in July (72% above median).",
      "Recommended: inspect Level 1 to identify spike month."
    ]
  },

  "level_1_geometry": {
    "format": "columnar_json_v1",
    "granularity": "month",
    "schema": {
      "columns": [
        { "name": "period", "type": "string" },
        { "name": "n", "type": "integer" },
        { "name": "mean", "type": "double" },
        { "name": "flag", "type": "string", "nullable": true }
      ]
    },
    "rows": [
      ["2025-01", 1, 10.10, null],
      ["2025-03", 2, 10.80, null],
      ["2025-07", 1, 18.40, "outlier_spike"],
      ["2025-11", 1, 9.85, null]
    ]
  },

  "level_2_telemetry": {
    "format": "columnar_json_v1",
    "row_count_total": 5,
    "inline_rows": {
      "strategy": "full_inline",
      "rows": [
        ["2025-01-14T00:00:00Z", 10.10, "PO-555"],
        ["2025-03-02T00:00:00Z", 10.70, "PO-612"],
        ["2025-03-28T00:00:00Z", 10.90, "PO-640"],
        ["2025-07-09T00:00:00Z", 18.40, "PO-777"],
        ["2025-11-21T00:00:00Z", 9.85, "PO-901"]
      ]
    }
  },

  "lineage": {
    "manifold_id": "mfld_price_P123456_S789_2025",
    "computed_at": "2026-01-14T22:30:00Z",
    "computed_by": "tms_generator_v1.1"
  }
}

13. MCP integration pattern

TMS manifolds are designed to be returned by MCP tools. Recommended pattern:

@mcp_tool
def get_price_manifold(part_id: str, supplier_id: str, year: int) -> dict:
    """
    Returns a TMS manifold for part/supplier price history.

    The manifold includes:
    - Level 0: Summary statistics and quality flags
    - Level 1: Monthly aggregates
    - Level 2: Raw transactions (preview only if >50 rows)

    Start with level_0_summary. Check quality_flags to decide
    whether deeper investigation is needed.
    """
    # ... generate manifold ...
    return manifold

Tool documentation should instruct the agent to:

  1. Always read level_0_summary first
  2. Check quality_flags for any true values
  3. If flags are set, read level_1_geometry to locate the issue
  4. Only load level_2_telemetry when investigating specific anomalies
  5. Use interpretation_hints as reasoning guidance
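
On the agent side, the five steps above amount to a short decision routine. A sketch, where plan_reads and the investigating_anomaly parameter are hypothetical names and the manifold is a plain dict as returned by the tool:

```python
def plan_reads(manifold, investigating_anomaly=False):
    """Return the blocks to read, in order, plus the hints to reason with."""
    summary = manifold["level_0_summary"]          # step 1: always start here
    plan = ["level_0_summary"]

    flags = summary.get("quality_flags", {})       # step 2: any flag set?
    if any(value is True for value in flags.values()) and "level_1_geometry" in manifold:
        plan.append("level_1_geometry")            # step 3: locate the issue

    if investigating_anomaly and "level_2_telemetry" in manifold:
        plan.append("level_2_telemetry")           # step 4: raw evidence only on demand

    return plan, summary.get("interpretation_hints", [])  # step 5


manifold = {
    "level_0_summary": {
        "quality_flags": {"low_sample_size": True, "high_variance": False},
        "interpretation_hints": ["Sparse data: only 5 observations across 12 months."],
    },
    "level_1_geometry": {},
    "level_2_telemetry": {},
}
print(plan_reads(manifold))
print(plan_reads(manifold, investigating_anomaly=True))
```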

14. Reference examples

Four TMS reference manifolds cover a fictional specialty packaging manufacturer, TechnoFlex Industries. Each is a full L0 + L1 + L2 envelope with telemetry, Pareto-truncated rollups, and interpretation hints, and is suitable as a drop-in template for procurement, spend-intel, and supply-chain agents.

Note: these examples use manifold_kind: "entity_profile", a draft v1.2 extension to the five canonical kinds above. The envelope, reliability, quality_flags, token_budget, and lineage blocks are v1.1-compatible.

Item profile

TechnoSeal IM-1652 Ionomer Resin

Sole-sourced sealing-layer material, $4.02M annual spend, HHI 1.0. Demonstrates supplier_concentration, low_supplier_diversity flag, and propylene-tracked price discipline.

Supplier profile

Helvian Specialty Polymers GmbH

Multi-commodity strategic vendor, $5.67M annual, 18 SKUs across 6 commodity groups. Shows item_concentration, weighted_avg_item_cv vs global CV, and Pareto-truncated rollups with tail mix.

Commodity profile

Industrial Colorants & Pigments

94.7% concentrated on one supplier, one SKU. Shows high_supplier_concentration and high_item_concentration flags driving an unambiguous Level 0 sourcing decision.

Supplier-category profile

Specialty Chemicals & Additives

$35.6M category, 57 suppliers, 288 SKUs, 35 industries. Demonstrates direct_indirect_mix, industry_rollup, and how a UoM-encoded outlier surfaces in Level 2 telemetry preview.

15. Author manifolds with Claude

create-tms-manifold is a Claude skill that walks Claude through generating a TMS v1.2 entity_profile manifold from a PO-line silver table. Drop it into your ~/.claude/skills/ directory and Claude will use it whenever you ask for a procurement, supplier, commodity, or supplier-category manifold.

The skill covers:

  • The L0 / L1 / L2 envelope and which sub-blocks are required per entity type
  • Pareto truncation rules including tie-breaking
  • Each quality_flag trigger condition and where the flag belongs
  • HHI math at the rollup grain, including the truncated tail
  • A validation checklist Claude runs before returning the JSON
  • Templates for the empty envelope and discipline-rating bands

Skill frontmatter

---
name: create-tms-manifold
description: Build a TMS v1.2 entity_profile manifold from PO-line data. Generates the L0 summary, L1 geometry, L2 telemetry preview, and lineage block in one envelope.
---

Install

mkdir -p ~/.claude/skills/create-tms-manifold
curl -o ~/.claude/skills/create-tms-manifold/SKILL.md \
  https://canonical.agency/skills/create-tms-manifold/SKILL.md

Then prompt Claude with something like “Build a TMS entity_profile manifold for supplier GCP-...” and the skill will guide the work from query construction through validation.

Changelog

v1.1 (2026-01-14)

  • Added required reliability block to Level 0
  • Defined five canonical manifold_kind values with fixed Level 0 schemas
  • Added token_budget block with compression ratios
  • Specified inline_rows.strategy enum for Level 2
  • Removed underspecified drilldown_policy DSL (replaced with interpretation_hints and recommended_strategy)
  • Added outlier_summary to Level 0 for timeseries_metric
  • Clarified inline vs retrieval thresholds

v1.0 (2026-01-10)

  • Initial spec

License

TMS is released under Apache 2.0. Use it freely.