---
name: create-tms-manifold
description: Build a TMS v1.2 entity_profile manifold from PO-line data. Generates the L0 summary, L1 geometry, L2 telemetry preview, and lineage block in one envelope.
---

# create-tms-manifold

Build a Tabular Manifold Spec (TMS) v1.2 entity_profile manifold for one
of: `item`, `supplier`, `commodity_group`, `supplier_category`. Output is
a single JSON document that an agent can consume progressively (L0 first,
L1 on demand, L2 for forensics).

Full TMS spec: https://canonical.agency/spec/tms
Companion QPS spec (provenance): https://canonical.agency/spec/qps
Reference examples (TechnoFlex): https://canonical.agency/spec/tms#reference-examples

## When to use this skill

Use this skill when the user wants to ship a manifold for procurement,
spend-intelligence, or supplier-risk workflows and the source is a
transactional PO-line table (silver-layer, one row per PO line) plus
optional supplier-consolidation and item-categorization joins.

## What you produce

A single JSON object with these top-level keys, in this order:

```json
{
  "artifact_type": "tabular_manifold",
  "artifact_version": "1.2",
  "manifold_kind": "entity_profile",
  "subject": { },
  "time_window": { },
  "level_0_summary": { },
  "level_1_geometry": { },
  "level_2_telemetry": { },
  "token_budget": { },
  "lineage": { }
}
```

L0 must include `reliability` and `quality_flags`. L1 is optional but
recommended for time-windowed entities. L2 must declare a `strategy` and
a `retrieval` block even when the preview is empty.

## Inputs you need from the user

Ask up front, in one round-trip if possible:

1. **Entity type and ID.** One of `item` (ca_item_number), `supplier`
   (parent_supplier_id), `commodity_group` (commodity_group_label),
   `supplier_category` (sub_commodity_label).
2. **Source table** and dialect (Databricks SQL / Snowflake / DuckDB /
   Postgres). At minimum, a PO-line silver table; ideally also the
   supplier-mapping and item-categorization views.
3. **Time window.** Default to the last 24 months ending on the most
   recent observation.
4. **Granularity** for L1. Default to `month` for windows ≥ 6 months,
   `day` for shorter windows.

If anything is missing and you can infer a sensible default, state the
default and proceed. Do not stall on optional parameters.

## Process

### 1. Confirm scope and pull headline numbers

Run a single aggregate query to confirm the entity exists in the window
and to size the work:

```sql
SELECT
  COUNT(*)                                AS po_line_count,
  COUNT(DISTINCT po_num)                  AS po_count,
  MIN(purchase_order_date)                AS first_purchase,
  MAX(purchase_order_date)                AS last_purchase,
  SUM(spend)                              AS total_spend_usd
FROM <table>
WHERE <entity_predicate>
  AND purchase_order_date BETWEEN :start AND :end;
```

If `po_line_count < 30`, mark `sample_size_class: "sparse"` and warn the
user that L0 stats will be unreliable. Do not refuse; emit the manifold
anyway with the flag set.

### 2. Build `level_0_summary`

Required sub-blocks:

- **`observation_count`** = po_line_count from step 1.
- **`time_coverage`** = first/last purchase + window_days.
- **`distribution`** = min, max, mean, median, stddev, cv on `unit_price`.
- **`reliability`** = sample_size_class, confidence_in_mean (95%),
  data_quality_score (1 minus the share of rows with imputed or null
  unit_price), staleness block.
- **`quality_flags`** = booleans. Trigger rules:
  - `low_sample_size`: sample_size_class == "sparse"
  - `data_staleness`: staleness.is_stale == true
  - `high_variance`: distribution.cv > 0.3 (entity-level only; skip for
    supplier and supplier_category where the global CV is mechanically
    large)
  - `suspected_outliers`: any row with |z_score| > 3
  - `low_supplier_diversity`: (item only) supplier_count == 1
  - `high_supplier_concentration`: HHI > 0.25
  - `high_item_concentration`: top_item_pct > 0.5
  - `level_1_*_truncated`: set when the Pareto truncation in §3 dropped
    rows
- **`financial_summary`** = total_spend_usd, total_qty_received, po_count,
  po_line_count, avg_spend_per_po_line, pct_of_total_spend,
  population_rank, population_size.
- **Concentration blocks** (per entity type):
  - item → `supplier_concentration` (HHI, top_supplier_pct, top_3)
  - supplier → `item_concentration` + `commodity_concentration`
  - commodity_group → `supplier_concentration` + `item_diversity` +
    `sub_category_breakdown`
  - supplier_category → `supplier_concentration` + `item_diversity` +
    `industry_breakdown` + `direct_indirect_mix`
- **`price_stability`** = global cv, weighted_avg_item_cv (spend-weighted
  per-item CV over items with ≥2 PO lines), discipline_rating from the
  bands `{Excellent: <0.10, Good: <0.25, Fair: <0.50, Poor: ≥0.50}`, and
  a `rating_basis` object documenting thresholds and method.
- **`relationship`** = first/last purchase, relationship_days,
  avg_monthly_po_frequency.
- **`interpretation_hints`** = 3-4 sentences naming the dominant signal,
  one quantitative anchor each. Lead with the actionable finding, not
  the metric definition.
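
The `distribution` and `reliability` arithmetic above can be sketched as follows. This is a minimal illustration, not the spec's reference implementation: the helper name, the shape of the `confidence_in_mean` object, the use of the sample standard deviation, and the normal-approximation interval are all assumptions. It also assumes the distribution is computed over observed prices only, with imputed or null rows counted separately.

```python
import math
import statistics

def distribution_and_reliability(unit_prices, imputed_or_null_count=0):
    """Summarize unit_price values for level_0_summary (hypothetical helper).

    unit_prices: observed (non-null, non-imputed) prices in the window.
    imputed_or_null_count: rows whose unit_price was null or imputed.
    """
    n = len(unit_prices)
    if n == 0:
        raise ValueError("no usable unit_price values in the window")
    mean = statistics.fmean(unit_prices)
    stddev = statistics.stdev(unit_prices) if n > 1 else 0.0
    cv = stddev / mean if mean else 0.0
    # Normal-approximation 95% interval (assumption; the spec may prescribe another method).
    half_width = 1.96 * stddev / math.sqrt(n)
    total_rows = n + imputed_or_null_count
    return {
        "distribution": {
            "min": min(unit_prices), "max": max(unit_prices),
            "mean": mean, "median": statistics.median(unit_prices),
            "stddev": stddev, "cv": cv,
        },
        # Field names inside confidence_in_mean are illustrative.
        "confidence_in_mean": {"level": 0.95,
                               "lower": mean - half_width,
                               "upper": mean + half_width},
        # 1 minus the share of rows with imputed or null unit_price.
        "data_quality_score": 1 - imputed_or_null_count / total_rows,
        # True when any row has |z_score| > 3.
        "suspected_outliers": stddev > 0 and any(
            abs((p - mean) / stddev) > 3 for p in unit_prices),
    }
```

The returned `cv` feeds the `high_variance` rule and `suspected_outliers` feeds the flag of the same name; the entity-type exceptions listed above still apply.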

### 3. Build `level_1_geometry`

`monthly_timeseries`: one row per month in the window. Columns:
`period`, `n`, `spend`, `qty`, `mean_unit_price`, optional
`stddev_unit_price`. Include `missing_periods: []` if the window is
contiguous; otherwise list the gaps.
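
One way to derive `missing_periods` is to enumerate every calendar month in the window and subtract the periods that actually appear. The sketch below assumes `period` is a `YYYY-MM` string; match whatever label format the reference manifolds use. Both helper names are hypothetical.

```python
from datetime import date

def month_periods(start: date, end: date):
    """Yield 'YYYY-MM' labels for every calendar month touching [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1

def missing_periods(rows, start: date, end: date):
    """Return the months in the window that have no timeseries row."""
    observed = {row["period"] for row in rows}
    return [p for p in month_periods(start, end) if p not in observed]
```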

Then add one to three rollups (per entity type):

- item → `supplier_rollup`
- supplier → `item_rollup` + `commodity_rollup`
- commodity_group → `supplier_rollup` + `item_rollup` + `sub_category_rollup`
- supplier_category → `supplier_rollup` + `item_rollup` + `industry_rollup`

Each rollup uses **Pareto truncation**:

```text
Sort rows DESC by spend.
Emit rows until cumulative pct_of_spend >= target_coverage,
  with rows_emitted bounded by [min_rows, max_rows].
For ties at the boundary, use tie_break = "tie_extended_by_spend_v2"
  (include the tied rows if they add unique spend).
Summarize remainder as tail_summary { rows_truncated, spend,
  pct_of_spend, po_count, po_line_count, plus type-specific extras }.
```

Coverage targets: 0.80 for supplier/item rollups, 0.95 for
industry/sub_category rollups (long-tail signal matters more there).
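
The truncation rule can be sketched in Python as below. Several details are assumptions rather than spec text: the `min_rows` and `max_rows` defaults are placeholders, the tie-break is read as "keep emitting while the next row's spend equals the last emitted row's spend", `max_rows` is treated as a hard cap even for ties, and `po_count` is summed on the assumption that it is additive across rollup rows.

```python
def pareto_truncate(rows, target_coverage=0.80, min_rows=3, max_rows=25):
    """Split rollup rows into an emitted head and a tail_summary.

    rows: dicts with 'spend' and, optionally, 'po_count' / 'po_line_count'.
    """
    ordered = sorted(rows, key=lambda r: r["spend"], reverse=True)
    total = sum(r["spend"] for r in ordered)
    head, cumulative = [], 0.0
    for row in ordered:
        if len(head) >= max_rows:
            break  # hard cap, applied even to tied rows (assumption)
        covered = total > 0 and cumulative / total >= target_coverage
        tied = bool(head) and row["spend"] == head[-1]["spend"]
        # Stop once coverage and min_rows are met, unless this row ties the boundary row.
        if covered and len(head) >= min_rows and not tied:
            break
        head.append(row)
        cumulative += row["spend"]
    tail = ordered[len(head):]
    tail_spend = sum(r["spend"] for r in tail)
    tail_summary = {
        "rows_truncated": len(tail),
        "spend": tail_spend,
        "pct_of_spend": tail_spend / total if total else 0.0,
        "po_count": sum(r.get("po_count", 0) for r in tail),
        "po_line_count": sum(r.get("po_line_count", 0) for r in tail),
    }
    return head, tail_summary
```

When `rows_truncated` is greater than zero, set the matching `level_1_*_truncated` quality flag.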

### 4. Build `level_2_telemetry`

Set `row_count_total` to the total PO lines in the window. Choose a
`strategy`:

- `preview_outliers` if any |z_score| > 2 rows exist (recommended default)
- `preview_recent` for stable entities with no outliers
- `preview_concentration_evidence` for high-concentration commodities
- `full_inline` only if row_count_total < 50
- `none` if downstream agents will always retrieve
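
A possible way to encode this choice is shown below. The list above does not fix a precedence between the options, so the order used here (`none`, then `full_inline` for small tables, then outliers, then concentration, then recent) is an assumption, as are the function and parameter names.

```python
def choose_l2_strategy(row_count_total, max_abs_z,
                       always_retrieve=False, high_concentration=False):
    """Pick a level_2_telemetry.strategy value (assumed precedence order)."""
    if always_retrieve:
        return "none"
    if row_count_total < 50:
        return "full_inline"  # only allowed below 50 rows
    if max_abs_z > 2:
        return "preview_outliers"
    if high_concentration:
        return "preview_concentration_evidence"
    return "preview_recent"
```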

Cap `inline_rows` at the `inline_row_limit` you record in `token_budget`. Each row
should include: rn, po_num, date, site, supplier_name, ca_item_number
(or part_description), qty, qty_uom, unit_price, spend, z_score, flag.

Add a `retrieval` block. For agents, prefer `method: "mcp_tool"` with
tool_name and tool_args. For human auditors, also include
`sql_fallback` with the verbatim query_template and query_params used
to populate the inline preview.

### 5. Write the `token_budget` block

Estimate L0 / L1 / L2 token costs (serialized character count divided by 4
is a usable approximation). Always emit `compression_ratio_l1_vs_l2` and
`compression_ratio_l0_vs_l2`. Write a one-sentence
`recommended_strategy` describing what an agent should read first.
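
The numeric fields can be estimated along these lines. The text above does not define the compression ratios, so this sketch assumes each one is the estimated token cost of the full L2 row set (average inline row cost times `row_count_total`) divided by the token cost of the smaller level. Treat that definition, and the helper names, as assumptions to check against the reference manifolds.

```python
import json

def approx_tokens(obj):
    """Approximate token count as serialized characters divided by 4."""
    return len(json.dumps(obj, ensure_ascii=False)) // 4

def token_budget(l0, l1, inline_rows, row_count_total, inline_row_limit=10):
    """Build the numeric part of token_budget; recommended_strategy is written by hand."""
    l0_tokens = approx_tokens(l0)
    l1_tokens = approx_tokens(l1)
    inline_tokens = approx_tokens(inline_rows)
    # Extrapolate the cost of all L2 rows from the inline preview (assumption).
    per_row = inline_tokens / len(inline_rows) if inline_rows else 0
    full_l2_tokens = per_row * row_count_total

    def ratio(level_tokens):
        return round(full_l2_tokens / level_tokens, 1) if level_tokens else 0

    return {
        "level_0_tokens_approx": l0_tokens,
        "level_1_tokens_approx": l1_tokens,
        "level_2_inline_tokens_approx": inline_tokens,
        "row_count_total": row_count_total,
        "inline_row_limit": inline_row_limit,
        "compression_ratio_l1_vs_l2": ratio(l1_tokens),
        "compression_ratio_l0_vs_l2": ratio(l0_tokens),
    }
```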

### 6. Write the `lineage` block

`manifold_id` format: `mfld_<kind>_<entity_id>_<YYYYMMDD>`.
Include `qps_entry_id` (the ID of the matching entry in the Query
Provenance Store companion spec). List each input dataset with
`dataset_id`, `version` (where the source exposes one), `row_count`, and
the `as_of_timestamp` used.
List `filters_applied` (each as `{field, operator, value}`) and
`transformations` (one prose line per transformation). Compute a
`checksum` over the sorted, serialized telemetry rows
(`method: "sha256_row_hash"`).
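
A minimal sketch of the ID and checksum follows. The `manifold_id` pattern is taken from the text above; the canonical row serialization (JSON with sorted keys and compact separators, rows sorted by their serialized form, joined with newlines) is an assumption, so confirm it against the QPS spec before relying on the digest for drift checks.

```python
import hashlib
import json
from datetime import date

def manifold_id(kind, entity_id, computed_on: date):
    """Format: mfld_<kind>_<entity_id>_<YYYYMMDD>."""
    return f"mfld_{kind}_{entity_id}_{computed_on:%Y%m%d}"

def telemetry_checksum(rows):
    """SHA-256 over sorted, serialized telemetry rows (serialization is assumed)."""
    serialized = sorted(
        json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
        for row in rows
    )
    digest = hashlib.sha256("\n".join(serialized).encode("utf-8")).hexdigest()
    return {"method": "sha256_row_hash", "value": digest}
```

Sorting the serialized rows makes the digest independent of the order in which the query returned them.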

## Validation checklist

Before returning the manifold, verify:

- [ ] `artifact_type == "tabular_manifold"` and `artifact_version` set.
- [ ] `manifold_kind == "entity_profile"`.
- [ ] `level_0_summary.reliability` and `level_0_summary.quality_flags`
      both present.
- [ ] HHI math: sum of `(spend_i / total_spend)^2` across the concentration
      grain. Sanity: 0 < HHI ≤ 1.0, with 1.0 only when one party holds all spend.
- [ ] All top_3 / top_5 rows are sorted DESC by spend.
- [ ] Percentages in any rollup (`pct_of_spend`) plus the tail_summary
      percentage sum to ~1.0 (allow rounding).
- [ ] Every monthly_timeseries period falls inside `time_window`.
- [ ] `level_2_telemetry.strategy` is one of the enum values.
- [ ] `lineage.inputs[]` is non-empty.
- [ ] `quality_flags.level_1_*_truncated` matches whether any rollup
      actually truncated.
- [ ] No null values in required fields. Use empty strings or 0 only
      where the spec allows.
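
Part of this checklist can be automated. The sketch below covers only the structural checks and the rollup percentage sum. It assumes each rollup is an object with `rows` and a `tail_summary`, each carrying `pct_of_spend`; that shape is inferred from §3, not confirmed by the spec. The function name and the 0.01 tolerance are illustrative.

```python
L2_STRATEGIES = {
    "preview_outliers", "preview_recent",
    "preview_concentration_evidence", "full_inline", "none",
}

def validate_manifold(m):
    """Return a list of problems; an empty list means these checks passed."""
    errors = []
    if m.get("artifact_type") != "tabular_manifold":
        errors.append("artifact_type must be 'tabular_manifold'")
    if not m.get("artifact_version"):
        errors.append("artifact_version is not set")
    if m.get("manifold_kind") != "entity_profile":
        errors.append("manifold_kind must be 'entity_profile'")
    l0 = m.get("level_0_summary", {})
    for key in ("reliability", "quality_flags"):
        if key not in l0:
            errors.append(f"level_0_summary.{key} is missing")
    if m.get("level_2_telemetry", {}).get("strategy") not in L2_STRATEGIES:
        errors.append("level_2_telemetry.strategy is not an enum value")
    if not m.get("lineage", {}).get("inputs"):
        errors.append("lineage.inputs[] is empty")
    # Assumed rollup shape: {"rows": [...], "tail_summary": {...}}.
    for name, rollup in m.get("level_1_geometry", {}).items():
        if not name.endswith("_rollup"):
            continue
        pct = sum(r["pct_of_spend"] for r in rollup.get("rows", []))
        pct += rollup.get("tail_summary", {}).get("pct_of_spend", 0.0)
        if abs(pct - 1.0) > 0.01:
            errors.append(f"{name} percentages sum to {pct:.3f}, expected ~1.0")
    return errors
```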

## Templates

### Empty envelope skeleton

```json
{
  "artifact_type": "tabular_manifold",
  "artifact_version": "1.2",
  "manifold_kind": "entity_profile",
  "subject": {
    "entity_type": "<item|supplier|commodity_group|supplier_category>"
  },
  "time_window": {
    "start": "YYYY-MM-DD",
    "end": "YYYY-MM-DD",
    "granularity": "month"
  },
  "level_0_summary": {
    "observation_count": 0,
    "time_coverage": { },
    "distribution": { },
    "reliability": { },
    "quality_flags": { },
    "financial_summary": { },
    "price_stability": { },
    "relationship": { },
    "interpretation_hints": []
  },
  "level_1_geometry": {
    "monthly_timeseries": { "format": "row_objects_v1", "granularity": "month", "rows": [] }
  },
  "level_2_telemetry": {
    "row_count_total": 0,
    "strategy": "preview_outliers",
    "inline_rows": { "format": "row_objects_v1", "rows": [] },
    "retrieval": { "method": "mcp_tool", "tool_name": "", "tool_args": { } }
  },
  "token_budget": {
    "level_0_tokens_approx": 0,
    "level_1_tokens_approx": 0,
    "level_2_inline_tokens_approx": 0,
    "row_count_total": 0,
    "inline_row_limit": 10,
    "compression_ratio_l1_vs_l2": 0,
    "compression_ratio_l0_vs_l2": 0,
    "recommended_strategy": ""
  },
  "lineage": {
    "manifold_id": "",
    "computed_at": "",
    "computed_by": "",
    "qps_entry_id": "",
    "inputs": [],
    "filters_applied": [],
    "transformations": [],
    "checksum": { "method": "sha256_row_hash", "value": "" }
  }
}
```

### Discipline-rating band logic

```text
if weighted_avg_item_cv < 0.10: "Excellent"
elif weighted_avg_item_cv < 0.25: "Good"
elif weighted_avg_item_cv < 0.50: "Fair"
else: "Poor"
```
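
For reference, a Python sketch of the band logic together with the spend-weighted per-item CV it consumes. The input shape (tuples of item ID, unit price, spend) and the use of the sample standard deviation are assumptions; items with fewer than two PO lines are skipped, as described in §2.

```python
import statistics
from collections import defaultdict

def weighted_avg_item_cv(po_lines):
    """po_lines: iterable of (item_id, unit_price, spend) tuples (assumed shape)."""
    by_item = defaultdict(list)
    for item_id, unit_price, spend in po_lines:
        by_item[item_id].append((unit_price, spend))
    weighted_sum = total_weight = 0.0
    for lines in by_item.values():
        if len(lines) < 2:
            continue  # CV is undefined for a single PO line
        prices = [price for price, _ in lines]
        mean = statistics.fmean(prices)
        if mean == 0:
            continue
        cv = statistics.stdev(prices) / mean
        weight = sum(spend for _, spend in lines)  # weight each item by its spend
        weighted_sum += cv * weight
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None

def discipline_rating(cv):
    """Map weighted_avg_item_cv onto the rating bands."""
    if cv < 0.10:
        return "Excellent"
    if cv < 0.25:
        return "Good"
    if cv < 0.50:
        return "Fair"
    return "Poor"
```

`weighted_avg_item_cv` returns `None` when no item has at least two PO lines; in that case do not call `discipline_rating`, and handle the gap as the spec directs for unsupported fields.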

### HHI on rollup output

```text
# all_rows = every row at the concentration grain, before Pareto truncation
hhi = sum( (row.spend / total_spend) ** 2 for row in all_rows )
top_supplier_pct = max(row.spend for row in all_rows) / total_spend
```

Use ALL rows for HHI, not just the truncated head. The tail belongs in
the math even when it is summarized away in the rollup.
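
A runnable version of the same calculation, as a sketch. It takes a mapping from party (supplier, item, and so on) to spend, built from the untruncated grain; the function name and return shape are illustrative, and the 0.25 threshold comes from the `high_supplier_concentration` rule in §2.

```python
def concentration(spend_by_party):
    """Compute HHI and the top share from spend per party (all parties, untruncated)."""
    total = sum(spend_by_party.values())
    if total <= 0:
        raise ValueError("total spend must be positive")
    shares = [spend / total for spend in spend_by_party.values()]
    hhi = sum(share ** 2 for share in shares)
    return {
        "hhi": hhi,
        "top_supplier_pct": max(shares),
        "high_supplier_concentration": hhi > 0.25,
    }
```

For spend of 50, 30, and 20 across three suppliers, the shares are 0.5, 0.3, and 0.2, so HHI is 0.25 + 0.09 + 0.04 = 0.38 and the concentration flag is set.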

## What not to do

- Do not collapse the global CV and the weighted_avg_item_cv into a
  single number. They measure different things and the user will read
  them differently. Always emit both with `rating_basis.note` explaining
  the distinction.
- Do not emit `quality_flags.high_variance` for supplier and
  supplier_category manifolds based on the global CV. Their product mix
  makes the global metric mechanically large; use the weighted per-item
  metric instead, or omit the flag.
- Do not invent values for fields the source can't support. Set the
  relevant `quality_flag` and let the agent decide.
- Do not interpolate parameters into the SQL `query_template`. Use
  parameterized templates with `:name` placeholders and provide
  `query_params` separately. The QPS spec rejects interpolated SQL on
  purpose.
- Do not include PII in `level_2_telemetry.inline_rows`. The preview
  is meant for outlier evidence, not full record dumps.
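
To illustrate the parameterized-template rule, the sketch below keeps `query_template` and `query_params` separate and lets the driver bind the `:name` placeholders. It uses Python's built-in `sqlite3` only because it is self-contained; the `po_lines` table, its columns, and the sample rows are hypothetical stand-ins for the real silver table.

```python
import sqlite3

# Template and params stay separate; nothing is interpolated into the SQL string.
query_template = (
    "SELECT po_num, unit_price, spend FROM po_lines "
    "WHERE parent_supplier_id = :supplier_id "
    "AND purchase_order_date BETWEEN :start AND :end "
    "ORDER BY purchase_order_date"
)
query_params = {"supplier_id": "S-001", "start": "2024-01-01", "end": "2024-12-31"}

# This pair is what goes into retrieval.sql_fallback verbatim.
sql_fallback = {"query_template": query_template, "query_params": query_params}

# Hypothetical in-memory table standing in for the PO-line source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE po_lines (po_num TEXT, parent_supplier_id TEXT, "
    "purchase_order_date TEXT, unit_price REAL, spend REAL)"
)
conn.executemany(
    "INSERT INTO po_lines VALUES (?, ?, ?, ?, ?)",
    [
        ("PO-1", "S-001", "2024-03-01", 2.5, 250.0),
        ("PO-2", "S-002", "2024-03-05", 9.0, 90.0),
        ("PO-3", "S-001", "2023-12-31", 2.4, 240.0),
    ],
)
# The driver binds :supplier_id, :start, and :end from the dict.
rows = conn.execute(query_template, query_params).fetchall()
conn.close()
```

Other dialects use different placeholder syntax in their client libraries, so the stored template may need translating at execution time while the manifold keeps the `:name` form.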

## Pointers

- The four TechnoFlex reference manifolds at
  https://canonical.agency/spec/tms#reference-examples are the
  authoritative shape examples. When in doubt, match their structure.
- The glossary on each viewer page (e.g.
  https://canonical.agency/manifolds/technoflex/supplier ) explains
  fields like HHI, weighted_avg_item_cv, and Pareto tie-breaking in
  plain language.
- The companion QPS spec (https://canonical.agency/spec/qps ) covers
  the lineage / replay side. If the consumer will need to verify drift
  later, write the QPS entry at the same time as the manifold.
