Calculations reference

Math the create-tms-manifold Claude skill refers to. Sample size bands, confidence intervals, HHI, Pareto truncation, outlier detection, and more. Self-contained so the SKILL.md body can stay lean.

Sample size and reliability

Sample size classes

sample_size_class is a banded label on observation_count (the number of PO lines in the window):

if observation_count < 30:  "sparse"
elif observation_count < 100: "adequate"
else: "robust"

Why the bands matter:

sparse: stats are unstable, CIs are wide, outlier detection is unreliable. Triggers quality_flags.low_sample_size. Use IQR for outlier flagging instead of z-score (see Outlier detection below).
adequate: stats are usable but not rock-solid. Z-score is acceptable but compute the CI against a t-distribution rather than a normal.
robust: Central Limit Theorem applies; z-distribution CI and z-score outlier detection are both fine.
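The banding and the per-band method choices above can be sketched in Python. The thresholds and the quality_flags.low_sample_size name come from this reference. The helper `stats_methods` and its return keys are hypothetical, used only to show which CI distribution and outlier detector each band implies.

```python
def sample_size_class(observation_count: int) -> str:
    """Band observation_count (PO lines in the window) into a reliability label."""
    if observation_count < 30:
        return "sparse"
    if observation_count < 100:
        return "adequate"
    return "robust"


def stats_methods(observation_count: int) -> dict:
    """Hypothetical helper: CI distribution and outlier detector for each band."""
    band = sample_size_class(observation_count)
    return {
        "sample_size_class": band,
        # Only robust samples use the normal (z) CI; sparse and adequate use t.
        "ci_distribution": "normal" if band == "robust" else "t",
        # Sparse samples fall back to IQR because their stddev is unreliable.
        "outlier_detector": "iqr" if band == "sparse" else "z_score",
        "low_sample_size": band == "sparse",  # quality_flags.low_sample_size
    }
```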

Confidence interval for the mean

Always emit confidence_in_mean at the 95% level. Formula depends on sample size class:

sample stddev s = sqrt( sum( (x_i - mean)**2 ) / (n - 1) )
standard error  = s / sqrt(n)
 
if class == "robust":
    margin_of_error = 1.96 * standard_error
else:
    margin_of_error = t_critical(0.025, df = n - 1) * standard_error
 
interval = [mean - margin_of_error, mean + margin_of_error]

For sparse samples, the t-critical at df = 5 is ~2.57; at df = 29 it is ~2.05. The CI gets visibly wider as n shrinks: that is a feature, not a bug. Use sample stddev (divide by n - 1), not population stddev.
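A standard-library sketch of the CI recipe follows, assuming the values are unit prices for one entity. The `t_critical` helper is an illustrative stand-in: it integrates the Student's t density with Simpson's rule and bisects for the upper-tail critical value. With SciPy available, `scipy.stats.t.ppf(0.975, df)` is the usual library call for the same number.

```python
import math
from statistics import mean, stdev


def _t_pdf(t: float, df: int, log_norm: float) -> float:
    # Student's t density; log_norm is the log of its normalizing constant.
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def t_critical(alpha_tail: float, df: int) -> float:
    """Upper-tail critical value t with P(T > t) = alpha_tail for T ~ t(df)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def area_from_zero(x: float, steps: int = 2000) -> float:
        # Simpson's rule for the density integrated over [0, x]; steps is even.
        h = x / steps
        total = _t_pdf(0.0, df, log_norm) + _t_pdf(x, df, log_norm)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * _t_pdf(k * h, df, log_norm)
        return total * h / 3

    target = 0.5 - alpha_tail   # density mass between 0 and the critical value
    lo, hi = 0.0, 100.0         # wide enough for alpha_tail = 0.025 at any df >= 1
    for _ in range(50):
        mid = (lo + hi) / 2
        if area_from_zero(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_in_mean(values: list[float], sample_size_class: str) -> list[float]:
    """95% CI for the mean: 1.96 for robust samples, t(n - 1) otherwise."""
    n = len(values)
    m = mean(values)
    standard_error = stdev(values) / math.sqrt(n)  # stdev divides by n - 1
    if sample_size_class == "robust":
        critical = 1.96
    else:
        critical = t_critical(0.025, df=n - 1)
    margin_of_error = critical * standard_error
    return [m - margin_of_error, m + margin_of_error]
```

Simpson's rule plus bisection is slow next to a library quantile function, but it keeps the sketch dependency-free and reproduces the ~2.57 (df = 5) and ~2.05 (df = 29) values quoted above.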

population_size and population_rank

For a manifold of kind K over time_window W:

population       = distinct entities of kind K observed in the
                   source table during window W
population_size  = count(population)
population_rank  = rank of this entity by total_spend_usd within
                   population, descending (1 = largest spender)

Do not extrapolate beyond W. If the window is 24 months, population_size is "active entities in the last 24 months", not "all entities the company has ever transacted with".
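The population definitions above can be sketched as a filter-then-rank pass over spend rows, assuming a flat table where each row carries an entity id, an order date and a USD amount. `SpendRow`, its fields and `population_stats` are hypothetical names, and breaking spend ties by entity id is an assumption the spec does not make.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SpendRow:
    entity_id: str      # an entity of kind K (supplier, item, category, ...)
    order_date: date
    spend_usd: float


def population_stats(rows: list[SpendRow], entity_id: str,
                     window_start: date, window_end: date) -> tuple[int, int]:
    """(population_size, population_rank) for entity_id inside window W."""
    totals: dict[str, float] = {}
    for r in rows:
        if window_start <= r.order_date <= window_end:  # never look outside W
            totals[r.entity_id] = totals.get(r.entity_id, 0.0) + r.spend_usd
    # Descending total_spend_usd, so rank 1 is the largest spender; ties are
    # broken by entity_id (assumption).
    ranked = sorted(totals, key=lambda e: (-totals[e], e))
    # Raises ValueError if entity_id had no spend in the window.
    return len(ranked), ranked.index(entity_id) + 1
```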

pct_of_total_spend

pct_of_total_spend = this_entity.total_spend_usd
                   / sum(e.total_spend_usd for e in population)

The denominator is total spend across the entire population of kind K in the window. It is NOT total company procurement spend across all kinds; that is a different metric and TMS does not cross kinds.

For a supplier_category manifold, denominator = sum across all supplier_categories in the window. For a supplier manifold, denominator = sum across all suppliers in the window. And so on.
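A minimal sketch of the same-kind denominator, assuming spend has already been aggregated per entity for one kind and one window. The function name and the choice to return 0 for an all-zero population are assumptions for illustration.

```python
def pct_of_total_spend(spend_by_entity: dict[str, float], entity_id: str) -> float:
    """Entity's share of spend across its own kind's population in the window.

    spend_by_entity must contain every entity of the SAME kind K in window W
    (all suppliers, or all supplier_categories), never a mix of kinds.
    """
    denominator = sum(spend_by_entity.values())
    if denominator == 0:
        return 0.0  # assumption: a population with no spend gives a 0 share
    return spend_by_entity[entity_id] / denominator
```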

Data quality

data_quality_score

data_quality_score = 1 - (imputed_or_null_count / total_row_count)

Range 0 to 1. Anything below 0.90 means a meaningful fraction of rows needed imputation or had missing critical fields. When below 0.90, emit data_quality_notes explaining what was filled and why.
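The score and the 0.90 notes threshold fit in a few lines. This sketch assumes the two counts are already known; the `notes_required` key is a hypothetical name for "data_quality_notes must be emitted".

```python
def data_quality(total_row_count: int, imputed_or_null_count: int) -> dict:
    """data_quality_score and whether data_quality_notes must be emitted."""
    score = 1 - imputed_or_null_count / total_row_count
    return {
        "data_quality_score": score,
        "notes_required": score < 0.90,  # below 0.90: explain what was filled
    }
```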

What counts as imputed

An "imputed" value is anything derived rather than directly observed. Common cases:

unit_price computed from spend / qty when the source row has spend but no unit_price.
qty computed from spend / unit_price when the source row has spend and unit_price but no qty.
site_code defaulted (e.g., to "UNKNOWN") when the source row lacks site attribution.
Currency converted via FX (see Currency normalization).
Date defaulted to month-end or quarter-end when the source row carries only a coarse period.

Imputed values count against data_quality_score AND trigger quality_flags.imputation_applied = true. Each imputation rule applied must appear in lineage.transformations so an auditor can reproduce the derived values.
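A sketch of the first two imputation rules, assuming source rows arrive as dicts with `spend`, `unit_price` and `qty` and `None` for a missing field. The function name, the `imputed_rows` key and the wording of the lineage lines are hypothetical; the flag and lineage field names come from this reference.

```python
def impute_price_and_qty(rows: list[dict]) -> dict:
    """Derive a missing unit_price or qty from spend, mutating rows in place.

    Returns the imputed-row count, the imputation_applied flag, and one
    lineage line per distinct rule that fired.
    """
    transformations: list[str] = []
    imputed_rows = 0
    for row in rows:
        rule = None
        # Truthiness checks skip None and zero, so no division by zero.
        if row["spend"] is not None and row["unit_price"] is None and row["qty"]:
            row["unit_price"] = row["spend"] / row["qty"]
            rule = "unit_price derived as spend / qty where unit_price was null"
        elif row["spend"] is not None and row["qty"] is None and row["unit_price"]:
            row["qty"] = row["spend"] / row["unit_price"]
            rule = "qty derived as spend / unit_price where qty was null"
        if rule is not None:
            imputed_rows += 1
            if rule not in transformations:
                transformations.append(rule)
    return {
        "imputed_rows": imputed_rows,
        "quality_flags": {"imputation_applied": imputed_rows > 0},
        "lineage": {"transformations": transformations},
    }
```

The returned `imputed_rows` can feed the imputed_or_null_count term of data_quality_score.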

Staleness threshold

days_since_last = today - staleness.last_observation
is_stale = days_since_last > stale_threshold_days

Default stale_threshold_days:

90 for routine PO-line procurement data (default).
365 for slow-cadence categories like capital equipment (long replenishment cycles; sparse observations are expected).
30 for real-time-ish flows (e.g., daily transactional categories in CPG or hospitality).

If you override the default, record it in reliability.staleness.stale_threshold_days and call out the choice in data_quality_notes.
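The staleness check can be sketched with `datetime.date`. The cadence keys below (`routine`, `slow_cadence`, `real_time`) are hypothetical labels for the three defaults listed above. The returned dict mirrors the reliability.staleness fields this reference names.

```python
from datetime import date

# Default stale_threshold_days by data cadence (labels are illustrative).
DEFAULT_STALE_THRESHOLD_DAYS = {
    "routine": 90,        # routine PO-line procurement data (default)
    "slow_cadence": 365,  # e.g. capital equipment, long replenishment cycles
    "real_time": 30,      # daily transactional categories
}


def staleness(last_observation: date, today: date, cadence: str = "routine") -> dict:
    """Days since the last observation and whether it exceeds the threshold."""
    threshold = DEFAULT_STALE_THRESHOLD_DAYS[cadence]
    days_since_last = (today - last_observation).days
    return {
        "last_observation": last_observation.isoformat(),
        "days_since_last": days_since_last,
        "stale_threshold_days": threshold,
        "is_stale": days_since_last > threshold,
    }
```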

Currency normalization

TMS assumes a single reporting currency in the manifold (typically USD). If the source table carries multiple currencies, normalize to one BEFORE building the manifold, not after.

Process:

Pick a reporting currency. Record it on the relevant financial block if you need to be explicit (e.g., currency: "USD" on the subject or financial_summary).
Pick an FX policy:
- Monthly average FX: simple, suitable for spend analytics where intra-month volatility is small.
- End-of-period FX: matches accounting reconciliation cadence.
- Transaction-date FX: most accurate but most expensive to compute. Use when individual lines matter (e.g., commodity_group analyses for hedged commodities).
Document the policy in lineage.transformations (one prose line) AND set quality_flags.imputation_applied = true. FX conversion is imputation.
Never mix currencies in distribution stats, HHI math, or rollups. The numbers come out meaningless: the CV blows up, the HHI bands stop mapping to the FTC scale, and rank ordering is wrong.
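A sketch of the monthly-average FX policy, assuming each row carries `spend`, `currency` and an `order_date` (a `datetime.date`), and that rates are supplied as a `(currency, "YYYY-MM") -> rate` mapping. The function name, the rate-table shape and the lineage wording are assumptions. A missing rate raises `KeyError` on purpose, so an unconverted amount never reaches the stats.

```python
def normalize_currency(rows: list[dict], monthly_avg_fx: dict,
                       reporting: str = "USD") -> dict:
    """Convert every row's spend into one reporting currency before any stats run.

    monthly_avg_fx maps (currency, "YYYY-MM") to units of `reporting` per
    unit of that currency.
    """
    converted = []
    fx_applied = False
    for row in rows:
        if row["currency"] == reporting:
            converted.append(dict(row))
            continue
        month = row["order_date"].strftime("%Y-%m")
        rate = monthly_avg_fx[(row["currency"], month)]  # KeyError if missing
        converted.append({**row, "spend": row["spend"] * rate, "currency": reporting})
        fx_applied = True
    lineage_line = f"Spend converted to {reporting} at monthly average FX rates."
    return {
        "rows": converted,
        "quality_flags": {"imputation_applied": fx_applied},  # FX is imputation
        "transformations": [lineage_line] if fx_applied else [],
    }
```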

Discipline rating bands

discipline_rating is a banded label on weighted_avg_item_cv:

if weighted_avg_item_cv < 0.10: "Excellent"
elif weighted_avg_item_cv < 0.25: "Good"
elif weighted_avg_item_cv < 0.50: "Fair"
else: "Poor"

Always emit a rating_basis object alongside discipline_rating:

"rating_basis": {
  "metric": "weighted_avg_item_cv",
  "threshold_excellent": 0.10,
  "threshold_good": 0.25,
  "threshold_fair": 0.50,
  "note": "Spend-weighted per-item CV. Cross-category comparisons are not meaningful."
}

This lets the consumer re-band against their own tolerance without re-reading the spec.
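A sketch that bands from the rating_basis thresholds themselves, so a consumer can re-band by passing a modified copy. `RATING_BASIS` mirrors the JSON above; the function signature is an illustrative assumption.

```python
RATING_BASIS = {
    "metric": "weighted_avg_item_cv",
    "threshold_excellent": 0.10,
    "threshold_good": 0.25,
    "threshold_fair": 0.50,
    "note": "Spend-weighted per-item CV. Cross-category comparisons are not meaningful.",
}


def discipline_rating(weighted_avg_item_cv: float, basis: dict = RATING_BASIS) -> dict:
    """Band the CV against basis and emit rating_basis next to the label."""
    if weighted_avg_item_cv < basis["threshold_excellent"]:
        rating = "Excellent"
    elif weighted_avg_item_cv < basis["threshold_good"]:
        rating = "Good"
    elif weighted_avg_item_cv < basis["threshold_fair"]:
        rating = "Fair"
    else:
        rating = "Poor"
    return {"discipline_rating": rating, "rating_basis": dict(basis)}
```

For example, a consumer with a stricter tolerance could call `discipline_rating(0.20, {**RATING_BASIS, "threshold_good": 0.15})` and get "Fair" instead of "Good".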

HHI on a rollup output

Herfindahl-Hirschman Index measures concentration. Compute over all rows including the Pareto-truncated tail:

hhi = sum( (row.spend / total_spend) ** 2 for row in all_rows )
top_supplier_pct = max(row.spend for row in all_rows) / total_spend

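Bands (2010 DOJ/FTC Horizontal Merger Guidelines thresholds of 1,500 and 2,500, rescaled from the 0-10,000 scale to 0-1):

hhi < 0.15: unconcentrated
0.15 <= hhi <= 0.25: moderately concentrated
hhi > 0.25: highly concentrated (sets the high_supplier_concentration flag)
hhi = 1.0: monopoly (single source)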

The tail belongs in the HHI math even when it is summarized away in the rollup. Summing squared shares over the truncated head alone drops the tail's terms and under-states concentration.
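A sketch of the HHI computation over every supplier, head and tail alike, assuming spend is already aggregated per supplier in one reporting currency. The function and key names other than hhi, top_supplier_pct and high_supplier_concentration are illustrative.

```python
def concentration(spend_by_supplier: dict[str, float]) -> dict:
    """HHI on a 0-1 scale and top-supplier share over ALL rows, tail included."""
    total = sum(spend_by_supplier.values())
    shares = [s / total for s in spend_by_supplier.values()]
    hhi = sum(share ** 2 for share in shares)
    if hhi == 1.0:  # one supplier holds all spend
        band = "monopoly (single source)"
    elif hhi < 0.15:
        band = "unconcentrated"
    elif hhi <= 0.25:
        band = "moderately concentrated"
    else:
        band = "highly concentrated"
    return {
        "hhi": hhi,
        "top_supplier_pct": max(shares),
        "band": band,
        "high_supplier_concentration": hhi > 0.25,
    }
```

With spend of 45, 15, 15, 15 and 10 across five suppliers, the full HHI is about 0.28 (highly concentrated). Summing only the two largest suppliers' squared shares gives 0.225, which would wrongly land in the moderate band.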

Weighted average per-item CV

Isolates pricing volatility from product-mix effects:

items_eligible = [i for i in items if i.po_line_count >= 2]
total_eligible_spend = sum(i.spend for i in items_eligible)
 
weighted_avg_item_cv = sum(
  i.cv * (i.spend / total_eligible_spend)
  for i in items_eligible
)

Items with a single PO line have no sample stddev and are excluded. Track the eligibility share separately:

pricing_data_coverage_pct = total_eligible_spend / total_entity_spend

If pricing_data_coverage_pct < 0.80, add a note in price_stability.rating_basis.note explaining the gap. The discipline_rating is only as trustworthy as its coverage.
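A sketch of the spend-weighted CV and its coverage share. It assumes each item carries its per-PO-line `unit_prices` and its `spend`, with per-item CV taken as sample stddev divided by mean unit price. That CV definition and the input shape are assumptions, since the spec only references `i.cv`.

```python
from statistics import mean, stdev


def weighted_avg_item_cv(items: list[dict], total_entity_spend: float) -> dict:
    """Spend-weighted per-item CV over items with at least two PO lines."""
    eligible = [i for i in items if len(i["unit_prices"]) >= 2]
    eligible_spend = sum(i["spend"] for i in eligible)
    if eligible_spend == 0:
        # No item has a stddev, so there is nothing to rate.
        return {"weighted_avg_item_cv": None, "pricing_data_coverage_pct": 0.0}
    cv = sum(
        # Per-item CV (stdev / mean, assumed) weighted by the item's spend share.
        (stdev(i["unit_prices"]) / mean(i["unit_prices"])) * (i["spend"] / eligible_spend)
        for i in eligible
    )
    return {
        "weighted_avg_item_cv": cv,
        "pricing_data_coverage_pct": eligible_spend / total_entity_spend,
    }
```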

Pareto truncation algorithm

def pareto_truncate(rows, target_coverage, min_rows, max_rows,
                    rank_metric="spend"):
    # Rank rows by the metric, largest first.
    sorted_rows = sorted(rows, key=lambda r: r[rank_metric], reverse=True)
    total = sum(r[rank_metric] for r in sorted_rows)

    emitted = []
    cumulative = 0.0
    for row in sorted_rows:
        if len(emitted) >= max_rows:
            break
        # total > 0 guards the division when every row has zero spend.
        if (len(emitted) >= min_rows and total > 0 and
                cumulative / total >= target_coverage):
            # tie_extended_by_spend_v2: include this row only if it
            # adds non-trivial incremental spend at the boundary
            if row[rank_metric] / total >= 0.005:
                emitted.append(row)
                cumulative += row[rank_metric]
            break
        emitted.append(row)
        cumulative += row[rank_metric]

    tail = sorted_rows[len(emitted):]
    tail_summary = summarize(tail)  # builds the tail_summary block described below

    return {
        "rows": emitted,
        "rows_truncated": len(tail),
        "tie_break": "tie_extended_by_spend_v2",
        "tail_summary": tail_summary,
    }

Default target_coverage:

supplier_rollup: 0.80
item_rollup: 0.80
industry_rollup: 0.95
sub_category_rollup: 0.95
commodity_rollup: 0.80

Default min_rows = 5, max_rows = 50 (supplier/item) or max_rows = 20 (industry/sub_category).
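The defaults above can be kept in one lookup table so every rollup type gets consistent parameters. This is a sketch: `PARETO_DEFAULTS` and `truncation_params` are hypothetical names. commodity_rollup's max_rows is not stated above, so the sketch leaves it unset and requires the caller to supply one.

```python
# target_coverage, min_rows and max_rows per rollup type, from the defaults above.
PARETO_DEFAULTS = {
    "supplier_rollup":     {"target_coverage": 0.80, "min_rows": 5, "max_rows": 50},
    "item_rollup":         {"target_coverage": 0.80, "min_rows": 5, "max_rows": 50},
    "industry_rollup":     {"target_coverage": 0.95, "min_rows": 5, "max_rows": 20},
    "sub_category_rollup": {"target_coverage": 0.95, "min_rows": 5, "max_rows": 20},
    # max_rows is not given for commodity_rollup, so it stays unset here.
    "commodity_rollup":    {"target_coverage": 0.80, "min_rows": 5, "max_rows": None},
}


def truncation_params(rollup_type: str, **overrides) -> dict:
    """Defaults for one rollup type with caller overrides applied on top."""
    params = {**PARETO_DEFAULTS[rollup_type], **overrides}
    if params["max_rows"] is None:
        raise ValueError(f"{rollup_type}: max_rows has no documented default; pass one")
    return params
```

The result can be splatted into the truncation call, e.g. `pareto_truncate(rows, **truncation_params("supplier_rollup"))`.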

tail_summary structure

Minimum fields for any tail_summary:

{
  "rows_truncated": 0,
  "spend": 0,
  "pct_of_spend": 0,
  "po_count": 0,
  "po_line_count": 0
}

Type-specific extras:

supplier_rollup tail → add item_count and tail_industry_mix (top 5 industries by tail spend, each {industry, spend, pct_of_tail})
item_rollup tail → add distinct_commodity_groups and tail_commodity_mix (top 5 commodity groups)
commodity_rollup tail → add distinct_sub_categories
industry_rollup tail → add distinct_industries (typically equals rows_truncated since each row is already an industry)
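A sketch of a `summarize` for the supplier_rollup case: the five minimum fields plus item_count and tail_industry_mix. It assumes each tail row is one supplier with `spend`, `po_count`, `po_line_count`, a set of `item_ids` and a single `industry`. That row shape and the function name are hypothetical.

```python
from collections import defaultdict


def summarize_supplier_tail(tail: list[dict], total_spend: float) -> dict:
    """Minimum tail_summary fields plus the supplier_rollup extras."""
    tail_spend = sum(r["spend"] for r in tail)
    summary = {
        "rows_truncated": len(tail),
        "spend": tail_spend,
        "pct_of_spend": tail_spend / total_spend if total_spend else 0.0,
        "po_count": sum(r["po_count"] for r in tail),
        "po_line_count": sum(r["po_line_count"] for r in tail),
    }
    # Distinct items across all tail suppliers (an item may have several).
    summary["item_count"] = len(set().union(*(r["item_ids"] for r in tail)))
    by_industry: dict[str, float] = defaultdict(float)
    for r in tail:
        by_industry[r["industry"]] += r["spend"]
    top5 = sorted(by_industry.items(), key=lambda kv: kv[1], reverse=True)[:5]
    summary["tail_industry_mix"] = [
        {"industry": ind, "spend": s,
         "pct_of_tail": s / tail_spend if tail_spend else 0.0}
        for ind, s in top5
    ]
    return summary
```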

Outlier detection

Used to populate quality_flags.suspected_outliers (boolean) and to flag individual rows in level_2_telemetry.inline_rows.

Z-score (default for adequate and robust samples)

z_score = (row.unit_price - entity.mean) / entity.stddev

Emit the line with an outlier flag when |z_score| >= 3. Common flag values used in the TechnoFlex examples (price_peak_period and high_volume_lane annotate lines below that threshold rather than outliers):

"price_outlier_high" for z_score >= 3
"price_outlier_low" for z_score <= -3
"price_peak_period" for 1.5 <= z_score < 3
"uom_likely_misencoded" for z_score >= 8 (typically a per-railcar price mistakenly entered as a per-unit price; verify the spend math to confirm)
"high_volume_lane" for normal-priced rows with qty in the top decile (not an outlier per se, but useful evidence)

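A sketch of the z-score flagging for adequate and robust samples, assuming rows carry `unit_price` and `qty` and the price spread is nonzero. Two choices here are assumptions, not spec: checking uom_likely_misencoded before price_outlier_high, and treating "normal-priced" as "received no price flag". The top decile uses the 90th-percentile cut point from `statistics.quantiles`.

```python
from statistics import mean, quantiles, stdev


def zscore_flags(rows: list[dict]) -> list[dict]:
    """Score each row's unit_price against the entity and attach one flag."""
    prices = [r["unit_price"] for r in rows]
    mu, sigma = mean(prices), stdev(prices)  # sigma must be > 0
    # 90th-percentile cut point of qty; rows at or above it are the top decile.
    qty_p90 = quantiles([r["qty"] for r in rows], n=10)[-1]
    flagged = []
    for r in rows:
        z = (r["unit_price"] - mu) / sigma
        if z >= 8:  # checked before price_outlier_high (assumed precedence)
            flag = "uom_likely_misencoded"
        elif z >= 3:
            flag = "price_outlier_high"
        elif z <= -3:
            flag = "price_outlier_low"
        elif z >= 1.5:
            flag = "price_peak_period"
        elif r["qty"] >= qty_p90:  # normal-priced, top-decile volume
            flag = "high_volume_lane"
        else:
            flag = None
        flagged.append({**r, "z_score": z, "flag": flag})
    return flagged


def suspected_outliers(flagged: list[dict]) -> bool:
    """quality_flags.suspected_outliers as seen by the z-score detector."""
    return any(abs(r["z_score"]) >= 3 for r in flagged)
```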
IQR fallback (for sparse samples)

When sample_size_class == "sparse" (n < 30), z-score breaks down because stddev is unreliable. Use IQR instead:

Q1 = 25th percentile of unit_price
Q3 = 75th percentile of unit_price
IQR = Q3 - Q1
 
flag row as outlier if:
    row.unit_price < Q1 - 1.5 * IQR     (low)
    or row.unit_price > Q3 + 1.5 * IQR  (high)

Flag vocabulary for IQR-detected outliers:

"iqr_outlier_high" for the upper tail
"iqr_outlier_low" for the lower tail

Treat IQR and z-score as parallel detectors with the same downstream behavior. quality_flags.suspected_outliers is true if EITHER detector fires on any row. Record which detector was used in level_2_telemetry.retrieval notes or in data_quality_notes.
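A sketch of the IQR fallback with 1.5 * IQR fences. The spec does not fix a percentile method, so the sketch assumes linear interpolation between order statistics (`statistics.quantiles` with `method="inclusive"`). Other methods can move the fences slightly on small samples.

```python
from statistics import quantiles
from typing import Optional


def iqr_flags(unit_prices: list[float]) -> list[Optional[str]]:
    """Flag each price against the Q1 - 1.5*IQR and Q3 + 1.5*IQR fences."""
    q1, _, q3 = quantiles(unit_prices, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flags: list[Optional[str]] = []
    for p in unit_prices:
        if p < low:
            flags.append("iqr_outlier_low")
        elif p > high:
            flags.append("iqr_outlier_high")
        else:
            flags.append(None)
    return flags
```

quality_flags.suspected_outliers is then true if `any(f is not None for f in flags)` holds here or the z-score detector fires.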
