Calculations reference
Math the create-tms-manifold Claude skill refers to. Sample size bands, confidence intervals, HHI, Pareto truncation, outlier detection, and more. Self-contained so the SKILL.md body can stay lean.
Sample size and reliability
Sample size classes
sample_size_class is a banded label on observation_count
(the number of PO lines in the window):
if observation_count < 30: "sparse"
elif observation_count < 100: "adequate"
else: "robust"
Why the bands matter:
- sparse: stats are unstable, CIs are wide, outlier detection is unreliable. Triggers quality_flags.low_sample_size. Use IQR for outlier flagging instead of z-score (see Outlier detection below).
- adequate: stats are usable but not rock-solid. Z-score is acceptable but compute the CI against a t-distribution rather than a normal.
- robust: Central Limit Theorem applies; z-distribution CI and z-score outlier detection are both fine.
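
As a minimal sketch of the banding above, the class and the choices that hang off it can be derived in one place. The function names and the shape of the returned dict are illustrative assumptions, not part of the TMS spec.

```python
def sample_size_class(observation_count: int) -> str:
    """Band a PO-line count into sparse / adequate / robust."""
    if observation_count < 30:
        return "sparse"
    if observation_count < 100:
        return "adequate"
    return "robust"


def reliability_choices(observation_count: int) -> dict:
    """Hypothetical helper: collect the decisions each band implies."""
    size_class = sample_size_class(observation_count)
    return {
        "sample_size_class": size_class,
        "low_sample_size": size_class == "sparse",
        # IQR for sparse samples, z-score otherwise
        "outlier_detector": "iqr" if size_class == "sparse" else "z_score",
        # only robust samples use the normal (z) critical value
        "ci_distribution": "z" if size_class == "robust" else "t",
    }
```

The boundaries are half-open: 30 observations is already "adequate" and 100 is already "robust".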
Confidence interval for the mean
Always emit confidence_in_mean at the 95% level. Formula depends on
sample size class:
sample stddev s = sqrt( sum( (x_i - mean)**2 ) / (n - 1) )
standard error = s / sqrt(n)
if class == "robust":
    margin_of_error = 1.96 * standard_error
else:
    margin_of_error = t_critical(0.025, df = n - 1) * standard_error
interval = [mean - margin_of_error, mean + margin_of_error]
The t-critical at df = 5 (n = 6) is ~2.57; at df = 29 (n = 30) it
is ~2.05. The CI gets visibly wider as n shrinks: that is a feature,
not a bug. Use sample stddev (divide by n - 1), not population
stddev.
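
The following is a sketch of the interval calculation using only the Python standard library. Because the standard library has no t-distribution, the caller supplies the two-sided 95% critical value for df = n - 1 (from a t-table or a stats package); that parameter and the function name are assumptions made for illustration.

```python
import math
import statistics


def confidence_in_mean(values, size_class, t_critical=None):
    """Return (low, high) for the 95% confidence interval of the mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations for a sample stddev")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)           # sample stddev, divides by n - 1
    standard_error = s / math.sqrt(n)
    if size_class == "robust":
        multiplier = 1.96                  # normal critical value
    else:
        if t_critical is None:
            raise ValueError("t_critical is required for sparse/adequate samples")
        multiplier = t_critical            # caller-supplied, df = n - 1
    margin_of_error = multiplier * standard_error
    return mean - margin_of_error, mean + margin_of_error


# Six unit prices, so df = 5 and the t-critical is ~2.571.
prices = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0]
low, high = confidence_in_mean(prices, "sparse", t_critical=2.571)
```

For these six prices the mean is 11 and the sample stddev is sqrt(2), so the interval is roughly 9.52 to 12.48. Using 1.96 on the same data would give a narrower, overconfident interval.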
Population and spend share
population_size and population_rank
For a manifold of kind K over time_window W:
population = distinct entities of kind K observed in the
source table during window W
population_size = count(population)
population_rank = rank of this entity by total_spend_usd within
population, descending (1 = largest spender)
Do not extrapolate beyond W. If the window is 24 months,
population_size is "active entities in the last 24 months", not
"all entities the company has ever transacted with".
pct_of_total_spend
pct_of_total_spend = this_entity.total_spend_usd
    / sum(e.total_spend_usd for e in population)
The denominator is total spend across the entire population of kind K in the window. It is NOT total company procurement spend across all kinds; that is a different metric and TMS does not cross kinds.
For a supplier_category manifold, denominator = sum across all
supplier_categories in the window. For a supplier manifold,
denominator = sum across all suppliers in the window. And so on.
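
A short sketch of the three population fields, assuming the window filter has already been applied and spend is held in a plain dict keyed by entity id. The function name, the dict input, and the tie handling (ties keep their input order) are assumptions for illustration.

```python
def rank_and_share(spend_by_entity, entity_id):
    """spend_by_entity: {entity_id: total_spend_usd} for ONE kind in ONE window."""
    total = sum(spend_by_entity.values())
    if total <= 0:
        raise ValueError("population has no spend in the window")
    # Descending by spend; sorted() is stable, so ties keep input order.
    ordered = sorted(spend_by_entity, key=spend_by_entity.get, reverse=True)
    return {
        "population_size": len(ordered),
        "population_rank": ordered.index(entity_id) + 1,   # 1 = largest spender
        "pct_of_total_spend": spend_by_entity[entity_id] / total,
    }


# Three suppliers observed in the window; the denominator is supplier spend only.
suppliers = {"S1": 500_000.0, "S2": 300_000.0, "S3": 200_000.0}
result = rank_and_share(suppliers, "S2")
```

Here S2 ranks second of three with a 30% share of supplier spend.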
Data quality
data_quality_score
data_quality_score = 1 - (imputed_or_null_count / total_row_count)
Range 0 to 1. Anything below 0.90 means a meaningful fraction of rows
needed imputation or had missing critical fields. When below 0.90,
emit data_quality_notes explaining what was filled and why.
What counts as imputed
An "imputed" value is anything derived rather than directly observed. Common cases:
- unit_price computed from spend / qty when the source row has spend but no unit_price.
- qty computed from spend / unit_price when the source row has spend and unit_price but no qty.
- site_code defaulted (e.g., to "UNKNOWN") when the source row lacks site attribution.
- Currency converted via FX (see Currency normalization).
- Date defaulted to month-end or quarter-end when the source row carries only a coarse period.
Imputed values count against data_quality_score AND trigger
quality_flags.imputation_applied = true. Each imputation rule
applied must appear in lineage.transformations so an auditor can
reproduce.
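
To make the scoring and lineage rules concrete, here is a sketch that applies one imputation rule (unit_price from spend / qty) and derives the score and flags from it. The row shape, the function name, and the returned keys are hypothetical; a real pipeline would apply every rule listed above.

```python
def score_after_imputation(rows):
    """Fill missing unit_price, then report score, flag, and lineage."""
    if not rows:
        raise ValueError("no rows to score")
    imputed_or_null_count = 0
    cleaned = []
    for row in rows:
        row = dict(row)                      # leave the caller's rows untouched
        if row.get("unit_price") is None:
            imputed_or_null_count += 1       # counts even if the fill fails
            if row.get("spend") is not None and row.get("qty"):
                row["unit_price"] = row["spend"] / row["qty"]
        cleaned.append(row)
    score = 1 - imputed_or_null_count / len(rows)
    transformations = []
    if imputed_or_null_count:
        transformations.append(
            "unit_price = spend / qty where unit_price was missing"
        )
    return {
        "rows": cleaned,
        "data_quality_score": score,
        "imputation_applied": imputed_or_null_count > 0,
        "transformations": transformations,
        "needs_data_quality_notes": score < 0.90,
    }


# Ten rows, two of which arrive without a unit_price.
rows = [{"spend": 100.0, "qty": 10, "unit_price": 10.0} for _ in range(8)]
rows += [{"spend": 90.0, "qty": 9, "unit_price": None} for _ in range(2)]
report = score_after_imputation(rows)
```

Two imputed rows out of ten give a score of 0.80, which is below 0.90 and therefore calls for data_quality_notes.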
Staleness threshold
days_since_last = today - staleness.last_observation
is_stale = days_since_last > stale_threshold_days
Default stale_threshold_days:
- 90 for routine PO-line procurement data (default).
- 365 for slow-cadence categories like capital equipment (long replenishment cycles; sparse observations are expected).
- 30 for real-time-ish flows (e.g., daily transactional categories in CPG or hospitality).
If you override the default, record it in
reliability.staleness.stale_threshold_days and call out the choice
in data_quality_notes.
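
A small sketch of the staleness check with an overridable threshold. Passing `today` explicitly keeps the result reproducible; the function name and the returned dict shape are assumptions.

```python
from datetime import date


def staleness(last_observation: date, today: date,
              stale_threshold_days: int = 90) -> dict:
    """Compare days since the last observation against the threshold."""
    days_since_last = (today - last_observation).days
    return {
        "last_observation": last_observation.isoformat(),
        "days_since_last": days_since_last,
        "stale_threshold_days": stale_threshold_days,
        "is_stale": days_since_last > stale_threshold_days,
    }


# Same dates, two cadences: routine procurement vs. capital equipment.
routine = staleness(date(2024, 1, 1), date(2024, 5, 1))
capital = staleness(date(2024, 1, 1), date(2024, 5, 1), stale_threshold_days=365)
```

The gap is 121 days, so the routine manifold is stale while the capital-equipment one is not.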
Currency normalization
TMS assumes a single reporting currency in the manifold (typically USD). If the source table carries multiple currencies, normalize to one BEFORE building the manifold, not after.
Process:
- Pick a reporting currency. Record it on the relevant financial block if you need to be explicit (e.g., currency: "USD" on the subject or financial_summary).
- Pick an FX policy:
  - Monthly average FX: simple, suitable for spend analytics where intra-month volatility is small.
  - End-of-period FX: matches accounting reconciliation cadence.
  - Transaction-date FX: most accurate but most expensive to compute. Use when individual lines matter (e.g., commodity_group analyses for hedged commodities).
- Document the policy in lineage.transformations (one prose line) AND set quality_flags.imputation_applied = true. FX conversion is imputation.
- Never mix currencies in distribution stats, HHI math, or rollups. The numbers come out meaningless: the CV blows up, the HHI bands stop mapping to the FTC scale, and rank ordering is wrong.
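
The process above could be sketched as follows for the monthly-average policy. The row fields (`currency`, `po_date`, `spend`), the rate-table layout, and the rate itself are all made up for illustration; they are not real FX data or a TMS schema.

```python
def normalize_currency(rows, monthly_avg_fx, reporting_currency="USD"):
    """Convert spend to the reporting currency BEFORE any stats are computed.

    monthly_avg_fx: {(currency, "YYYY-MM"): rate into the reporting currency}
    """
    normalized = []
    converted = 0
    for row in rows:
        row = dict(row)
        if row["currency"] != reporting_currency:
            month = row["po_date"][:7]                     # "YYYY-MM"
            rate = monthly_avg_fx[(row["currency"], month)]  # KeyError if missing
            row["spend"] = row["spend"] * rate
            row["currency"] = reporting_currency
            converted += 1
        normalized.append(row)
    lineage = []
    if converted:
        lineage.append(
            f"Converted {converted} rows to {reporting_currency} "
            "using monthly average FX"
        )
    return normalized, converted > 0, lineage


rows = [
    {"currency": "EUR", "po_date": "2024-03-14", "spend": 100.0},
    {"currency": "USD", "po_date": "2024-03-20", "spend": 50.0},
]
fx = {("EUR", "2024-03"): 1.10}   # illustrative rate only
normalized, imputation_applied, lineage = normalize_currency(rows, fx)
```

A missing rate raises rather than silently passing the row through, which keeps unconverted amounts out of the distribution stats.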
Discipline rating bands
discipline_rating is a banded label on weighted_avg_item_cv:
if weighted_avg_item_cv < 0.10: "Excellent"
elif weighted_avg_item_cv < 0.25: "Good"
elif weighted_avg_item_cv < 0.50: "Fair"
else: "Poor"
Always emit a rating_basis object alongside discipline_rating:
"rating_basis": {
  "metric": "weighted_avg_item_cv",
  "threshold_excellent": 0.10,
  "threshold_good": 0.25,
  "threshold_fair": 0.50,
  "note": "Spend-weighted per-item CV. Cross-category comparisons are not meaningful."
}
This lets the consumer re-band against their own tolerance without re-reading the spec.
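
As an illustration of why rating_basis travels with the rating, the sketch below bands a CV against the published thresholds and then against a stricter, consumer-chosen set. The function name and the stricter thresholds are hypothetical.

```python
DEFAULT_RATING_BASIS = {
    "metric": "weighted_avg_item_cv",
    "threshold_excellent": 0.10,
    "threshold_good": 0.25,
    "threshold_fair": 0.50,
}


def discipline_rating(weighted_avg_item_cv, basis=DEFAULT_RATING_BASIS):
    """Band a CV using the thresholds carried in a rating_basis object."""
    if weighted_avg_item_cv < basis["threshold_excellent"]:
        return "Excellent"
    if weighted_avg_item_cv < basis["threshold_good"]:
        return "Good"
    if weighted_avg_item_cv < basis["threshold_fair"]:
        return "Fair"
    return "Poor"


# A consumer with a tighter tolerance re-bands the same CV.
strict_basis = {**DEFAULT_RATING_BASIS,
                "threshold_excellent": 0.05,
                "threshold_good": 0.10}
published = discipline_rating(0.12)
rebanded = discipline_rating(0.12, strict_basis)
```

A CV of 0.12 is "Good" under the default bands and "Fair" under the stricter ones; the underlying metric is unchanged.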
HHI on a rollup output
Herfindahl-Hirschman Index measures concentration. Compute over all rows including the Pareto-truncated tail:
hhi = sum( (row.spend / total_spend) ** 2 for row in all_rows )
top_supplier_pct = max(row.spend for row in all_rows) / total_spend
Bands (FTC convention, scaled to 0-1):
- < 0.15 unconcentrated
- 0.15 <= hhi <= 0.25 moderately concentrated
- > 0.25 highly concentrated (sets high_supplier_concentration flag)
- 1.0 monopoly (single source)
The tail belongs in the HHI math even when it is summarized away in the rollup. Using only the truncated head will under-state concentration.
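
A worked sketch of the HHI calculation and banding, with made-up spend figures. It also shows the effect of dropping the tail's terms from the sum while keeping the full-population denominator. The function name and return shape are assumptions.

```python
def concentration(spends):
    """HHI, top share, and band over ALL rows (head plus tail)."""
    total_spend = sum(spends)
    if total_spend <= 0:
        raise ValueError("no spend to measure")
    hhi = sum((s / total_spend) ** 2 for s in spends)
    if hhi < 0.15:
        band = "unconcentrated"
    elif hhi <= 0.25:
        band = "moderately concentrated"
    else:
        band = "highly concentrated"
    return {
        "hhi": hhi,
        "top_supplier_pct": max(spends) / total_spend,
        "band": band,
        "high_supplier_concentration": hhi > 0.25,
    }


head = [50.0, 30.0]          # rows emitted by the rollup
tail = [10.0, 5.0, 5.0]      # rows summarized away
full = concentration(head + tail)

# Head-only sum against the same total: the tail's squared shares are lost.
total = sum(head + tail)
head_only_hhi = sum((s / total) ** 2 for s in head)
```

With these numbers the full HHI is 0.355, while the head-only sum is 0.34.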
Weighted average per-item CV
Isolates pricing volatility from product-mix effects:
items_eligible = [i for i in items if i.po_line_count >= 2]
total_eligible_spend = sum(i.spend for i in items_eligible)
weighted_avg_item_cv = sum(
    i.cv * (i.spend / total_eligible_spend)
    for i in items_eligible
)
Single-PO items have no stddev and are excluded. Track the eligibility share separately:
pricing_data_coverage_pct = total_eligible_spend / total_entity_spend
If pricing_data_coverage_pct < 0.80, add a note in
price_stability.rating_basis.note explaining the gap. The
discipline_rating is only as trustworthy as its coverage.
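
A runnable sketch of the weighted CV and its coverage share, using dict rows with illustrative values. Returning None when no item is eligible is an assumption; the source does not say how that case should be reported.

```python
def weighted_item_cv(items):
    """Spend-weighted per-item CV over items with at least two PO lines."""
    total_entity_spend = sum(i["spend"] for i in items)
    eligible = [i for i in items if i["po_line_count"] >= 2]
    total_eligible_spend = sum(i["spend"] for i in eligible)
    if total_entity_spend <= 0 or total_eligible_spend <= 0:
        # Assumption: no eligible spend means no rating can be computed.
        return {"weighted_avg_item_cv": None, "pricing_data_coverage_pct": 0.0}
    cv = sum(i["cv"] * (i["spend"] / total_eligible_spend) for i in eligible)
    return {
        "weighted_avg_item_cv": cv,
        "pricing_data_coverage_pct": total_eligible_spend / total_entity_spend,
    }


items = [
    {"item": "A", "spend": 600.0, "cv": 0.10, "po_line_count": 5},
    {"item": "B", "spend": 300.0, "cv": 0.40, "po_line_count": 3},
    {"item": "C", "spend": 100.0, "cv": None, "po_line_count": 1},  # single PO
]
stats = weighted_item_cv(items)
```

Item C is excluded, so the weights are 600/900 and 300/900, giving a weighted CV of 0.20 at 90% coverage.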
Pareto truncation algorithm
function pareto_truncate(rows, target_coverage, min_rows, max_rows,
                         rank_metric="spend"):
    sorted_rows = sort(rows, key=rank_metric, descending=True)
    total = sum(r[rank_metric] for r in sorted_rows)
    emitted = []
    cumulative = 0.0
    for row in sorted_rows:
        if len(emitted) >= max_rows:
            break
        if (len(emitted) >= min_rows and
                cumulative / total >= target_coverage):
            # tie_extended_by_spend_v2: include this row only if it
            # adds non-trivial incremental spend at the boundary
            if row[rank_metric] / total >= 0.005:
                emitted.append(row)
                cumulative += row[rank_metric]
            break
        emitted.append(row)
        cumulative += row[rank_metric]
    tail = sorted_rows[len(emitted):]
    tail_summary = summarize(tail)
    return {
        "rows": emitted,
        "rows_truncated": len(tail),
        "tie_break": "tie_extended_by_spend_v2",
        "tail_summary": tail_summary,
    }
Default target_coverage:
- supplier_rollup: 0.80
- item_rollup: 0.80
- industry_rollup: 0.95
- sub_category_rollup: 0.95
- commodity_rollup: 0.80
Default min_rows = 5, max_rows = 50 (supplier/item) or
max_rows = 20 (industry/sub_category).
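
The pseudocode above translates to Python roughly as follows. This is a sketch: `summarize` is not defined in the source, so the tail summary here carries only the spend fields (PO counts are omitted because the sample rows do not have them), and the zero-total guard and the `tie_threshold` parameter name are additions for illustration.

```python
def pareto_truncate(rows, target_coverage=0.80, min_rows=5, max_rows=50,
                    rank_metric="spend", tie_threshold=0.005):
    """Keep the head rows that reach target_coverage; summarize the rest."""
    sorted_rows = sorted(rows, key=lambda r: r[rank_metric], reverse=True)
    total = sum(r[rank_metric] for r in sorted_rows)
    if total <= 0:
        raise ValueError("cannot truncate rows with no spend")
    emitted = []
    cumulative = 0.0
    for row in sorted_rows:
        if len(emitted) >= max_rows:
            break
        if len(emitted) >= min_rows and cumulative / total >= target_coverage:
            # Boundary row: keep it only if it adds non-trivial spend.
            if row[rank_metric] / total >= tie_threshold:
                emitted.append(row)
                cumulative += row[rank_metric]
            break
        emitted.append(row)
        cumulative += row[rank_metric]
    tail = sorted_rows[len(emitted):]
    tail_spend = sum(r[rank_metric] for r in tail)
    return {
        "rows": emitted,
        "rows_truncated": len(tail),
        "tie_break": "tie_extended_by_spend_v2",
        "tail_summary": {                      # partial summary, see lead-in
            "rows_truncated": len(tail),
            "spend": tail_spend,
            "pct_of_spend": tail_spend / total,
        },
    }


rows = [{"supplier": f"S{i}", "spend": float(s)}
        for i, s in enumerate([40, 20, 15, 10, 5, 4, 3, 2, 1], start=1)]
result = pareto_truncate(rows, target_coverage=0.80, min_rows=2)
```

In this example coverage reaches 85% after four rows; the fifth row (5% of spend) clears the 0.5% boundary test and is included, leaving four rows and 10% of spend in the tail.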
tail_summary structure
Minimum fields for any tail_summary:
{
  "rows_truncated": 0,
  "spend": 0,
  "pct_of_spend": 0,
  "po_count": 0,
  "po_line_count": 0
}
Type-specific extras:
- supplier_rollup tail → add item_count and tail_industry_mix (top 5 industries by tail spend, each {industry, spend, pct_of_tail})
- item_rollup tail → add distinct_commodity_groups and tail_commodity_mix (top 5 commodity groups)
- commodity_rollup tail → add distinct_sub_categories
- industry_rollup tail → add distinct_industries (typically equals rows_truncated since each row is already an industry)
Outlier detection
Used to populate quality_flags.suspected_outliers (boolean) and to
flag individual rows in level_2_telemetry.inline_rows.
Z-score (default for adequate and robust samples)
z_score = (row.unit_price - entity.mean) / entity.stddev
Emit the line with flag set when |z_score| >= 3. Common flag values
used in the TechnoFlex examples:
- "price_outlier_high" for z_score >= 3
- "price_outlier_low" for z_score <= -3
- "price_peak_period" for 1.5 <= z_score < 3
- "uom_likely_misencoded" for z_score >= 8 (typically a per-railcar price mistakenly entered per-unit; verify the spend math to confirm)
- "high_volume_lane" for normal-priced rows with qty in the top decile (not an outlier per se, but useful evidence)
IQR fallback (for sparse samples)
When sample_size_class == "sparse" (n < 30), z-score breaks down
because stddev is unreliable. Use IQR instead:
Q1 = 25th percentile of unit_price
Q3 = 75th percentile of unit_price
IQR = Q3 - Q1
flag row as outlier if:
    row.unit_price < Q1 - 1.5 * IQR (low)
    or row.unit_price > Q3 + 1.5 * IQR (high)
Flag vocabulary for IQR-detected outliers:
- "iqr_outlier_high" for the upper tail
- "iqr_outlier_low" for the lower tail
Treat IQR and z-score as parallel detectors with the same downstream
behavior. quality_flags.suspected_outliers is true if EITHER
detector fires on any row. Record which detector was used in
level_2_telemetry.retrieval notes or in data_quality_notes.
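
Both detectors can be sketched together with the standard library. Two choices here are assumptions rather than spec: the z >= 8 band is checked before z >= 3 (the two ranges overlap in the flag list), and quartiles use the "inclusive" method of `statistics.quantiles`, which is one of several percentile conventions. The "high_volume_lane" flag is omitted because it depends on qty, not price.

```python
import statistics

OUTLIER_FLAGS = {"price_outlier_high", "price_outlier_low",
                 "uom_likely_misencoded", "iqr_outlier_high", "iqr_outlier_low"}


def z_score_flag(unit_price, mean, stddev):
    """Flag one row by z-score; None when no flag applies."""
    if stddev <= 0:
        return None                  # no spread, z-score undefined
    z = (unit_price - mean) / stddev
    if z >= 8:                       # assumption: most specific band first
        return "uom_likely_misencoded"
    if z >= 3:
        return "price_outlier_high"
    if z <= -3:
        return "price_outlier_low"
    if z >= 1.5:
        return "price_peak_period"   # evidence, not an outlier
    return None


def iqr_flag(unit_price, q1, q3):
    """Flag one row against the 1.5 * IQR fences."""
    iqr = q3 - q1
    if unit_price < q1 - 1.5 * iqr:
        return "iqr_outlier_low"
    if unit_price > q3 + 1.5 * iqr:
        return "iqr_outlier_high"
    return None


def flag_prices(unit_prices):
    """Return (flags, detector, suspected_outliers) for one entity."""
    if len(unit_prices) < 30:        # sparse: fall back to IQR
        q1, _median, q3 = statistics.quantiles(
            unit_prices, n=4, method="inclusive")
        flags = [iqr_flag(p, q1, q3) for p in unit_prices]
        detector = "iqr"
    else:
        mean = statistics.fmean(unit_prices)
        stddev = statistics.stdev(unit_prices)
        flags = [z_score_flag(p, mean, stddev) for p in unit_prices]
        detector = "z_score"
    suspected = any(f in OUTLIER_FLAGS for f in flags)
    return flags, detector, suspected


prices = [10, 11, 12, 12, 13, 14, 40]
flags, detector, suspected_outliers = flag_prices(prices)
```

With seven prices the sample is sparse, so the IQR detector runs and flags only the 40 as "iqr_outlier_high". The returned `detector` value is what would be recorded in the notes.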