I Gave Claude a Procurement Data Layer. It Produced a Sourcing Opportunity Review in Five Minutes.

A practical test of TMS. I loaded entity_profile manifolds into a local DuckDB database, exposed them through MCP, and asked Claude to review $371M of resin spend. The output was a structured opportunity review, not a generic summary.

Jun 6, 2026·10 min read

Five minutes. Six sub-categories. About $371M of resin spend. One diagnostic opportunity review you could put in front of a sourcing leader. The work happened on a laptop, in DuckDB, with Claude reading TMS manifolds through an MCP server.

This is not a chat-with-your-database demo. It is a test of what happens when you give Claude data shaped for agent reasoning, instead of data shaped for dashboards or vector search.

The real problem

Most AI-over-enterprise-data demos pick one of two strategies, and both fail at the same wall.

The first is chat-with-your-database. The model generates SQL, runs it, returns rows. Useful for one-off questions. Brittle for analysis. The model has no context for what the numbers mean, no framework for what counts as opportunity versus noise, and no memory across queries.

The second is full RAG. Embed every document, chunk every table, hope the model can reconstruct business meaning from fragments. This works for retrieval. It does not work for reasoning. A model that can find a row cannot, from that row alone, tell you whether the spend on that row is consolidatable, whether the supplier is single-sourced, whether the price discipline is unusual, or whether the category is a fix-it-first problem or a competitive-bid problem.

The real question a sourcing analyst asks is not "show me every PO line for resin." It is closer to: where is the opportunity, what evidence supports it, what are the risks, and what should we do next.

That question requires compression, hierarchy, lineage, and judgment. None of which a row dump or a chunk index gives you.

The experiment

I built a set of TMS entity_profile manifolds, loaded them into a local DuckDB database, and exposed them to Claude through an MCP server.

The architecture is mundane on purpose:

Source procurement tables
        |
        v
TMS manifold generation
        |
        v
Local DuckDB
        |
        v
MCP server
        |
        v
Claude
        |
        v
Diagnostic opportunity review

The source data represents a fictional specialty packaging company, TechnoFlex Industries. The setup mirrors patterns I have implemented in production at two enterprises under NDA. The structural findings here are public-safe analogues of that work, with supplier names and identifiers invented and all percentages, HHI bands, and price-discipline diagnoses preserved.

The task was deliberately practical: review the resin-related supplier categories and produce a diagnostic opportunity review suitable for a sourcing leader. Six sub-categories matched on the string "resin" in the supplier-category taxonomy. About $371M of addressable spend over 33 months. Claude pulled the manifolds it needed, navigated L0 summaries and L1 rollups, and surfaced a small number of representative L2 telemetry lines for evidence. Total wall-clock time was about five minutes. Token usage was modest.

The point was not to write more SQL. The point was to test whether structured operational memory closes the gap between "retrieve data" and "produce analysis."

What Claude produced

A prioritized opportunity matrix. Six rows, six metrics: spend, supplier count, top-supplier share, HHI, price discipline, sizing estimate. The condensed view here omits supplier count and top-supplier share.

Rank	Sub-category	Annual Spend	HHI	Price Discipline	Sizing
1	PVC Additives and Resins	$38.5M/yr	0.30	Excellent	$4 to $9M (10 to 22%)
2	Petrochemical Polymer Resins	$47M/yr	0.19	Excellent	$4 to $7M (8 to 15%)
3	Plastic Resins and Compounds	$6.7M/yr	0.49	Poor	$1.2 to $2.0M (18 to 30%)
4	Styrenic and Thermoplastic Resins	$17.7M/yr	0.43	Excellent	$1.5 to $4M (8 to 15%)
5	Plastics Resins and Products	$11.7M/yr	0.19	Good	$1.5 to $3M (12 to 20%)
6	Plastic Resins and Chemical Additives	$2.2M/yr	0.23	Excellent	$0.2 to $0.5M (8 to 15%)

Total: $123.8M/year, $12 to $25M annual opportunity (10 to 20% of addressable).
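The totals can be reproduced from the matrix rows. A minimal Python sketch, with figures transcribed from the table above (the tuple layout is just for illustration):

```python
# (sub-category, annual spend $M, sizing low $M, sizing high $M), transcribed from the matrix
rows = [
    ("PVC Additives and Resins", 38.5, 4.0, 9.0),
    ("Petrochemical Polymer Resins", 47.0, 4.0, 7.0),
    ("Plastic Resins and Compounds", 6.7, 1.2, 2.0),
    ("Styrenic and Thermoplastic Resins", 17.7, 1.5, 4.0),
    ("Plastics Resins and Products", 11.7, 1.5, 3.0),
    ("Plastic Resins and Chemical Additives", 2.2, 0.2, 0.5),
]

# Sum each numeric column and round to one decimal, matching the table's precision
total_spend = round(sum(r[1] for r in rows), 1)
total_low = round(sum(r[2] for r in rows), 1)
total_high = round(sum(r[3] for r in rows), 1)

print(f"Annual spend: ${total_spend}M")
print(f"Opportunity: ${total_low}M to ${total_high}M")
print(f"Share of addressable: {total_low / total_spend:.1%} to {total_high / total_spend:.1%}")
```

The row sums come to $123.8M of annual spend and $12.4M to $25.5M of opportunity, which the review rounds to $12 to $25M.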

The first thing to notice is that this is not a generic summary. It is a structured sourcing work product. Each row carries its own hypothesis about what kind of opportunity it represents.

PVC Additives and Resins is the cleanest case. Top 3 suppliers cover 88% of spend, two of them (Vexlar Vinnolit and Shintekka) are technically credible alternates to each other on the same grades, and the manifold's weighted item CV of 0.05 suggests flat pricing rather than the indexed pass-through that VCM/EDC-driven PVC should have. The opportunity is a competitive bid that explicitly indexes feedstock and assigns volume between two credible incumbents. The path is short and the political risk is low.

Petrochemical Polymer Resins is the largest absolute prize at $47M/year, but technically the hardest. Grade qualification (food contact, gauge, MFI, color) is real switching cost. The hypothesis is structural: dual-parent legacy fragmentation (Voltrane SA and Voltrane USA reading as effectively two parents), already-qualified Mexican PP suppliers (Polimex and PoliCaribe) opening cross-border freight options under USMCA (formerly NAFTA), and a unit-price escalation from $0.55/lb to $1.30/lb over the window without index protection. Sized at $4 to $7M annual, weighted half toward formal indexing and half toward consolidation.

The most diagnostic-rich finding sat third in the ranking and first by interest. Plastic Resins and Compounds is a $20M category over the review window ($6.7M/year in the matrix) that the model flagged as a fix-it-first problem rather than a competitive-bid problem. One supplier (Cipres Polymers) holds 69% of category spend. The price discipline is "Poor", with weighted item CV at 1.32 versus less than 0.10 across the Excellent-rated categories. And at the line-item level, three of the top five items by spend are labeled "Unknown Part", "Unknown", or "Unspecified Parts and Materials": a combined $13.6M of spend that the PO data simply does not describe.

The opportunity here is not savings. It is controlling a category where the buyer cannot articulate what they are paying for. The recommended approach is data cleanup against the dominant supplier's invoices, qualification of a second source (a smaller incumbent, Mexvera Specialty Resins, already exists in the data at 4.6%, proving the grade is qualifiable), and an indexed pricing baseline.

That distinction matters. A dashboard might surface the 69% concentration as a risk number. It would not tell you that the right move is data hygiene before competitive bid. The structural framing of the manifold (concentration plus price discipline plus item-level visibility plus quality flags) made the diagnosis possible.
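The concentration figure can be sanity-checked from the shares quoted above. This is a minimal sketch, assuming HHI on a 0-to-1 scale (the sum of squared spend shares); only the 69% and 4.6% shares come from the review, and the function name is illustrative.

```python
def hhi(shares):
    """Herfindahl-Hirschman Index on a 0-to-1 scale: sum of squared spend shares."""
    return sum(s * s for s in shares)

# Shares quoted in the review; the remaining suppliers are not itemized there.
known_shares = [0.69, 0.046]

# Lower bound: the unlisted remainder is spread so thinly it adds almost nothing.
lower_bound = hhi(known_shares)

# Upper bound: the entire remainder sits with a single supplier (an extreme case).
remaining = 1.0 - sum(known_shares)
upper_bound = lower_bound + remaining ** 2

print(f"HHI between {lower_bound:.3f} and {upper_bound:.3f}")
```

The reported HHI of 0.49 falls inside these bounds and sits close to the lower one, which is what a single dominant supplier with a fragmented tail would produce.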

Why structure beat raw access

Three things gave Claude traction.

First, the L0 summary surfaced the decision-relevant numbers at the top of each manifold. Concentration, HHI, price discipline, supplier count, primary commodity group, observation count, time coverage. Claude could read six manifolds and have the comparative shape of the resin family in front of it in a few hundred tokens.

Second, the L1 geometry preserved the structural detail that L0 compressed away. Pareto-truncated rollups by supplier and by item, monthly timeseries that showed when the PP/PE price escalation happened, item-level CVs that distinguished a healthy category from a "Poor" one. When Claude needed evidence for a hypothesis, the geometry was already shaped for the question.
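The article does not define "weighted item CV", so the following is only one plausible reading: a spend-weighted average of each item's coefficient of variation (standard deviation of unit price divided by its mean). The function names and all sample data are hypothetical.

```python
from statistics import mean, pstdev

def item_cv(prices):
    """Coefficient of variation: population standard deviation divided by the mean."""
    return pstdev(prices) / mean(prices)

def weighted_item_cv(items):
    """Spend-weighted average of per-item CVs.

    items: list of (spend, [unit prices]) tuples.
    """
    total_spend = sum(spend for spend, _ in items)
    return sum(spend * item_cv(prices) for spend, prices in items) / total_spend

# Hypothetical categories: one with stable unit prices, one with erratic ones.
stable = [(100.0, [1.00, 1.02, 0.98]), (50.0, [2.00, 2.05, 1.95])]
erratic = [(100.0, [1.00, 4.00, 0.20]), (50.0, [2.00, 9.00, 0.50])]

print(f"stable:  {weighted_item_cv(stable):.3f}")
print(f"erratic: {weighted_item_cv(erratic):.3f}")
```

Under this reading, a category whose items trade in a tight band scores well below 0.10, while one whose unit prices swing by multiples scores far higher. That is the kind of separation the "Excellent" versus "Poor" labels describe.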

Third, the L2 telemetry preview gave just enough raw rows to confirm or contradict an L1 reading. When the Equispar mean unit price came back as -$14.60 in April 2025, Claude could pull the inline preview and see one line at -$2,116.61, recognize a credit or return, and flag it correctly as a data hygiene issue rather than treating it as a real price.
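The check Claude performed here can be sketched in a few lines. This is an illustration under stated assumptions: the row layout and every value except the -2,116.61 line are hypothetical, and treating any negative unit price as a credit or return is a simplification.

```python
from statistics import mean

# Hypothetical L2 preview rows: (line_id, quantity, unit_price).
# Only the -2116.61 value comes from the article; the rest is made up.
preview = [
    ("po-1", 100, 1.10),
    ("po-2", 250, 1.12),
    ("po-3", 1, -2116.61),
    ("po-4", 180, 1.09),
]

def split_credits(rows):
    """Separate ordinary purchase lines from negative-price lines (likely credits or returns)."""
    purchases = [r for r in rows if r[2] >= 0]
    credits = [r for r in rows if r[2] < 0]
    return purchases, credits

purchases, credits = split_credits(preview)
raw_mean = mean(r[2] for r in preview)      # distorted by the credit line
clean_mean = mean(r[2] for r in purchases)  # purchase lines only

print(f"raw mean unit price:   {raw_mean:.2f}")
print(f"clean mean unit price: {clean_mean:.2f}")
print(f"flagged lines: {[r[0] for r in credits]}")
```

A single large negative line is enough to drag a monthly mean below zero, which is why the preview rows matter: the aggregate alone cannot distinguish a credit from a pricing anomaly.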

A flat RAG approach would have lost most of this. The L0 summaries are compact comparative fields rather than prose passages, so they do not embed usefully as documents. The structural relationships between L0 and L1 (this is the same entity, here is its progressive disclosure) are not preserved by chunking. The agent ends up reasoning across fragments without the framework that made the fragments mean something.

A raw SQL approach would have lost the time. Claude would have had to discover the analytical shape of the data one query at a time, refining as it went. That works for one or two questions. It does not produce a six-category comparative diagnostic in five minutes.

TMS was the middle layer. Compact enough that Claude could load the full picture cheaply, structured enough that the comparative analysis emerged naturally from the shape of the data, and detailed enough that L2 evidence was a tool call away when Claude needed it.
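The three levels described above can be pictured as one record per entity with progressively more detail. The sketch below is not the TMS schema: the class, field, and function names are assumptions for illustration, and only the HHI and price-discipline values are taken from the matrix.

```python
from dataclasses import dataclass, field

@dataclass
class Manifold:
    """Illustrative stand-in for an entity_profile manifold (not the real TMS schema)."""
    entity: str
    l0: dict                # compact decision-relevant summary
    l1: dict                # rollups and timeseries (left empty in this sketch)
    l2_preview: list        # a handful of raw telemetry rows
    quality_flags: list = field(default_factory=list)

def disclose(m, level):
    """Return only as much of the manifold as the requested level needs."""
    view = {"entity": m.entity, "l0": m.l0}
    if level >= 1:
        view["l1"] = m.l1
    if level >= 2:
        view["l2_preview"] = m.l2_preview
    return view

def rank_by_l0(manifolds, key):
    """Compare entities using L0 fields alone, highest value first."""
    return sorted(((m.entity, m.l0[key]) for m in manifolds),
                  key=lambda pair: pair[1], reverse=True)

manifolds = [
    Manifold("PVC Additives and Resins", {"hhi": 0.30, "price_discipline": "Excellent"}, {}, []),
    Manifold("Petrochemical Polymer Resins", {"hhi": 0.19, "price_discipline": "Excellent"}, {}, []),
    Manifold("Plastic Resins and Compounds", {"hhi": 0.49, "price_discipline": "Poor"}, {}, []),
]

ranked = rank_by_l0(manifolds, "hhi")
print(ranked[0])
print(sorted(disclose(manifolds[0], 0)))
```

The design point is that a cross-category comparison needs only the L0 fields, so the agent pays for L1 and L2 detail only when a specific hypothesis calls for evidence.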

The signal that mattered most

The model did not pretend the data was perfect.

It flagged five critical data gaps before recommending Gate 2 entry. The absence of any visible commodity indexing in the contract data, despite an obviously feedstock-driven category. The negative unit-price outliers in two sub-categories indicating credits and returns mixed with spend. The six adjacent polymer categories at ~$150M of spend that might contain resin volume mis-categorized away from the "resin" string. The unverified tail of ~85 smaller suppliers where additional name-variant consolidation may still be possible. The absence of spend-by-plant and spend-by-region breakdowns required for Gate 2 site-level analysis.

This is exactly what production-grade agentic analysis should look like. Not a confident answer. A bounded answer, with the boundaries named.

One of those gaps came out of TMS quality_flags directly (the truncation flags surfacing the unverified supplier tail). One came out of the L2 telemetry preview, where the negative-price rows surfaced naturally and the model recognized them as credits and returns rather than real prices. The other three came out of the model reasoning about what the manifolds did not contain, given what kinds of decisions a sourcing leader would need to make. All five are written in the language of action, not the language of caveat. The next sentence in each says what to do about the gap.

A sourcing leader receiving this can trust the work because the work tells them what it does not know.

Where QPS fits

TMS gives the agent the manifold. QPS gives the system the provenance.

For an analysis like this to be production-grade, a sourcing leader needs to know what query generated the manifold, when it was generated, which source tables fed it, whether the underlying data has drifted since, whether the analysis can be replayed against fresh data, and whether the snapshot is fresh enough to drive a decision today.

That is the role of the Query Provenance Store. Every manifold carries a qps_entry_id in its lineage block. The QPS entry records the parameterized query, the source dataset version, a checksum at generation, and an append-only execution log of every replay attempt. If the procurement team rebuilds the manifold a month from now and the numbers shift, QPS catches the drift and quantifies it.
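A rough sketch of how such an entry could detect and quantify drift follows. It is not the QPS specification: apart from qps_entry_id, every name, the SHA-256 choice, the sample query, and the sample rows are assumptions, and the "append-only" log is only a convention here because a Python list does not enforce it.

```python
import hashlib
import json
from dataclasses import dataclass, field

def checksum(rows):
    """Deterministic SHA-256 over the query result rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

@dataclass
class QpsEntry:
    """Illustrative provenance record (hypothetical fields, not the QPS schema)."""
    qps_entry_id: str
    query: str
    source_version: str
    checksum_at_generation: str
    baseline_total: float
    execution_log: list = field(default_factory=list)

    def replay(self, rows):
        """Record a replay attempt and report whether the result drifted."""
        new_checksum = checksum(rows)
        record = {
            "checksum": new_checksum,
            "drifted": new_checksum != self.checksum_at_generation,
            "spend_delta": sum(r["spend"] for r in rows) - self.baseline_total,
        }
        self.execution_log.append(record)  # appended, never rewritten
        return record

# Hypothetical result rows at generation time and one month later.
original = [{"supplier": "A", "spend": 100.0}, {"supplier": "B", "spend": 50.0}]
rebuilt = [{"supplier": "A", "spend": 100.0}, {"supplier": "B", "spend": 55.0}]

entry = QpsEntry("qps-0001", "SELECT supplier, spend FROM spend WHERE category = ?",
                 "v1", checksum(original), 150.0)
first = entry.replay(original)
second = entry.replay(rebuilt)
print(first["drifted"], second["drifted"], second["spend_delta"])
```

The checksum answers whether anything changed; the delta answers by how much. Keeping both in the log is what lets a reviewer decide whether a shifted number is noise or a reason to rerun the analysis.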

TMS helps the agent reason. QPS helps the organization trust and govern the reasoning. Both are open specifications, both are deployed in production, and the reference examples are public-safe.

The practical claim

I am not claiming that this replaces sourcing analysts. The opportunity review Claude produced is a first-draft diagnostic, not a final recommendation. The stakeholder interviews and category-owner conversations still need to happen. The contract structure still needs to be confirmed.

The claim is narrower and more useful:

If enterprise data is shaped into structured, multi-resolution manifolds with lineage and uncertainty preserved, LLMs can produce useful first-pass operational analysis quickly and cheaply.

That is a much more realistic claim than "chat with your data." It is also a much more practical one, because it points at a specific kind of infrastructure work the organization can actually do. Generate the manifolds. Store them somewhere queryable. Expose them through MCP. The agent layer falls into place from there.

The future of enterprise AI may not be agents wandering through databases. It may be agents reasoning over carefully constructed operational memory.

In this test, it worked.

The full anonymized diagnostic, with the Stakeholder Interview Plan, Gate 1 Readiness Check, and all six categories laid out at depth, lives at Resin Category Opportunity Review. The structural findings (HHI, concentration, price discipline, sizing) are preserved verbatim; the supplier names and identifiers are invented.