For procurement leaders

You're buying the same part thirteen different ways.

Your spend cube doesn't tie out. Your supplier master has fourteen versions of Honeywell. Your category tree is a spreadsheet one person maintains. The big firms charge seven figures and take twelve months to fix this. I ship the same diagnostics in weeks, on a canonical data layer that stays in your environment and gets better the more your team uses it.

You probably recognize a few of these

The diagnostics keep coming back, because the data underneath never gets fixed.

  • Your spend cube doesn't tie out to AP, and nobody can fully explain why.
  • Your supplier master has fourteen versions of the same parent, and your AP team merges them from muscle memory.
  • Sourcing events use category mappings that are six months stale.
  • The same item is bought from four different suppliers at four different prices, and nobody knows they're the same item.
  • Tail-spend reduction has been a slide in every QBR for three years.
  • Your analytics tool gives different answers depending on who runs the query.
  • The AI sourcing assistant you piloted returns plausible answers that reference the wrong supplier.

Why these are hard

These aren't separate problems. They're symptoms of the same one.

Your operational data doesn't have a canonical version of itself. Supplier records exist in five systems with five different identities. The same physical part is described thirteen different ways across the catalog. Categories drift because the taxonomy isn't owned anywhere. Every analytics surface re-derives the truth, slightly differently.

The big firms attack this with people. Twelve weeks of analysts mapping suppliers in Excel. A taxonomy redesign that takes six months and ships as a deck. A spend cube rebuilt every time someone asks for it. The findings are real. The infrastructure that produced them goes home with the consultants.

The work is the same. The output is different. I ship a canonical entity layer that lives in your data platform, surfaces the same diagnostics, and stays after the engagement ends.

Sourcing Graph

A canonical layer for the entities your procurement team already cares about.

Sourcing Graph is the underlying system. Suppliers, items, and categories resolved to canonical identities. Spend, PO lines, and contracts attached to those identities. A similarity index that finds the parts you're buying as if they were different things. A classification engine that builds and maintains a category tree instead of leaving it to a spreadsheet.

It deploys into your Snowflake or Databricks environment. Your data, your platform, your governance. It's a layer on top of the systems you already run, not a replacement for them.

  • 01

    Supplier consolidation

    Multi-pass canonicalization across ERPs, AP, and sourcing systems. Fourteen variants of one parent collapsed to one canonical entity, with the parent-child hierarchy preserved.

  • 02

    Item similarity index

    Surfaces the parts you're buying as if they were different things. Spec harmonization, cross-supplier price comparison, and SKU rationalization run against the same index.

  • 03

    Classification engine

    Builds and maintains a category tree from your item descriptions. Re-classify new items against the trained taxonomy without a six-month rebuild every time the catalog grows.

  • 04

    Spend layer

    PO lines, AP, and contracts attached to the canonical entities. The spend cube ties out because it's built on entities that have one identity, not five.
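
To make the supplier consolidation step concrete, here is a minimal sketch of two canonicalization passes: normalize each raw name, then group variants that share a normalized key. This is an illustration under stated assumptions, not Sourcing Graph's actual implementation. The suffix list, the abbreviation map, the function names, and the sample records are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical lists for illustration; a real deployment would curate these.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "corp", "corporation", "co", "gmbh"}
ABBREVIATIONS = {"intl": "international", "mfg": "manufacturing"}


def normalize_supplier_name(raw: str) -> str:
    """Pass 1: lowercase, strip punctuation, expand abbreviations, drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", raw.lower()).split()
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    tokens = [t for t in tokens if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def group_variants(names):
    """Pass 2: bucket raw names that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_supplier_name(name)].append(name)
    return dict(groups)


# Hypothetical supplier-master rows pulled from ERP and AP.
variants = [
    "Honeywell International Inc.",
    "HONEYWELL INTL",
    "Honeywell International, Inc",
    "Acme Mfg. Co.",
    "ACME MANUFACTURING LLC",
]

canonical = group_variants(variants)
for key, members in canonical.items():
    print(f"{key}: {len(members)} variants")
```

Exact-key grouping only catches the easy cases. Misspellings, acquired brands, and parent-child relationships would need further passes (fuzzy matching, tax IDs, addresses), which this sketch deliberately leaves out.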

What you can run on it

The diagnostics the big firms charge for, shipped on infrastructure that stays.

  • Spend cube and spend transparency

    Spend sliced by supplier, category, business unit, site, month, and item, on canonical entities that actually tie out across systems.

  • Tail-spend diagnostic

    Where the long tail is, what's automatable, what's hidden strategic spend, and what's just supplier sprawl.

  • Supplier fragmentation and concentration

    Where similar spend is unnecessarily split across too many suppliers, and where risk is dangerously concentrated in too few.

  • Price variance and outlier diagnostics

    The same item bought at meaningfully different prices, surfaced by the similarity index rather than by analysts hand-matching SKUs.

  • Frozen-price and stale-price detection

    Prices that haven't moved in years on items where the market has. Often the cleanest savings story in the deck.

  • Taxonomy and master-data diagnostic

    How trustworthy the category structure actually is, where reclassification would move spend, and what to fix before the next sourcing wave.

  • Supplier rationalization sizing

    Where consolidation has real leverage, sized against canonical spend on canonical suppliers, not approximations.

Extends to Kraljic positioning, should-cost benchmarking, supplier scorecards, and category strategy workbenches as the canonical layer matures.
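
As one illustration of how a diagnostic like price variance can run once items share a canonical identity, the sketch below groups PO lines by canonical item and flags items whose unit prices spread beyond a threshold. It is a simplified example, not the production diagnostic. The item IDs, supplier names, prices, the `price_spread_by_item` function, and the 10% threshold are all assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical PO lines: (canonical_item_id, supplier, unit_price).
po_lines = [
    ("ITEM-001", "Supplier A", 10.00),
    ("ITEM-001", "Supplier B", 12.50),
    ("ITEM-001", "Supplier C", 10.20),
    ("ITEM-002", "Supplier A", 4.00),
    ("ITEM-002", "Supplier D", 4.05),
]


def price_spread_by_item(lines, threshold=0.10):
    """Return {item_id: spread} for items whose (max - min) / min exceeds threshold."""
    prices = defaultdict(list)
    for item_id, _supplier, unit_price in lines:
        prices[item_id].append(unit_price)

    flagged = {}
    for item_id, values in prices.items():
        low, high = min(values), max(values)
        if low <= 0:
            continue  # skip free or credit lines; a relative spread is undefined
        spread = (high - low) / low
        if spread > threshold:
            flagged[item_id] = round(spread, 4)
    return flagged


outliers = price_spread_by_item(po_lines)
print(outliers)
```

The comparison is only meaningful because every line carries the same canonical item ID. Without that, the three ITEM-001 lines would look like three unrelated SKUs and no variance would surface.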

How it deploys

In your environment, on your platform, against your data.

Sourcing Graph deploys into Snowflake or Databricks. Your data never leaves your platform. The canonical entity layer, the similarity index, the classification engine, and the spend layer all run as managed assets inside your warehouse or lakehouse, governed by the controls you already have.

The system surfaces through your existing BI tools (Tableau, Power BI, Sigma, Hex) and can be exposed to agent clients over MCP for teams that want to query it conversationally.

What this looks like in practice

A mid-market industrial distributor, 66,000-line catalog.

The team had three sourcing analysts, a Databricks environment, and a spend cube that had been “almost ready” for two years. The supplier master had thousands of duplicate entities across ERP and AP. The item catalog had no working classification. Categories were a spreadsheet last updated under the previous procurement director.

Sourcing Graph was deployed inside their Databricks environment. The supplier master collapsed to canonical entities with parent-child hierarchy preserved. The item catalog ran through the classification engine and the similarity index, surfacing the duplicate purchasing that the spend cube had been hiding. The spend layer tied out to AP for the first time.

What the team had at the end wasn't a slide deck. It was a running system. The categorization gets better as analysts curate it. The similarity index surfaces new duplicates as the catalog grows. The next sourcing wave runs against canonical data instead of starting with two weeks of reconciliation.

Start here

Spend Diagnostic. Two to four weeks. Fixed fee.

A scoped engagement against your actual data. Sourcing Graph deployed in your environment as a working pilot. A written diagnostic surfacing your top opportunities across supplier fragmentation, item duplication, price variance, frozen prices, and tail spend. At the end, you keep the canonical layer.

Useful even if we don't go further. The diagnostic gives you a defensible savings number, a running entity layer your team can keep building on, and a clear read on what a larger program would look like.

For the broader practice behind this work, see Canonical.