
The Backward Index: How 1930s Lexicographers Built Vector Search with Index Cards

Long before vector databases, Merriam-Webster lexicographers built a 315,000-card index of words spelled backwards. The same insight powers modern AI: different organizational schemes reveal different patterns in the same data.


I first heard this story on 99% Invisible, Roman Mars's masterful podcast about the design details that shape our world. In a 2018 mini-story, senior editor Delaney Hall introduced listeners to Peter Sokolowski, a lexicographer at Merriam-Webster who discovered something remarkable while wandering through his office's endless rows of steel filing cabinets.

October 1, 2015: somewhere in Dublin, Ohio, a small crowd gathers in a basement workroom to witness the end of an era. The Online Computer Library Center (OCLC) is about to print its final library catalog card, closing the book on more than 150 years of analog information organization.

Meanwhile, in Springfield, Massachusetts, lexicographer Peter Sokolowski wanders through endless rows of steel filing cabinets at Merriam-Webster's offices. He stops at one cabinet labeled simply "Backward Index" and opens it to find 315,000 index cards with dictionary words spelled backwards. At first confused, then amazed, he realizes he's looking at one of the most ingenious pre-digital search innovations ever created.

These two moments, one an ending, one a discovery, tell the same story: we've been solving the information retrieval problem for a lot longer than we think.

The Crisis of the 1930s

Before we had computers, we had a fundamental constraint: physical organization equals search capability. If you wanted to find information, it had to be physically arranged in a way that made that search possible. One arrangement. One search method.

This created what I think of as the "multiple views problem." Imagine you have a dataset of every word in the English language. You can sort it alphabetically, which is great for looking up definitions. But what if you want to find all words ending in "-ology"? Or all compound words using "like"? Or words that rhyme?

In modern SQL terms, the 1930s problem looks like this:

-- An alphabetical index answers this directly (prefix match)
SELECT * FROM dictionary WHERE word LIKE 'eco%';

-- But alphabetical order is no help here: a suffix match means reading every entry
SELECT * FROM dictionary WHERE word LIKE '%ology';

The alphabetical index was optimized for prefix searches but useless for suffix searches. Sound familiar? It's the same challenge we face today when we search high-dimensional data spaces: different queries require different organizational schemes.
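The asymmetry is easy to see in a few lines of Python. This is a minimal sketch, assuming a tiny made-up word list; the helper names `prefix_search` and `suffix_search` are hypothetical and exist only for this illustration. A sorted list lets a prefix lookup jump straight to the right spot with binary search, while a suffix lookup has no choice but to check every entry.

```python
from bisect import bisect_left

# Hypothetical sample; the real dictionary held hundreds of thousands of headwords.
WORDS = sorted(["biology", "ecology", "economy", "ecosystem", "geology", "zoology"])


def prefix_search(sorted_words, prefix):
    """Binary-search to the first candidate, then read forward while the prefix matches."""
    start = bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break  # sorted order guarantees no later word can match
        matches.append(word)
    return matches


def suffix_search(sorted_words, suffix):
    """Alphabetical order gives no shortcut here: every entry must be checked."""
    return [word for word in sorted_words if word.endswith(suffix)]


print(prefix_search(WORDS, "eco"))    # stops as soon as the prefix no longer matches
print(suffix_search(WORDS, "ology"))  # touches all six words
```

With six words the difference is invisible. With several hundred thousand cards and a human doing the reading, it is the difference between minutes and weeks.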

Enter Philip Gove: The Proto-Digital Mind

Philip Gove wasn't thinking about computers when he systematized Webster's Third New International Dictionary in the 1950s, but he might as well have been. His approach to lexicography was so systematic, so rule-based, that when dictionaries finally went digital decades later, the developers recalled that "we recognized how very regular the apparatus was."

Consider Gove's "single-statement rule": every definition had to consist of exactly one phrase, starting with a general category (genus) followed by distinguishing features (differentiae). No exceptions. No artistic flourishes. Pure structured data.

This wasn't just pedantry; it was computational thinking before computation. Gove believed that "lexicography should have no traffic with guesswork, prejudice, or bias." He created comprehensive rules for everything: What does a boldface colon mean? When can you use commas? Every question had a rule, applied consistently across hundreds of thousands of entries.
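To see why the single-statement rule reads like structured data, here is a rough sketch of it as a record type. Everything in it is an assumption made for illustration: the `Definition` class, its field names, and the sample entry are mine, not Merriam-Webster's actual schema or wording.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Definition:
    """One definition shaped as genus + differentiae (a hypothetical schema)."""

    headword: str
    genus: str                     # the general category
    differentiae: Tuple[str, ...]  # the distinguishing features

    def render(self) -> str:
        """Join genus and differentiae into one unbroken phrase."""
        return " ".join((self.genus, *self.differentiae))


# Illustrative entry written for this sketch, not quoted from Webster's Third.
triangle = Definition("triangle", "a polygon", ("having three sides",))
print(f"{triangle.headword}: {triangle.render()}")
```

Once every definition has the same shape, consistency checks and later digitization become mechanical rather than editorial.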

His colleagues called him systematic to a fault. Critics attacked him for being too rigid, too mechanical. They had no idea he was building the foundation for digital dictionary systems that wouldn't emerge for another thirty years.

The Backward Index: Manual Feature Engineering

Now back to that mysterious filing cabinet. The backward index was exactly what it sounds like: every headword in the dictionary, typed backwards, then filed alphabetically. So "ecology" became "ygoloce" and was filed under Y.

Why would anyone do this? Because it solved the suffix search problem elegantly:

  • Want all "-ology" words? Look up "ygolo"
  • All "-like" compounds? Check "ekil"
  • All words ending in "-ment"? Find "tnem"

It was manual feature engineering at massive scale. They were extracting latent patterns from text that were invisible in standard alphabetical order, patterns that revealed morphological families, compound structures, and rhyming sequences.
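The trick translates almost directly into code. The sketch below is a minimal Python version under stated assumptions: the word list is a made-up sample, and the function names `build_backward_index` and `words_ending_in` are hypothetical. Reversing each word and sorting the result turns every suffix question into a prefix question, which binary search answers quickly.

```python
from bisect import bisect_left


def build_backward_index(headwords):
    """Reverse each headword, then sort: the digital analogue of typing and filing the cards."""
    return sorted(word[::-1] for word in headwords)


def words_ending_in(backward_index, suffix):
    """Look up a suffix by searching for its reversal as a prefix."""
    key = suffix[::-1]
    start = bisect_left(backward_index, key)
    found = []
    for card in backward_index[start:]:
        if not card.startswith(key):
            break  # past the cluster of matching cards
        found.append(card[::-1])  # flip the card back to its normal spelling
    return found


# Hypothetical sample of headwords for illustration.
HEADWORDS = ["ecology", "geology", "zoology", "movement", "payment",
             "childlike", "lifelike", "economy"]
INDEX = build_backward_index(HEADWORDS)

print(words_ending_in(INDEX, "ology"))
print(words_ending_in(INDEX, "ment"))
print(words_ending_in(INDEX, "like"))
```

Note that the matching cards sit next to each other in the index, which is exactly what made the physical cabinet useful: the answer to a suffix query was one contiguous run of cards.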

This index was instrumental in creating Merriam-Webster's first rhyming dictionary. Imagine trying to compile that manually from an alphabetical dictionary. You'd have to read through every single entry. But with the backward index, words that share an ending were already filed side by side, so rhyme candidates sat on neighboring cards.

The insight: different organizational schemes reveal different patterns in the same data. This is the idea behind vector embeddings in a nutshell.

The Hidden Computer Science DNA

What strikes me most about the backward index isn't just that it worked; it's how sophisticated the underlying thinking was. The lexicographers were implementing concepts we now recognize as fundamental to modern information retrieval:

Multi-dimensional indexing. They maintained multiple organizational schemes simultaneously (alphabetical, backward, subject-based). Each index was essentially a different "projection" of the same data space.

Semantic clustering. Related items naturally clustered together. All the "-ology" words. All the compounds with "pony" (Highland pony, Shetland pony, Welsh pony). All morphologically related terms.

Pattern recognition at scale. They manually identified linguistic relationships across hundreds of thousands of words, work that now requires sophisticated algorithms.

This wasn't happening in isolation. In 1924, the Mundaneum in Belgium created what might be history's first search engine: 18 million index cards that functioned as an "analog search service." Submit a query on any topic for a fee, and they'd send back relevant information copied from their massive bibliographic catalog.

Modern Echoes

When we build vector embedding systems today, we're solving fundamentally the same problems:

Then. How do we organize 315,000 words so we can find semantic relationships invisible in alphabetical order?

Now. How do we organize millions of documents in high-dimensional space so we can find semantic relationships invisible in keyword search?

The core insight remains unchanged:

Similarity should mean proximity.

In the backward index, all words ending in "-ology" appeared near each other. In vector space, semantically similar documents cluster together based on learned representations.
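A toy example makes "similarity should mean proximity" concrete. This is only a sketch: the three-dimensional vectors below are invented by hand for illustration, whereas real embeddings are learned by a model and usually have hundreds of dimensions. The `nearest` helper is hypothetical and uses brute-force cosine similarity rather than the approximate indexes a vector database would use.

```python
import math

# Toy vectors invented for this sketch; they are not output from any real model.
VECTORS = {
    "ecology": (0.9, 0.1, 0.0),
    "biology": (0.8, 0.2, 0.1),
    "payment": (0.1, 0.9, 0.2),
    "invoice": (0.0, 0.8, 0.3),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, vectors, k=1):
    """Rank every other item by cosine similarity to the query (brute force)."""
    scored = [
        (cosine_similarity(vectors[query], vec), name)
        for name, vec in vectors.items()
        if name != query
    ]
    return [name for _, name in sorted(scored, reverse=True)[:k]]


print(nearest("ecology", VECTORS))
print(nearest("payment", VECTORS))
```

The lookup plays the same role as opening the right drawer of the backward index: the organization of the space, not the query, does most of the work.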

The difference is scale and automation. Where 1930s lexicographers manually identified morphological patterns, we use neural networks to discover latent semantic structures. Where they created a single-purpose filing system, we build general-purpose embedding models.

But the fundamental principle, that the same data organized differently reveals different patterns, is identical.

The Time Traveler Insight

Merriam-Webster's modern "Time Traveler" feature takes this multi-dimensional thinking even further. Instead of organizing words alphabetically or backwards, it organizes them chronologically. The insight? Words entering the language at the same time reveal semantic relationships.

All train-related vocabulary appeared together (tracks, engines, coal cars). All firearms terminology emerged in the same period. French legal terms entered English as a cluster following the Norman Conquest.

This is exactly what we see in modern embedding spaces: related concepts cluster together, not because we explicitly programmed those relationships, but because the underlying data contains hidden structure that becomes visible with the right organizational scheme.

What We Can Learn from Lexicographers

There's something beautifully systematic about how these lexicographers approached impossibly complex problems. They couldn't rely on computational power, so they had to think harder about the problem itself.

Constraints drive innovation. The physical limitations of card catalogs forced lexicographers to be incredibly thoughtful about information architecture. They had to get it right the first time; there was no easy way to refactor 315,000 index cards.

Systematic thinking scales. Gove's rule-based approach seemed rigid to critics, but it was the only way to maintain consistency across a project spanning decades and dozens of contributors. Modern large-scale ML systems face similar challenges.

Multiple perspectives reveal truth. The backward index wasn't replacing alphabetical organization; it was complementing it. Each view revealed different aspects of the same underlying data structure.

The Continuous Thread

As I write this, I'm using tools that would seem magical to Philip Gove: semantic search across millions of documents, real-time similarity matching, automated pattern recognition in text. But the fundamental challenge he faced, how to organize information so humans can find what they need, remains unchanged.

The backward index represents something profound about human problem-solving. Faced with the constraints of their time, these lexicographers developed solutions that anticipated computational approaches by decades. They were doing feature engineering, multi-dimensional indexing, and semantic clustering with index cards and filing cabinets.

There's a lesson here for those of us building the next generation of AI systems. Sometimes the most elegant solutions come from understanding the problem deeply enough to work within constraints, not around them. Sometimes the best computer science has surprisingly analog ancestors.

The next time you run a semantic similarity search or train an embedding model, remember the lexicographers with their backwards cards. They were solving the same problem you are; they just had to be more creative about it.

The backward index was actively maintained from the 1930s until the 1970s, when computers finally made it obsolete. But its core insight, that different organizational schemes reveal different patterns in the same data, lives on in every vector database and embedding model we build today.

Thanks to Roman Mars and the 99% Invisible team for bringing this story to light, to Peter Sokolowski for discovering and preserving it, and to the countless "harmless drudges" who built the foundations of modern information science, one index card at a time.