What Four Hours of Focused Human-AI Engineering Actually Ships
On May 30th I published v0.1 of a Databricks integration pack for Swamp. By the end of the same evening it was at v0.13. Fifteen models, 100/A on every release, every model end-to-end smoke-tested. What shipped, what I learned, and why it matters if you have Databricks alongside anything else.
Receipts before claims. Fifteen models. Thirteen releases. One evening.
The hook
On May 30th I published v0.1 of a Databricks integration pack for Swamp, the new declarative automation framework from System Initiative. By the end of the same evening it was at v0.13, with fifteen models covering jobs, DLT pipelines, SQL warehouses, the full Unity Catalog tree, workspace secrets, permissions, DBSQL queries, and Git Repos. Every release scored 100/A on the platform's quality rubric. Every model was end-to-end smoke-tested against a real Databricks workspace. One release (v0.8) shipped a real bug; three releases later (v0.11) I shipped the fix. The whole thing took about four hours of focused work, paired with Claude.
This post is about what's in the pack, why it matters if you have Databricks alongside anything else in your stack, and what I learned about engineering velocity along the way.
What I shipped
Fifteen Swamp models, published to swamp.club/extensions/@mfbaig35r/databricks, with source at github.com/mfbaig35r/swamp-databricks. They cover the Databricks data engineering surface:
- Compute and orchestration: jobs (Jobs API 2.2), DLT pipelines, SQL warehouses with statement execution
- Workspace storage: notebooks and workspace files (two distinct Databricks object types, two distinct models, distinct upload semantics)
- Unity Catalog: catalogs, schemas, tables, volumes
- Access control: workspace permissions (jobs, pipelines, warehouses, notebooks, repos, queries, dashboards, alerts...), UC permissions (full grant model for catalogs, schemas, tables, volumes, functions, external locations, storage credentials)
- Workspace secrets: scopes and keys, distinct from Swamp's own vault, with secret values never persisted in Swamp's data layer
- DBSQL: saved queries that pair with the job model's sql_task.query.query_id field
- Git Repos: workspace-attached repos with pull, branch switching, sparse checkout, and full lifecycle
Each model exposes the surface a real workspace user would touch. The job model covers Jobs API 2.2 with full task-type schema validation including for_each_task (recursive via Zod z.lazy), dbt_task, and spark_python_task. The uc_permissions model covers all 15 UC securable types with the changes-style add/remove PATCH semantics. The sql_warehouse model includes warehouse lifecycle and the SQL Statement Execution API for running queries from a workflow. Lifecycle-managing models expose idempotent create_or_update semantics with workspace-first reconciliation, which correctly handles delete-then-recreate and out-of-band workspace deletes.
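To make the changes-style add/remove semantics concrete, here is a minimal Python sketch that turns a current and a desired grant set into a PATCH body. The helper name diff_grants and the dict-of-sets input shape are assumptions for illustration, not the pack's code; the output follows the Unity Catalog permissions body shape of a changes list with principal, add, and remove.

```python
from typing import Dict, List, Set


def diff_grants(current: Dict[str, Set[str]],
                desired: Dict[str, Set[str]]) -> Dict[str, List[dict]]:
    """Build a changes-style body from current vs desired privileges.

    Hypothetical helper: maps principal -> set of privilege names on
    both sides and emits only the differences.
    """
    changes = []
    for principal in sorted(set(current) | set(desired)):
        have = current.get(principal, set())
        want = desired.get(principal, set())
        add = sorted(want - have)
        remove = sorted(have - want)
        if not add and not remove:
            continue  # nothing to change for this principal
        change = {"principal": principal}
        if add:
            change["add"] = add
        if remove:
            change["remove"] = remove
        changes.append(change)
    return {"changes": changes}


body = diff_grants(
    current={"analysts": {"SELECT", "MODIFY"}},
    desired={"analysts": {"SELECT"}, "engineers": {"USE_SCHEMA"}},
)
print(body)
```

Because only differences are sent, applying the same desired state twice produces an empty changes list the second time, which is what makes the update safe to rerun.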
Every release scored 100/A on Swamp's quality rubric. Every model was smoke-tested end-to-end against a real Databricks Free workspace before the pack moved to the next release. "Smoke" means the actual API path: notebook uploaded to the workspace, job created via the API, run triggered, run waited to terminal state, result validated. Not type-check. That discipline caught a genuine semantic bug in v0.8's create_or_update that I shipped a fix for in v0.11. More on that below.
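The "run waited to terminal state, result validated" part of that loop can be sketched as a small poller. This is an illustration under assumptions, not the pack's wait_run implementation: get_state is a hypothetical callable that returns the run's state object, and the state names are the Jobs API life_cycle_state and result_state values.

```python
import time
from typing import Callable, Dict

# Jobs API life cycle states after which a run will not change again.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_terminal(get_state: Callable[[], Dict[str, str]],
                      poll_seconds: float = 5.0,
                      timeout_seconds: float = 1800.0,
                      sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Poll a run's state until it reaches a terminal life cycle state."""
    waited = 0.0
    while True:
        state = get_state()
        if state.get("life_cycle_state") in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"run still {state.get('life_cycle_state')} "
                               f"after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


def smoke_passed(state: Dict[str, str]) -> bool:
    """A smoke test passes only if the run terminated and succeeded."""
    return (state.get("life_cycle_state") == "TERMINATED"
            and state.get("result_state") == "SUCCESS")
```

Reaching a terminal state is not the same as passing: a run that ends in INTERNAL_ERROR, or terminates with a failed result, still has to fail the smoke test.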
One reference workflow ships alongside the models: a Met Museum API ingestion pipeline at examples/api-ingest/met-museum/, end-to-end validated, runnable on Databricks Free without credentials. It implements a universal pattern for pulling external HTTP APIs into Bronze + Silver Delta tables (rate-limited Spark fan-out via mapInPandas, raw-JSON Bronze for schema-drift immunity, typed Silver via SQL). Forking it for Stripe, GitHub, Salesforce, or any other API is a 45-90 minute exercise: change the config block at the top of the notebook, change the Silver SQL. Everything between stays.
Distribution is the part that isn't done. The pack has zero pulls as of writing. It's day one for distribution; the engineering artifact is finished. More on that at the end.
Why this exists
Databricks already has two ways to manage itself: the Terraform provider for infra lifecycle, and Databricks Asset Bundles (DAB) for in-Databricks pipeline definition. Both are good at what they do. Neither composes with anything outside Databricks, and neither helps you write the Python that actually does the ingest, transform, or training work inside a notebook.
The non-obvious answer the pack is built on: don't ship a generic ingest runtime. Let the agent write the notebook code on demand for the specific case, then orchestrate the agent-written notebook through Swamp's declarative graph.
The Met Museum example in the pack is the proof. I didn't hand-write the 200-line bronze ingest notebook. I described the task to Claude (pull the Met's open catalog into a Delta table, respect their rate limit, capture errors), Claude wrote the notebook, I reviewed it, smoke-tested it on Databricks Free, fixed one bug (the serverless RDD restriction), committed it. The pack's notebook.upload step then uploads that committed notebook to the workspace, and the job.run step executes it. The agent is the IDE; swamp-databricks is the runtime; the committed notebook is the algorithm.
That collapses two boundaries at once.
First, between "build a one-off integration" and "manage a production integration." Both flow through the same tools (Claude Code for authorship, swamp-databricks for orchestration). A new vendor API or a schema change is one agent turn away, not one framework release away.
Second, between Databricks and everything else. Because the workflow is a declarative Swamp graph, the same workflow that runs an agent-written Databricks notebook can also touch S3, Postgres, Cloudflare, GitHub, Slack, or any other Swamp extension in one file, with idempotent reruns, on a schedule. The notebook is the imperative core; the workflow is the declarative outer shell; the agent is what makes the imperative core cheap to write and rewrite. A minimal version of the Met workflow:
steps:
  - id: schema
    model: "@mfbaig35r/databricks/uc_schema"
    method: create_or_update
    arguments: { name: met_museum, catalog_name: workspace }
  - id: notebook
    model: "@mfbaig35r/databricks/notebook"
    method: upload
    arguments: { path: /Shared/met-museum-bronze, content: |... }
  - id: trigger
    model: "@mfbaig35r/databricks/job"
    method: run
    arguments: { job_ref: met-museum-bronze }
  - id: wait
    model: "@mfbaig35r/databricks/job"
    method: wait_run
    arguments: { run_id: ${{ steps.trigger.outputs.run_id }} }
  - id: grants
    model: "@mfbaig35r/databricks/uc_permissions"
    method: update
    arguments: { changes: [...] }

The economic claim is concrete: integration packs that used to take weeks of generic-framework engineering drop to hours of task-specific generation plus review. The Met example took about 30 minutes from "what's a good demo API" to "running on Databricks Free with a Bronze and Silver table populated." A Stripe ingest with cursor pagination and OAuth would be roughly the same: 30-45 minutes from prompt to running pipeline.
This only works under one condition: an engineer who treats smoke tests and code review as load-bearing. Without that, you get production drift driven by prompt variation. With it, you get integration velocity that wasn't available a year ago.
Why I built this on Swamp
Swamp is a new declarative automation framework from System Initiative, the company Adam Jacob has been building since Chef. It is small and new. I picked it as the runtime for this pack for three concrete reasons:
Declarative model graph as the unit of composition. Models compose with each other and with models from other domains. Workflows are typed DAGs over models. That structural property is what makes "Databricks + everything else" expressible in one file. It's not a feature you can bolt onto an imperative tool later; it's the shape of the runtime.
First-class agent integration in the CLI. swamp repo init writes a CLAUDE.md with project rules. The extension authoring skills load automatically. The framework was designed with the assumption that AI agents would both author and operate automation, and the tooling reflects that. Working alongside Claude on this pack was friction-free in a way it wouldn't have been on Terraform or Pulumi.
Tight feedback loop in the tooling. swamp extension fmt formats and lints. swamp extension quality scores against a published 14-factor rubric. swamp model method run returns structured output usable by an agent. The registry has yank and unyank, repository-verified state, and CalVer versioning baked in. That toolchain is what made the smoke-test-every-release discipline practical instead of aspirational. The loop is short enough that running it on every change doesn't slow you down.
Honest size-of-bet line: Swamp has a small community today. Whether it finds adoption is unproven. I think the structural choices are right, and I'm willing to spend engineering time on the bet.
What changes when you pair seriously with an AI
This is the section that matters most to me. The thesis: what changes when you pair seriously with an AI is not "the AI writes code." It's that the cost of building integration packs that used to take weeks drops to hours, if you bring engineering judgment to the loop.
Concrete breakdown of who did what across this session:
| Claude did | I did |
|---|---|
| Typed every Zod schema for the Databricks API surface | Decided which API surfaces to ship as v1 (and which to defer) |
| Wrote the first draft of every model file | Caught when first-draft semantics were wrong (the create_or_update tombstone bug) |
| Ran smoke tests against my workspace | Decided what "smoke-tested" actually means (real end-to-end, not type-check) |
| Surfaced honest tradeoff flags at decision points | Made the calls and lived with them |
| Recovered from bugs I introduced (workspace URL leak via git add -A) | Authorized the destructive operations (yanks, force-push, delete+recreate repo) |
| Updated documentation, release notes, and memory continuously | Decided when to call a release done and when to keep iterating |
The discipline that makes this not vaporware:
- Every release got smoke-validated end-to-end before the next one started. Not a type-check, not a dry run. Notebook uploaded, job created, run triggered, run waited to terminal, result validated.
- Bugs got caught and fixed in dedicated releases, not papered over. The v0.8 to v0.11 tombstone bug fix is the cleanest example: I noticed the broken semantic mid-session, decided it was real, decided it was worth a dedicated release, shipped the fix, validated the fix, updated the memory file so I won't repeat the mistake.
- The session is preserved in the git history, the release notes, and the registry's version log. Anyone can audit it. The 13 release notes are themselves a session log: what shipped, what was smoke-tested, what bugs were caught, what gaps remain.
What I learned about my own work:
- I think faster with someone (something) typing the first draft while I'm still deciding what the right draft is. The bottleneck moves from typing to deciding. That's a real shift in what engineering work feels like.
- The judgment calls are the value. "Which task types to schema-validate first." "Whether the tombstone is a real bug or a tolerable quirk." "Whether to yank v0.2 publicly." "Whether to force-push to scrub a workspace URL leak." None of these are prompts. All of them are calls I made, with informed input from Claude on the tradeoffs.
- This rewards more careful judgment, not less. The temptation to "just ship" gets stronger when the friction drops. The discipline of smoke-testing every release is what kept this from becoming a wall of code that compiles but doesn't work.
This is how I think about engineering work at Canonical Agency. We are not selling prompts. We are selling judgment that compounds with execution velocity, with the discipline of receipts at every step.
Three technical lessons worth pulling out
1. The tombstone bug. Idempotency is harder than it looks.
v0.8 added create_or_update methods to seven models. Reconcile semantics: if a resource with args.name exists in Swamp's data layer, take the update path; otherwise create. Simple, additive, what real automation needs.
Bug: after calling delete, Swamp's readResource still returns the prior data with a tombstone flag, not null. My code treated tombstone-as-truthy, took the PATCH path against a workspace resource that no longer existed, and 404'd.
The fix in v0.11 was to reconcile against the workspace, not against Swamp's local state. New helper existsOnWorkspace(globalArgs, path) does a GET that returns true on 2xx, false on 404 or DOES_NOT_EXIST, throws otherwise. Each create_or_update now does an existence check against the workspace before deciding which path to take.
Side benefit I didn't anticipate: this also handles out-of-band workspace deletes correctly. If someone deletes a job via the Databricks UI and then a workflow calls create_or_update, it now correctly recreates rather than 404'ing on the PATCH.
The lesson: "checked it exists locally" is not the same as "checked it exists at the source of truth." For reconcile semantics, the workspace is the source of truth. Idempotency built on local state alone is a bug waiting to happen.
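The difference between the two reconcile strategies fits in a few lines. This is a Python sketch for illustration, not the pack's helper: the post gives the signature existsOnWorkspace(globalArgs, path), while here the HTTP call is a hypothetical injected http_get that returns a status code and an optional error code, and the tombstone record shape is assumed.

```python
from typing import Callable, Optional, Tuple

# Hypothetical transport: path -> (HTTP status, API error code or None).
Getter = Callable[[str], Tuple[int, Optional[str]]]


def exists_on_workspace(http_get: Getter, path: str) -> bool:
    """True on 2xx, False on 404 or DOES_NOT_EXIST, raise on anything else."""
    status, error_code = http_get(path)
    if 200 <= status < 300:
        return True
    if status == 404 or error_code == "DOES_NOT_EXIST":
        return False
    raise RuntimeError(f"unexpected response for {path}: {status} {error_code}")


def choose_path_local(local_record: Optional[dict]) -> str:
    """v0.8-style decision: any local record, tombstoned or not, is truthy."""
    return "update" if local_record else "create"


def choose_path_workspace(http_get: Getter, path: str) -> str:
    """v0.11-style decision: ask the workspace, ignore local state."""
    return "update" if exists_on_workspace(http_get, path) else "create"


# After a delete, local state still holds a record with a tombstone flag
# (assumed shape), while the workspace answers 404.
tombstoned = {"name": "met-museum-bronze", "tombstone": True}
workspace_says_gone = lambda path: (404, None)

print(choose_path_local(tombstoned))                     # update (wrong)
print(choose_path_workspace(workspace_says_gone, "/x"))  # create
```

Raising on every other status is deliberate: a 403 or 500 means "unknown", and treating unknown as "does not exist" would turn an auth or availability problem into a duplicate create.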
2. Databricks serverless rejects RDD operations. Smoke discipline matters.
I wrote the Met Museum bronze notebook with rdd.mapPartitions for the rate-limited HTTP fan-out. Spark idiom, well-understood, looked clean. deno check and swamp extension quality both passed.
First job run: [NOT_IMPLEMENTED] Using custom code using PySpark RDDs is not allowed on serverless compute. We suggest using mapInPandas or mapInArrow for the most common use cases.
Serverless Databricks doesn't allow RDD operations. Type-checking and quality scoring can't catch that; only running it can. The fix was straightforward (mapInPandas gives the same per-partition processing through an iterator of pandas DataFrames, and works on both serverless and classic clusters), but the only reason I caught it before publishing was the discipline of smoke-testing every model against a real workspace before the next release.
The lesson: type-check is not validation. Quality score is not validation. Validation is the actual API path, run to completion, with results inspected. If you skip that step you ship code that compiles but doesn't work, in a category where that distinction matters.
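For readers who want the shape of the rate-limited fan-out, here is a sketch of its core in plain Python. It is not the notebook from the pack: fetch_rate_limited, the fetch callable, and the fixed minimum interval are assumptions, and the output keys mirror the Bronze columns described in the next section (ingested_at omitted).

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def fetch_rate_limited(object_ids: Iterable[int],
                       fetch: Callable[[int], Tuple[int, str]],
                       min_interval_seconds: float,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Call fetch once per id, spacing calls by at least min_interval_seconds.

    Errors are captured as rows instead of raised, so one bad record
    does not kill the whole partition.
    """
    last_call = None
    for object_id in object_ids:
        if last_call is not None:
            wait = min_interval_seconds - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        try:
            status, body = fetch(object_id)
            yield {"object_id": object_id, "http_status": status,
                   "raw_json": body, "error_message": None}
        except Exception as exc:  # capture, do not propagate
            yield {"object_id": object_id, "http_status": None,
                   "raw_json": None, "error_message": str(exc)}
```

In a notebook, a function with the iterator-of-pandas-DataFrames in, iterator-of-pandas-DataFrames out signature that mapInPandas takes would run this over each batch's ids. The limit then applies per partition, so the aggregate request rate is roughly the per-partition rate times the number of partitions running concurrently.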
3. Schema drift at the API boundary. Bronze raw JSON / Silver typed.
Every external API will change its JSON shape on you eventually. Vendors add fields, remove fields, change types, vary shapes by record. If you parse JSON to typed columns at the ingest layer, every schema change breaks your pipeline. This is the most common reason "API to warehouse" pipelines are expensive to maintain.
The pattern that resolves it:
- Bronze stays raw: object_id BIGINT, http_status INT, raw_json STRING, error_message STRING, ingested_at TIMESTAMP. Five columns. Cannot drift. Captures everything including errors.
- Silver is a typed projection built via SQL: CREATE OR REPLACE TABLE silver.X AS SELECT raw_json:title::string AS title, raw_json:artist::string AS artist, ... FROM bronze.X WHERE http_status = 200. Rebuilt on every refresh.
Drift impact:
- Vendor adds a field: Bronze captures it, Silver ignores until you choose to extract.
- Vendor removes a field: Silver column returns NULL.
- Vendor changes a type: Silver column for that field returns NULL, isolated blast radius.
- Vendor returns malformed JSON for some records: Bronze captures with http_status, Silver filters them out, you investigate.
The pattern applies to far more than API ingest. Any time you don't control the producer of the data (webhooks, Kafka topics, vendor file drops, CDC streams from operational databases), Bronze as raw archive is the default that buys you immunity to upstream schema changes. The Met implementation in the pack is the cleanest worked example I know how to ship.
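The drift cases can be simulated outside Spark. The sketch below is plain Python standing in for the Silver projection, not the Databricks SQL itself: project_silver is a hypothetical helper, and its rule that a missing or wrongly typed field becomes None is a simplification of how the typed extraction behaves.

```python
import json
from typing import Iterable, List, Optional

SILVER_COLUMNS = ("title", "artist")


def extract_string(doc: object, key: str) -> Optional[str]:
    """Return doc[key] if it is a string, else None (missing or drifted type)."""
    value = doc.get(key) if isinstance(doc, dict) else None
    return value if isinstance(value, str) else None


def project_silver(bronze_rows: Iterable[dict]) -> List[dict]:
    """Rebuild Silver from Bronze: keep HTTP 200 rows, extract typed columns."""
    silver = []
    for row in bronze_rows:
        if row["http_status"] != 200:
            continue  # errors stay in Bronze for investigation
        try:
            doc = json.loads(row["raw_json"])
        except (TypeError, ValueError):
            doc = {}  # unparseable payload: every extracted column is None
        out = {"object_id": row["object_id"]}
        for column in SILVER_COLUMNS:
            out[column] = extract_string(doc, column)
        silver.append(out)
    return silver


bronze = [
    # vendor added a field: ignored until extracted
    {"object_id": 1, "http_status": 200,
     "raw_json": '{"title": "A", "artist": "B", "new_field": 1}'},
    # vendor removed a field: that column is None
    {"object_id": 2, "http_status": 200, "raw_json": '{"title": "C"}'},
    # vendor changed a type: only that column is None
    {"object_id": 3, "http_status": 200,
     "raw_json": '{"title": ["x"], "artist": "D"}'},
    # failed request: captured in Bronze, filtered from Silver
    {"object_id": 4, "http_status": 500, "raw_json": None},
]
silver = project_silver(bronze)
```

Bronze never changes shape across these four cases; only the Silver rebuild sees the drift, and only in the affected column.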
Where it goes next
The pack is feature-complete for v1. My benchmark was "enterprise-credible": ten critical gaps closed (permissions, idempotency, UC catalog, the API ingest pattern, others), every model end-to-end smoke-validated, every release at 100/A. That's where we are.
The remaining gaps belong to v2 when real users surface real needs. MLflow integration. Cluster lifecycle for paid workspaces (Free is serverless-only so I couldn't test it). Workspace identity (service principals, users, groups). Azure managed identity auth. Spark task types beyond schema-only validation. These are real but not blocking; they get added when someone hits them on a real engagement.
Distribution is the part that isn't done. Zero pulls today. The artifact exists; the chapter on adoption is the next one. I'd rather publish this with honest numbers and let the work speak than wait for adoption that may or may not come.
If you have Databricks alongside other systems and have been gluing them together with shell scripts, the pack is on the registry: swamp extension pull @mfbaig35r/databricks. The Met Museum example is fork-and-modify ready for any other API.
Closing
I'm not making the strong claim about AI here. I'm making the modest one: paired correctly, with an engineer who treats smoke tests and judgment as load-bearing, the cost of a published, validated integration pack drops by about an order of magnitude.
The pack is at swamp.club/extensions/@mfbaig35r/databricks. The source is at github.com/mfbaig35r/swamp-databricks. The Met Museum example is fork-and-modify ready for any other API.
If you're at a company that has Databricks alongside other systems and wants those systems composed declaratively rather than glued with shell scripts, Canonical Agency is open for engagements that look like this.