
Everything Is a Diff: Why the Code Review Crisis Is Just the Beginning

PDFs, spreadsheets, contracts, images -- AI is rewriting every artifact in your office. Nobody's diffing any of it.

After I realized code review was broken, I made the mistake of thinking the problem was about code. It’s not. AI is rewriting regulatory filings, financial models, contract clauses, and slide decks. Nobody’s diffing any of it. The code review crisis is just the canary.

The Artifact Explosion

Here’s what’s happening right now, at scale.

1.4 billion Google Docs files were edited with Gemini-assisted rewriting in the first half of 2025 alone. Microsoft 365 Copilot has 33 million active users. 52% of enterprise employees now use AI to edit written content. 47% use it to draft new materials from scratch.

That’s not future-state. That’s the present tense.

GitHub solved version control for code decades ago. Git diff, pull requests, CI/CD pipelines, code signing — we have a mature toolchain for governing what changes in source code and who approved it. AI is straining that toolchain (I wrote about this in “Vibe Coding Broke Code Review”), but at least the toolchain exists.

For documents, spreadsheets, and contracts? We have Track Changes and “Version 7 FINAL FINAL (2).docx.”

The Same Problem, Worse Tools

A regulatory filing gets 40% rewritten by AI. Who reviewed the changes? A financial model gets formulas replaced. Who checked the logic? A contract clause gets “improved” by Copilot. Who approved the new language?

The answer, in almost every organization, is: nobody, specifically.

The numbers back this up. McKinsey’s 2025 State of AI report found that 88% of organizations use AI in at least one business function, but only 9% have mature AI governance. 33% lack evidence-quality audit trails entirely. 61% operate with fragmented logs across disconnected systems.

That 88% vs 9% gap is the governance deficit. And it’s widest in the document layer, because at least code has Git.

What AI Gets Wrong (and Nobody Catches)

The legal profession is learning this lesson expensively. The Damien Charlotin database now tracks 486 documented cases of AI-hallucinated citations in court filings worldwide — 324 in US courts alone, arriving at four or five new cases per day.

In February 2026, a Kansas judge fined five attorneys up to $5,000 each for an AI-generated brief containing fabricated cases. In the Noland v. Land of the Free case, 21 of 23 citations in a plaintiff’s brief were AI-generated fictions. $10,000 sanction, referral to the State Bar.

A Stanford study found that even purpose-built legal research tools from LexisNexis and Thomson Reuters hallucinate more than 17% of the time. General-purpose LLMs on legal queries? 58% to 82%.

And that’s just legal filings — the most scrutinized document category in existence. Spreadsheets are worse. 88% of Excel-based finance models already contain errors before AI touches them. JPMorgan’s London Whale in 2012 — a single Excel formula dividing by sum instead of average — muted volatility by a factor of two and contributed to a $6.2 billion trading loss.
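The arithmetic of that formula bug is worth seeing. Two similar numbers sum to roughly twice their average, so dividing by the sum instead of the average halves the result. A toy illustration with hypothetical rates (not JPMorgan's actual model):

```python
# Toy illustration of the London Whale-style formula bug: a volatility
# proxy computed from two consecutive rates. Hypothetical numbers.
old_rate, new_rate = 0.050, 0.056

correct = abs(new_rate - old_rate) / ((old_rate + new_rate) / 2)  # divide by average
buggy   = abs(new_rate - old_rate) / (old_rate + new_rate)        # divide by sum

print(f"correct: {correct:.4f}")      # ~0.1132
print(f"buggy:   {buggy:.4f}")        # ~0.0566, volatility understated
print(f"ratio:   {correct / buggy}")  # 2.0
```

A cell-level diff would have surfaced that one-character-scale change; version history on the workbook file would not.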

Now add AI to that stack. When processing large CSV files, models lose track of column headers by row 50 and mistake a debt figure for revenue. An LLM doesn't execute the deterministic logic encoded in a formula; it predicts plausible text. But it will happily rewrite those formulas if you ask.

The Pattern: If It Can Be Diffed, It Can Be Governed

Here’s the realization that changed my direction.

GitHub showed that code governance works when you can see exactly what changed (git diff), who approved it (PR review), and prove the chain of custody (code signing, SLSA provenance). The diff is the atomic unit of governance.

That same pattern applies to everything:

  • PDFs have pages. Changes can be diffed at the page level.
  • Spreadsheets have cells. Changes can be diffed at the cell and formula level.
  • Contracts have clauses. Changes can be diffed semantically.
  • Images have pixels. Changes can be compared at the region level.

If it can be diffed, it can be reviewed. If it can be reviewed, it can produce evidence. If it produces evidence, it can be audited.
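The spreadsheet case can be sketched in a few lines. This is a minimal illustration of cell-level diffing, not GuardSpine's implementation: compare the cell maps of two versions and emit every change, formulas included.

```python
def diff_cells(before, after):
    """Return (cell, old, new) for every cell that differs between versions.

    `before` and `after` map cell addresses (e.g. "B7") to their stored
    content, which may be a literal value or a formula string.
    """
    changes = []
    for cell in sorted(before.keys() | after.keys()):
        old, new = before.get(cell), after.get(cell)
        if old != new:
            changes.append((cell, old, new))
    return changes

# A formula swap, the London Whale failure mode, shows up as one reviewable line:
v1 = {"A1": "Revenue", "B7": "=AVERAGE(B2:B6)"}
v2 = {"A1": "Revenue", "B7": "=SUM(B2:B6)"}
print(diff_cells(v1, v2))  # [('B7', '=AVERAGE(B2:B6)', '=SUM(B2:B6)')]
```

The same shape works for every artifact type: substitute page hashes for PDF diffs or region hashes for image diffs and the review workflow downstream stays identical.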

This is when I realized the product couldn’t be “a code review tool.” It had to be an artifact governance layer.

What Existing Tools Miss

The obvious objection: don’t we already have this? SharePoint has version history. Microsoft Purview tracks AI interactions. Google Workspace has suggestion mode.

These tools track that a change was made and which account made it. They cannot tell you:

  1. Whether AI generated the change versus a human typing it.
  2. Whether the AI hallucinated content.
  3. Whether a qualified human reviewed the AI output before accepting it.
  4. Whether a cryptographic proof of provenance binds the artifact to its review chain.

Track Changes shows diffs. It doesn’t prove governance happened.

FINRA’s 2026 Annual Regulatory Oversight Report explicitly highlights generative AI as an emerging compliance risk, warning that “once an AI system can take action, rather than merely generate content, the firm’s supervisory, books-and-records, and governance obligations shift materially.” The EU AI Act takes full enforcement in 2026 with fines up to 35 million euros or 7% of global revenue.

The compliance hammer is coming. “We have version history” won’t be a sufficient answer.

The Cross-Platform Blind Spot

Even the emerging AI Bill of Materials (AI BOM) standards — CycloneDX 1.6, SPDX 3.0 — focus on ML pipeline artifacts: model weights, training data, evaluation sets. They govern what goes into the AI. Nobody governs what comes out.

The output artifacts — the rewritten filing, the modified spreadsheet, the altered contract — remain ungoverned. As Corporate Compliance Insights put it: “When AI can initiate actions, governance becomes a control loop, not a document.”

That’s the thesis in one sentence. Governance must operate on the artifacts themselves — diffs, provenance, review chains — not on policy documents about artifacts.

What I Built Instead

GuardSpine uses the same evidence model across every artifact type. A code change, a PDF edit, a spreadsheet modification, and an image alteration all produce the same output: a cryptographically sealed evidence bundle.

The bundle contains the diff (what changed), the provenance (who or what made the change), the review chain (who approved it and at what risk tier), and a SHA-256 hash chain that anyone can verify offline without trusting our system.
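The hash-chain idea itself is small enough to sketch. The field names and chaining scheme below are illustrative assumptions, not GuardSpine's actual bundle format: each link's SHA-256 covers its payload plus the previous hash, so tampering with any entry invalidates every link after it.

```python
import hashlib
import json

def entry_hash(prev_hash, payload):
    """Chain link: SHA-256 over the previous hash plus the canonicalized payload."""
    data = prev_hash.encode() + json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()

def seal(entries):
    """Build a hash chain over diff/provenance/review entries."""
    chain, prev = [], "0" * 64  # genesis hash
    for payload in entries:
        prev = entry_hash(prev, payload)
        chain.append({"payload": payload, "hash": prev})
    return chain

def verify(chain):
    """Recompute every link; any tampered entry breaks the chain from there on."""
    prev = "0" * 64
    for link in chain:
        if entry_hash(prev, link["payload"]) != link["hash"]:
            return False
        prev = link["hash"]
    return True

bundle = seal([
    {"type": "diff", "cell": "B7", "old": "=AVERAGE(B2:B6)", "new": "=SUM(B2:B6)"},
    {"type": "review", "approver": "j.smith", "risk_tier": "high"},
])
print(verify(bundle))                        # True
bundle[0]["payload"]["new"] = "=MAX(B2:B6)"  # tamper with the recorded diff
print(verify(bundle))                        # False
```

Because verification only recomputes hashes, an auditor can run it offline, with no network access and no trust in the vendor's systems.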

The spec is open (Apache 2.0). Auditors run guardspine-verify bundle.zip and get integrity validation without network access or vendor dependency.

Four guard lanes are live today: CodeGuard for pull requests, PDFGuard for documents, SheetGuard for spreadsheets, ImageGuard for visual assets. Seven more are in development.

The Uncomfortable Math

Code has mature governance tooling and AI is still breaking it — AI-generated code creates 1.7x more issues, 45% contains security flaws, and change failure rates are up 30%.

Documents, spreadsheets, and contracts have weaker review tooling than code. And the same AI models are editing them at the same or greater scale.

If the code review toolchain is struggling, the document review “toolchain” is already failing silently. The incident that forces the market to care — the London Whale moment for AI-modified documents — hasn’t happened publicly yet. But 1.4 billion AI-edited Docs in six months says it’s a matter of when, not if.


If your organization uses AI to edit documents, spreadsheets, or contracts and you don’t have artifact-level governance, the gap is growing every day.

I run AI Readiness Sessions where we map your current governance capabilities against what’s actually needed. No slides, no theory — just a working roadmap.

Book a free 30-minute consultation