
PR Approval Is Not Evidence

GitHub shows that someone clicked approve. It doesn't show what they reviewed, what risks were present, or why approval was reasonable. That distinction just became expensive.

AI coding assistants doubled delivery speed.

They also broke a quiet assumption: that a PR approval proves meaningful review.

The Pattern I Keep Encountering

When I audit AI implementations at companies using Copilot or Cursor, I see the same failure mode:

  1. Team adopts AI coding tools
  2. PR throughput triples
  3. Review quality stays flat (or drops)
  4. Auditor asks “how did this payment logic change get approved?”
  5. Engineering points to a green checkmark

That checkmark proves exactly one thing: someone clicked a button.

It doesn’t prove what diff they actually saw. It doesn’t prove the approval came before a force-push rewrote the branch. And it definitely doesn’t prove the approver understood the security implications.

Why This Matters Now

I’ve sat in SOC 2 audits where the evidence package was essentially:

  • Screenshot of GitHub approval
  • “Trust us, they reviewed it”

That worked in 2022. Auditors accepted it because the assumption held: humans wrote most code, reviews were slow, diffs were small.

Now? AI assistants generate hundreds of lines in seconds. The assumption broke. Auditors are catching up.

The shift happened faster than most compliance teams expected. In 2023, auditors rarely asked about AI-generated code. By mid-2024, it was a standard line item. The firms I work with are now seeing explicit questions about AI tooling in their audit questionnaires.

The questions I’m hearing from compliance teams:

  • “Can you prove the diff at approval time matches what was merged?”
  • “What was the risk classification process?”
  • “How do you know the reviewer saw the sensitive changes?”

Most teams answer honestly: “We can’t.”

What Happens Without This

The failure mode isn’t dramatic. It’s slow and expensive.

First, audit findings start accumulating. Each one requires remediation documentation. That’s engineering time diverted from shipping.

Then insurance renewals get complicated. Cyber liability underwriters are asking about code review processes. “We use GitHub approvals” is becoming an insufficient answer.

The real cost is invisible: the security incident that traces back to AI-generated code that nobody actually reviewed. The payment processing bug that got auto-approved because it looked like a routine refactor. The auth bypass that sat in a 400-line diff.

By the time you’re explaining to a regulator why your review process failed, the evidence problem has become an existential one.

What Real Evidence Looks Like

After building governance frameworks for two years, I’ve landed on three requirements for defensible review evidence:

1. Immutable Record of What Was Reviewed

Not “the PR was approved” - the actual diff content at the moment of review, hashed and timestamped.
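
As a minimal sketch, assuming nothing about GuardSpine's internals - the record_review helper and its field names below are illustrative, not the product's actual API:

import hashlib
from datetime import datetime, timezone

def record_review(diff_text: str, reviewer: str, pr_number: int) -> dict:
    # Freeze exactly what was on screen at approval time. The hash
    # binds the record to the diff content: a later force-push yields
    # a different hash, so the substitution is detectable.
    return {
        "pr": pr_number,
        "reviewer": reviewer,
        "diff_sha256": hashlib.sha256(diff_text.encode()).hexdigest(),
        "reviewed_at": datetime.now(timezone.utc).isoformat(),
    }

At merge time, recompute the hash of the merged diff and compare: a mismatch means what merged is not what was approved.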

2. Risk Classification

What made this change low-risk vs. high-risk? Was it auto-approved or did a human verify? What policy triggered escalation?

3. Independent Verification

Can someone outside your organization verify the evidence without trusting your systems?

The Infrastructure I Built

The same question kept coming up in my AI Readiness sessions: “How do we prove review quality to auditors?”

I got tired of answering “you can’t with current tools.” So I built the infrastructure.

GuardSpine CodeGuard is a GitHub Action that:

  • Analyzes PR diffs against sensitive patterns (auth, payments, PII, crypto)
  • Assigns risk tiers (L0 trivial through L4 critical)
  • Auto-approves low-risk changes
  • Escalates high-risk changes for human review
  • Generates cryptographically verifiable evidence bundles

The bundle contains hash-chained events, the exact diff at analysis time, and approval records. Anyone can verify it offline.
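
Hash chaining is the load-bearing idea here. The sketch below shows the mechanism in miniature, using an invented, simplified event format rather than the actual GuardSpine bundle schema:

import hashlib
import json

def chain_events(events: list[dict]) -> list[dict]:
    # Each stored event commits to its predecessor's hash, so editing,
    # reordering, or deleting any event breaks every hash after it.
    chained = []
    prev_hash = "0" * 64  # genesis value for the first event
    for event in events:
        body = {"prev": prev_hash, **event}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        chained.append({**body, "hash": digest})
        prev_hash = digest
    return chained

bundle = chain_events([
    {"type": "diff_analyzed", "diff_sha256": "..."},
    {"type": "risk_assigned", "tier": "L3"},
    {"type": "human_approved", "reviewer": "alice"},
])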

How It Works in Practice

# .github/workflows/codeguard.yml
on: pull_request
jobs:
  codeguard:
    runs-on: ubuntu-latest
    steps:
      - uses: DNYoussef/codeguard-action@v1
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          risk_threshold: L3  # L3 and L4 changes escalate to human review

When a PR opens:

[Screenshot: CodeGuard diff analysis detecting sensitive zones in auth and payment code]

  1. Diff gets parsed and checked against sensitive patterns
  2. Risk tier assigned based on what changed
  3. L0-L2 auto-approved, L3-L4 require human review
  4. Evidence bundle generated with hash chain

The PR gets a comment showing risk tier and drivers. The bundle saves as a workflow artifact.
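
To make step 2 concrete, here's a toy classifier. The patterns and tier assignments are invented for illustration; a real ruleset is far richer (path-aware, language-aware, tuned per codebase):

import re

# Illustrative pattern-to-tier mapping only; deliberately crude.
SENSITIVE_PATTERNS = [
    (re.compile(r"auth|session|token", re.I), "L4"),       # auth surface
    (re.compile(r"payment|refund|charge", re.I), "L4"),    # money movement
    (re.compile(r"ssn|passport|date_of_birth", re.I), "L3"),  # PII
    (re.compile(r"cipher|hmac|private_key", re.I), "L3"),  # crypto
]

def classify(changed_lines: list[str]) -> str:
    # Return the highest tier any changed line triggers; L0 if none.
    tier = "L0"
    for line in changed_lines:
        for pattern, pattern_tier in SENSITIVE_PATTERNS:
            if pattern.search(line) and pattern_tier > tier:
                tier = pattern_tier
    return tier

def needs_human_review(tier: str, threshold: str = "L3") -> bool:
    # Tiers L0-L4 compare correctly as strings ("L4" > "L3" is True).
    return tier >= threshold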

The Verification Story

[Screenshot: GuardSpine Verifier performing cryptographic bundle verification with an evidence summary]

guardspine-verify evidence-bundle.json

# [OK] Hash chain verified (3 events)
# [OK] Final hash matches
# [OK] Bundle integrity confirmed

No trust required. Your auditor can verify independently.

The spec and verifier are MIT licensed. You can audit the verification logic yourself. This matters when “trust us” isn’t an acceptable answer.
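
For intuition, verification is just re-deriving the chain from the raw event data. Continuing the simplified bundle format from the earlier sketch (again, not the published spec):

import hashlib
import json

def verify_chain(chained: list[dict]) -> bool:
    # Recompute every hash from scratch; any alteration after the
    # fact produces a mismatch somewhere in the chain.
    prev_hash = "0" * 64
    for event in chained:
        body = {k: v for k, v in event.items() if k != "hash"}
        if body.get("prev") != prev_hash:
            return False
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if recomputed != event["hash"]:
            return False
        prev_hash = event["hash"]
    return True

Because the check needs only the bundle itself and a hash function, it runs offline on the auditor's machine, with no access to your systems.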

The Bigger Picture

This connects to what I wrote about in Why Enterprise AI Will Fail - the constraint isn’t model quality, it’s infrastructure and governance.

GuardSpine is one piece of that infrastructure. It solves the specific problem of PR review evidence. Combined with the governance frameworks I help clients build, it creates a defensible posture that scales with AI adoption.

The teams that win will be the ones who built the guardrails before they hit the gas.


Ready to build defensible AI governance for your team?

I run AI Readiness Sessions where we map your current review processes against what auditors are actually asking for. No slides, no theory - just a working roadmap for evidence infrastructure.

Book a free 30-minute consultation