Human in the Loop: When AI Review Needs a Human Override
When to escalate from AI review to human review. GuardSpine's escalation triggers, L3-L4 risk routing, and the anti-hollowing case for keeping humans sharp.
Every AI governance vendor wants to tell you their system is fully autonomous. I want to tell you the opposite: some decisions should never be automated, and the hard part is knowing which ones.
GuardSpine’s escalation system exists because I do not trust AI review for everything. I built it, I run it on my own repositories, and I still require human sign-off on high-risk changes. This is not a limitation of the technology. It is a design choice about where human judgment is irreplaceable.
The Escalation Boundary
GuardSpine routes reviews based on the L0 through L4 risk tier classification:
- L0-L1 (formatting, docs, trivial changes): AI council reviews and auto-approves. No human needed.
- L2 (moderate logic changes, new features): AI council reviews and posts comments. Human can merge without additional approval, but the review is visible.
- L3 (authentication, authorization, security-sensitive changes): AI council reviews, but merge is blocked until a designated human approves.
- L4 (payment processing, cryptographic changes, infrastructure access control): Same as L3, plus the human approver must hold a specific role (security lead, compliance officer, etc.).
The boundary between L2 and L3 is the escalation boundary. Below it, AI handles the review autonomously. Above it, humans are in the loop and their sign-off is mandatory.
This boundary is configurable. Some teams set it lower — they want human review on all non-trivial changes. Some set it higher — they trust AI review for most security changes and only require humans for payment and crypto. The default puts the boundary where I think the risk profile shifts: when a change can affect who has access to what.
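The tier routing above can be sketched as a small policy table. This is an illustrative sketch, not GuardSpine's actual implementation; the `REVIEW_POLICY` structure and `merge_allowed` function are hypothetical names for the behavior the list describes.

```python
from typing import Optional

# Hypothetical policy table mirroring the L0-L4 routing described above.
# L0-L2: AI review suffices. L3+: a human must approve; L4 also requires a role.
REVIEW_POLICY = {
    "L0": {"human_approval": False, "required_role": None},
    "L1": {"human_approval": False, "required_role": None},
    "L2": {"human_approval": False, "required_role": None},  # AI comments; human may merge
    "L3": {"human_approval": True, "required_role": None},   # any designated human
    "L4": {"human_approval": True, "required_role": "security_lead"},
}

def merge_allowed(tier: str, approver_role: Optional[str]) -> bool:
    """Return True if a merge can proceed for a change at this risk tier."""
    policy = REVIEW_POLICY[tier]
    if not policy["human_approval"]:
        return True   # below the escalation boundary: AI review is enough
    if approver_role is None:
        return False  # L3+: blocked until a human signs off
    required = policy["required_role"]
    return required is None or approver_role == required

print(merge_allowed("L1", None))             # True
print(merge_allowed("L3", None))             # False
print(merge_allowed("L4", "developer"))      # False
print(merge_allowed("L4", "security_lead"))  # True
```

Making the boundary configurable is then just a matter of editing which tiers set `human_approval: True`.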
Why Not Automate Everything
The argument for full automation is efficiency. If the AI council can review a PR in 90 seconds, why wait hours for a human? The models are getting better. False positive rates are dropping. Why not let the machine handle it all?
Three reasons.
Model blind spots are correlated. When I run Claude, GPT, and Gemini as a council, their reviews are genuinely diverse for most code. But they share training data, they share architectural patterns, and they share failure modes. A subtle authorization bypass that none of the three models flag is not unlikely — it is the expected outcome for certain bug classes. Models trained on internet-scale code have seen many examples of correct auth patterns and few examples of the specific way your auth system can be exploited.
Human reviewers who know your system’s architecture, your threat model, and your business logic catch things that no general-purpose model catches. Not because they are smarter than the model, but because they have context the model does not.
Regulatory requirements demand it. HIPAA, SOC 2, PCI DSS, and the EU AI Act all have provisions that require human oversight for certain categories of decisions. An automated approval on a change to PHI access patterns does not satisfy HIPAA’s requirement for administrative safeguards. A human must review and approve. The evidence bundle must show that a human — identified by name and role — made the decision.
You can argue that the regulation is outdated. You might be right. But your auditor does not care about your argument. They care about your evidence.
The anti-hollowing problem. If humans never review security-critical code because the AI handles it, humans lose the ability to review security-critical code. Skills atrophy. The team’s institutional knowledge degrades. When the AI gets it wrong — and it will — nobody on the team can catch the error because nobody has been practicing.
I wrote about this in more detail in the anti-hollowing post. The short version: keeping humans in the loop on high-risk changes is not just about catching AI mistakes today. It is about maintaining the human capability to catch AI mistakes tomorrow.
Configuring Escalation Triggers
GuardSpine’s escalation is configured in the policy file. Beyond the default path-based triggers, you can define custom escalation conditions:
```yaml
escalation:
  triggers:
    - name: council-disagreement
      condition: council_agreement_score < 0.6
      action: escalate_to_human
      description: Models disagree on review outcome
    - name: high-entropy-change
      condition: pii_shield_findings > 0 AND risk_tier >= L2
      action: escalate_to_human
      role: security_lead
      description: PII detected in non-trivial change
    - name: new-dependency
      condition: dependency_added AND dependency_downloads < 1000
      action: escalate_to_human
      description: Low-popularity dependency added
    - name: config-change
      condition: file_path matches "*.env*" OR file_path matches "*config*"
      action: escalate_to_human
      description: Configuration file modified
```
The council_agreement_score trigger is one I use on every project. When the AI models disagree significantly about whether a change is safe, that disagreement itself is a signal. It means the change is ambiguous enough that a human should look at it. A 0.6 threshold means more than 40% of the council reached a different assessment than the majority. That is not a rounding error — that is genuine uncertainty.
The dependency trigger catches supply chain risks. A new dependency with fewer than 1000 weekly downloads is unusual enough to warrant human review. It might be fine. It might be a typosquatting package. A human can spend 60 seconds checking the package’s repository and author.
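To make the trigger semantics concrete, here is a minimal sketch of how conditions like these might be evaluated against a change. The field names (`council_votes`, `pii_findings`, and so on) are assumptions matching the sample policy above, not GuardSpine's actual schema.

```python
def agreement_score(votes: list) -> float:
    """Fraction of council members agreeing with the majority verdict."""
    top = max(votes.count(v) for v in set(votes))
    return top / len(votes)

def fired_triggers(change: dict) -> list:
    """Return the names of the escalation triggers a change sets off."""
    fired = []
    if agreement_score(change["council_votes"]) < 0.6:
        fired.append("council-disagreement")
    if change["pii_findings"] > 0 and change["risk_tier"] >= 2:
        fired.append("high-entropy-change")
    if change.get("new_dependency") and change["dependency_downloads"] < 1000:
        fired.append("new-dependency")
    return fired

change = {
    "council_votes": ["approve", "reject", "request_changes"],  # 3-way split: score 1/3
    "pii_findings": 0,
    "risk_tier": 2,
    "new_dependency": True,
    "dependency_downloads": 412,
}
print(fired_triggers(change))  # ['council-disagreement', 'new-dependency']
```

With a three-model council, any three-way split drives the agreement score to 1/3, well under the 0.6 threshold, so the change escalates even before the dependency check.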
What the Human Reviewer Sees
When a change escalates, the human reviewer does not start from scratch. They see everything the AI council already produced:
- The diff, highlighted with risk annotations
- Each council member’s review, with specific line references
- The risk tier classification and the rules that triggered it
- The escalation reason (which trigger fired)
- PII-Shield findings, if any
- A recommended action from the council majority
The human’s job is not to re-review the entire change. It is to apply judgment to the specific concern that triggered escalation. If the escalation happened because the council disagreed, the human reads the dissenting review and decides who is right. If it happened because PII-Shield flagged something, the human verifies whether the flagged string is actually sensitive.
This is focused review, not comprehensive review. The AI did the comprehensive review. The human handles the judgment call.
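One way to model the escalation packet the reviewer receives is a simple record type. The field names below are illustrative, chosen to mirror the bullet list above; they are not GuardSpine's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class EscalationPacket:
    """Hypothetical model of what an escalated reviewer is shown."""
    diff: str                       # annotated diff with risk highlights
    council_reviews: dict           # model name -> review with line references
    risk_tier: str                  # e.g. "L3"
    triggered_rules: list           # the rules that produced the tier
    escalation_reason: str          # which escalation trigger fired
    pii_findings: list = field(default_factory=list)
    recommended_action: str = "approve"  # council majority recommendation

packet = EscalationPacket(
    diff="--- a/auth.py\n+++ b/auth.py\n...",
    council_reviews={"claude": "LGTM", "gpt": "Possible bypass near token check"},
    risk_tier="L3",
    triggered_rules=["path:auth/**"],
    escalation_reason="council-disagreement",
)
print(packet.escalation_reason)  # council-disagreement
```

The point of bundling all of this into one record is that the reviewer's attention goes straight to `escalation_reason` rather than to a cold diff.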
The Approval Evidence
When the human approves, the evidence bundle records:
- Who approved (identity, not just a username — role and organization)
- When they approved (timestamp, UTC)
- The escalation context they were shown
- Whether they added any comments or conditions
- The resulting risk tier (which may be adjusted by the human)
This evidence is part of the hash chain. It cannot be modified after the fact. The auditor can verify that a specific human, holding a specific role, reviewed a specific escalation, at a specific time.
Compare this to a GitHub approval review. GitHub records that someone clicked “approve.” It does not record what they were shown, why they approved, or whether they held the authority to approve changes of that risk level. A GitHub approval is a checkmark. An evidence bundle approval is a signed, contextualized decision record.
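The hash-chain property can be sketched in a few lines: each approval record is serialized deterministically and hashed together with the previous entry's hash, so altering any field after the fact breaks verification. This is a minimal sketch assuming SHA-256 and JSON canonicalization via sorted keys; GuardSpine's actual bundle format may differ.

```python
import hashlib
import json

def chain_entry(prev_hash: str, record: dict) -> dict:
    """Append one approval record to a hash chain (illustrative, not GuardSpine's format)."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    return {"prev": prev_hash, "record": record, "hash": digest}

approval = {
    "approver": {"name": "J. Reviewer", "role": "security_lead", "org": "ExampleCo"},
    "timestamp_utc": "2025-01-15T14:32:00Z",
    "escalation_trigger": "council-disagreement",
    "comments": "Dissenting review was correct; required a regression test before merge.",
    "risk_tier": "L3",
}
entry = chain_entry("0" * 64, approval)  # genesis entry uses an all-zero previous hash

# Verification: recomputing the digest from prev + record must reproduce the stored hash.
recomputed = hashlib.sha256(
    (entry["prev"] + json.dumps(entry["record"], sort_keys=True, separators=(",", ":"))).encode()
).hexdigest()
assert recomputed == entry["hash"]  # any tampering with the record breaks this check
```

An auditor verifying the chain replays exactly this recomputation for every entry, which is what makes the record tamper-evident rather than merely logged.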
Anti-Hollowing in Practice
I run a rotation on my own projects. Even when I could review every L3+ change myself, I make sure other team members take turns. The goal is not efficiency — it is maintaining capability.
Each person who reviews an L3+ change is exercising their security review skills. They are reading the AI council’s analysis, evaluating it, and making an independent judgment. Over time, they develop intuition for what the AI gets right and where it struggles. They learn the system’s architecture through the lens of its risk surface.
If I automated those reviews away, I would save 20 minutes per day and lose something I cannot easily rebuild: a team that knows how to evaluate security-critical code changes.
The temptation to automate everything is strong. Resist it where the cost of failure is high and the feedback loop is long. Authentication bypasses do not announce themselves on the day they ship. They announce themselves on the day they are exploited.
Finding Your Escalation Boundary
Start strict and loosen over time. Begin with L2+ escalating to humans. After a month, review the escalation log. How many L2 escalations did humans approve without modification? If the answer is “almost all of them,” consider moving the boundary to L3+.
The evidence bundles make this analysis possible. You have a record of every escalation, every human decision, and every case where the human overrode the AI’s recommendation. Data-driven adjustment, not guessing.
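The boundary-tuning analysis described above reduces to one question per tier: what fraction of escalations did a human approve without modification? A sketch, assuming a hypothetical escalation log with `tier`, `decision`, and `modified` fields:

```python
def unmodified_approval_rate(log: list, tier: str) -> float:
    """Fraction of escalations at a tier that humans approved without changes."""
    rows = [e for e in log if e["tier"] == tier]
    unmodified = [e for e in rows if e["decision"] == "approve" and not e["modified"]]
    return len(unmodified) / len(rows) if rows else 0.0

# Toy escalation log; in practice this comes from the evidence bundles.
log = [
    {"tier": "L2", "decision": "approve", "modified": False},
    {"tier": "L2", "decision": "approve", "modified": False},
    {"tier": "L2", "decision": "approve", "modified": True},
    {"tier": "L3", "decision": "reject", "modified": False},
]
rate = unmodified_approval_rate(log, "L2")
print(f"{rate:.0%} of L2 escalations approved without modification")  # 67%
```

If that rate sits near 100% for a month, the L2 escalations are costing review time without changing outcomes, and moving the boundary to L3+ is defensible with the data to show for it.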
The goal is not zero human reviews. The goal is human reviews where humans add the most value — the changes that are ambiguous, high-risk, or novel enough that pattern-matched AI review is insufficient.
Want to configure escalation thresholds for your team’s risk profile? Book a call and I will walk through the policy file with you.