Three Loops: Planning, Development, Quality


Most teams have a development loop. Fewer have a planning loop. Almost none close the quality loop back into planning. That’s why the same bugs keep showing up.

I’ve watched teams ship the same category of bug three sprints in a row. Authentication edge cases. Race conditions in concurrent operations. Data validation at system boundaries. Each time, the fix is different. Each time, the root cause is the same: nobody fed the failure pattern back into the planning process.

The fix isn’t better developers. It’s closing the loop.


The Three-Loop Architecture

Loop 1 is Discovery and Planning. Loop 2 is Development and Implementation. Loop 3 is CI/CD Quality and Recovery. Between Loops 1 and 2, there’s a half-loop — Loop 1.5 — that most teams skip entirely.

Each loop has a different cadence, different inputs, and different outputs. The power comes from the connections between them.


Loop 1: Discovery and Planning (6-11 Hours)

Loop 1 takes the longest because it’s the only loop where you can prevent problems instead of fixing them.

The input is a combination of: what the user/stakeholder wants built, what the codebase currently looks like, what broke in the last cycle, and what the team has learned about the problem domain since the last planning session.

That last input is the one most teams skip. They plan based on requirements and current state. They don’t plan based on accumulated failure knowledge. If the last three sprints all had authentication bugs, Loop 1 should include an explicit authentication review step in every new feature plan.

The 6-11 hour range is deliberate. Under 6 hours means you’re not investigating deeply enough. Over 11 hours means you’re overthinking and should start building to learn more. The exact duration depends on the feature’s novelty — a variation on existing functionality needs closer to 6 hours; a new integration with an unfamiliar system needs the full 11.

Loop 1 outputs three artifacts:

  1. A scope document that defines one unit of delivery. Not “improve the auth system.” Instead: “add rate limiting to the login endpoint, threshold 5 attempts per minute per IP, response is HTTP 429 with Retry-After header.” Concrete. Testable. Shippable independently.

  2. A risk register that lists what could go wrong. Not generic risks — specific risks based on this team’s history. “Our rate limiter tests have missed the Redis connection failure path in 2 of the last 3 features that used Redis.”

  3. Success criteria with numbers. Not “it should be fast.” Instead: “p95 latency under 50ms at 1000 requests per second.”
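To make the scope concrete, here is a minimal sketch of the unit of delivery from artifact 1 — an in-memory sliding-window limiter, 5 attempts per minute per IP. This assumes a single-process deployment; the class and parameter names are illustrative, not a production implementation.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """In-memory sliding window: `limit` attempts per `window_seconds` per key."""

    def __init__(self, limit=5, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent attempts

    def allow(self, ip, now=None):
        """Return (allowed, retry_after_seconds) for one attempt from `ip`."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop attempts that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0.0
        # The oldest attempt determines when a slot frees up -- that gap
        # becomes the Retry-After value on the HTTP 429 response.
        return False, self.window - (now - hits[0])
```

The caller maps a `(False, retry)` result to an HTTP 429 response with a `Retry-After` header, which is exactly what the success criteria in the scope document would test against.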


Loop 1.5: Session Reflection (5-15 Minutes)

This is the half-loop that changes everything. It runs at the boundary between planning and development — after a planning session ends and before coding begins.

Loop 1.5 captures two things:

Corrections: Points where the planning session changed direction. “We initially planned to use Redis for rate limiting, then realized our deployment doesn’t have Redis, and switched to an in-memory sliding window.” The correction itself is valuable knowledge. It means the team’s default assumption (use Redis) is miscalibrated for this environment.

Approvals: Decisions that were explicitly confirmed. “We confirmed with ops that the deployment supports up to 2GB memory per container, so the in-memory approach is viable.” Approvals are important because they document the assumptions that the implementation relies on. If a future deployment changes the memory limit, the team knows which features are affected.

Loop 1.5 takes 5-15 minutes. It’s a structured debrief, not a retrospective. The output is a short document — typically 10-20 bullet points — that gets attached to the scope document and travels with it through Loop 2.
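The reflection record itself can be very lightweight. A sketch of one possible structure, using the Redis-to-in-memory example above — the field names are an assumption, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class SessionReflection:
    """Hypothetical Loop 1.5 record attached to a scope document."""
    feature: str
    corrections: list = field(default_factory=list)  # mid-session direction changes
    approvals: list = field(default_factory=list)    # explicitly confirmed assumptions

reflection = SessionReflection(
    feature="login rate limiting",
    corrections=[
        "Planned Redis for rate limiting; deployment has no Redis; "
        "switched to in-memory sliding window",
    ],
    approvals=[
        "Ops confirmed up to 2GB memory per container, so in-memory state is viable",
    ],
)
```

Whether this lives as a dataclass, a YAML file, or bullet points in the scope document matters less than the two categories being captured at the boundary.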

Most teams don’t do this because it feels redundant. “We just discussed all of this. Why write it down?” Because three weeks from now, when the feature breaks in production and nobody remembers why you chose the in-memory approach over Redis, that document is the difference between a 20-minute fix and a 4-hour investigation.


Loop 2: Development and Implementation (4-6 Hours)

Loop 2 is where code gets written. Its inputs are the scope document and risk register from Loop 1, plus the Loop 1.5 reflection.

The 4-6 hour timeframe is a constraint, not an estimate. If a unit of delivery takes more than 6 hours to implement, the scope document was too big. Break it into smaller units. This constraint forces granularity in planning, which produces smaller diffs, faster reviews, and easier rollbacks.

During Loop 2, the risk register is a checklist. Each identified risk has an explicit mitigation in the code. “Redis connection failure” means there’s a test that simulates Redis being down and verifies the system degrades correctly. If the risk was identified in Loop 1 and there’s no corresponding test or mitigation in Loop 2, something went wrong.
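Turning a risk-register entry into a test might look like this sketch, assuming the team’s chosen degradation policy is fail-open on the login path. The helper and exception names are hypothetical:

```python
from unittest import mock

class StoreDownError(Exception):
    """Raised when the backing store (e.g. Redis) is unreachable."""

def check_rate_limit(store, ip):
    """Return True if the request is allowed."""
    try:
        return store.increment_and_check(ip)
    except StoreDownError:
        # Degradation policy from the risk register: fail open rather
        # than blocking all logins when the store is down.
        return True

def test_rate_limit_degrades_when_store_is_down():
    store = mock.Mock()
    store.increment_and_check.side_effect = StoreDownError("connection refused")
    assert check_rate_limit(store, "1.2.3.4") is True
```

The point is traceability: the test name and the risk-register entry describe the same failure mode, so a reviewer can verify each identified risk has a corresponding mitigation.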

Loop 2 outputs:

  1. Working code with tests. Not “code that passes on my machine.” Code that passes in CI with the same configuration as production.

  2. A diff narrative — a short explanation of why the changes look the way they do. Not what changed (the diff shows that) but why. “Used in-memory sliding window instead of Redis because deployment memory budget supports it and it eliminates the Redis dependency for this feature.”

  3. Updated risk register marking which risks were mitigated and how, plus any new risks discovered during implementation. “Discovered that the in-memory approach loses rate limit state on container restart. Mitigated by accepting a brief window of unlimited requests after restart, documented as known limitation.”


Loop 3: CI/CD Quality and Recovery (1.5-2 Hours)

Loop 3 is automated but not unsupervised. The CI pipeline runs tests, linters, security scans, and deployment checks. Recovery procedures handle failures.

The 1.5-2 hour timeframe covers the full pipeline: build, test, scan, deploy to staging, smoke test, deploy to production, and post-deployment verification.

But here’s what makes Loop 3 different from a standard CI/CD pipeline: Loop 3 failures feed back to Loop 1.

When a test fails in CI, the failure is classified:

  • Type A: Code bug. Fix it in Loop 2, re-run Loop 3. This is the normal flow.
  • Type B: Test bug. The test is wrong, not the code. Fix the test, but also log why the test was wrong — usually because the scope document was ambiguous.
  • Type C: Infrastructure bug. The CI environment differs from the development environment in a way that causes failures. Fix the infrastructure, but log the discrepancy for Loop 1 — future planning needs to account for this environment difference.
  • Type D: Design bug. The code works as implemented but the implementation is wrong because the plan was wrong. This is the critical category. Type D failures don’t get fixed in Loop 2. They get sent back to Loop 1 for replanning.

Most teams treat all CI failures as Type A. They fix the code and move on. Type D failures disguised as Type A failures are how teams ship features that work but solve the wrong problem.
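The classification step can be encoded so the routing is explicit rather than tribal knowledge. A sketch, with illustrative names — the important property is that Type D deliberately routes back to Loop 1:

```python
from enum import Enum

class FailureType(Enum):
    CODE = "A"            # bug in the code -> fix in Loop 2
    TEST = "B"            # bug in the test -> fix test, log the ambiguity
    INFRASTRUCTURE = "C"  # CI/dev environment drift -> fix infra, log for Loop 1
    DESIGN = "D"          # the plan was wrong -> replan in Loop 1

def owning_loop(failure):
    """Return which loop owns the fix for a classified CI failure."""
    if failure is FailureType.DESIGN:
        return "loop-1-replan"
    if failure is FailureType.INFRASTRUCTURE:
        return "loop-3-fix, loop-1-log"
    return "loop-2-fix"
```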


Inter-Loop Integration

The loops aren’t sequential. They run concurrently on different units of work. While Loop 2 implements Feature X, Loop 1 plans Feature Y. While Loop 3 validates Feature X, Loop 2 starts Feature Y.

The integration artifacts are what keep this from becoming chaos:

  • Loop 1 -> Loop 2: scope document + risk register + Loop 1.5 reflection
  • Loop 2 -> Loop 3: code + tests + diff narrative + updated risk register
  • Loop 3 -> Loop 1: failure classifications + environment discrepancies + new risk patterns

That last connection — Loop 3 feeding back to Loop 1 — is the one that closes the system. Without it, you have two independent loops (planning and development) plus an automated gate (CI/CD). With it, you have a learning system that gets better at planning because it remembers what went wrong in production.


What This Looks Like in Practice

Monday morning. Loop 1 for the week’s first feature. I pull up the risk register from the previous two weeks and look for patterns. Three of the last five features had deployment failures because the staging environment had a different Node.js version than production. That’s a Type C failure pattern.

Before I plan anything else, I add a task: pin the Node.js version in the deployment manifest and add a CI check that verifies version parity between staging and production. That takes 20 minutes. It prevents a class of failure that has cost 6+ hours over the last two weeks.
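The parity check itself is small. A sketch, assuming each environment’s deployment manifest is a JSON file with a `node_version` key — both the file layout and the key name are assumptions about this particular deployment:

```python
import json

def node_versions_match(staging_manifest, production_manifest):
    """True when both manifests pin the same Node.js version."""
    with open(staging_manifest) as f:
        staging = json.load(f)
    with open(production_manifest) as f:
        production = json.load(f)
    return staging["node_version"] == production["node_version"]
```

Wired into CI, a `False` result fails the pipeline with a message naming both versions — which converts a recurring Type C surprise into an immediate, cheap failure.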

Then I plan the actual feature. The scope document is specific. The risk register includes “watch for environment-specific behavior” because that’s the current hot pattern. Loop 1 takes 7 hours.

Loop 1.5: 10 minutes. I note that we changed the database migration strategy mid-planning because the original approach would have required downtime. That correction gets logged.

Loop 2: 5 hours. Code, tests, and a diff narrative that explains the migration strategy choice. Risk register updated with a new finding — the migration creates a 30-second window where old and new schemas coexist.

Loop 3: 1.5 hours. Pipeline passes. Post-deployment smoke test reveals the 30-second schema coexistence window causes a spike in 500 errors from a service that queries the affected table. That’s a Type D failure — the design assumed services could handle schema coexistence, but they can’t.

The failure goes back to Loop 1. Next time we plan a migration, the risk register includes “verify all consuming services handle schema transitions gracefully.” The system learned.


Why Three Loops, Not Two or Four

Two loops (plan, build) miss the feedback from production failures back to planning. You’re flying open-loop on quality.

Four loops add complexity without adding information. I’ve seen teams try to separate “quality” from “recovery” into distinct loops. The overhead of managing four loop boundaries exceeds the value of the additional granularity.

Three loops hit the sweet spot: enough structure to close the feedback cycle, little enough overhead to actually run on a weekly cadence.

The half-loop (1.5) exists because the planning-to-development boundary is where the most knowledge gets lost. People finish a planning meeting, start coding, and immediately forget why they made certain decisions. Five minutes of structured reflection prevents hours of archaeological investigation later.


Designing development processes for AI-assisted teams? I help organizations build feedback loops that learn from production failures. Book a call to discuss your development workflow.