
Ralph Wiggum Loops: Persistence Until Quality

One-shot AI generation is like first-draft writing. Sometimes brilliant. Usually not. The fix isn’t hoping for a better first try — it’s building a loop that won’t stop until quality passes.

I named this pattern after Ralph Wiggum from The Simpsons. Not because the output is dumb — because the persistence is relentless. Ralph doesn’t stop. He doesn’t get discouraged. He keeps going with absolute commitment regardless of how many times he fails. That’s exactly the property you want in a refinement loop.

The One-Shot Problem

Most people use AI in one-shot mode. They send a prompt, get a response, and either accept it or start over from scratch. This is like writing a first draft and publishing it without editing.

Sometimes the first draft is great. Most of the time, it’s 70% of what you need with subtle problems that only show up when you look closely. A function that handles the happy path but breaks on edge cases. A refactor that’s clean but introduces a coupling violation. Code that passes tests but has a structural smell that’ll cost you six months from now.

The one-shot approach puts all the quality burden on the human reviewer. You have to catch everything the AI missed. That works when you’re reviewing one output. It collapses when you’re reviewing fifty.

The Loop Structure

A Ralph Wiggum loop has four components: an executor, a quality gate, a feedback channel, and a safety limit.

The executor produces output. First iteration, it works from the original task. Every subsequent iteration, it works from the original task plus the gate’s feedback about the previous attempt.

The quality gate evaluates the output against concrete criteria. Not “is this good?” but “does this pass these specific checks?” The gate returns one of three results: pass, fail with feedback, or fail with blocking error. Pass means done. Fail with feedback means try again. Blocking error means stop — something is fundamentally wrong and iteration won’t fix it.

The feedback channel connects the gate’s output back to the executor’s input. This is where most people’s retry logic falls apart. They retry the same prompt and hope for a different result. A Ralph loop feeds the specific failure reason back into the next attempt. “Function X throws on null input” is actionable feedback. “Try again” is not.

The safety limit caps iterations at 50. If the system hasn’t converged after 50 attempts, it’s not going to. The task gets escalated to a human with the full refinement chain attached — every attempt, every gate result, every piece of feedback. The human doesn’t start from zero. They start from 50 attempts of diagnostic information.
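The four components above can be sketched as a single loop. Everything here is illustrative: the names `run_executor`, `run_gate`, and `GateResult` are hypothetical stand-ins for whatever executor and gate you plug in, not a real API.

```python
# Sketch of a Ralph Wiggum loop: executor -> gate -> feedback -> executor,
# capped at a safety limit. All names are hypothetical.
from dataclasses import dataclass

PASS, FAIL_RETRY, FAIL_BLOCKING = 0, 1, 2
MAX_ITERATIONS = 50  # the safety limit described above

@dataclass
class GateResult:
    code: int          # 0 = pass, 1 = fail with feedback, 2 = blocking
    feedback: str = ""

def ralph_loop(task, run_executor, run_gate):
    """Iterate until the gate passes, blocks, or the limit is hit."""
    chain = []                                  # evidence: every attempt + result
    feedback = ""
    for attempt in range(1, MAX_ITERATIONS + 1):
        output = run_executor(task, feedback)   # original task + prior feedback
        result = run_gate(output)
        chain.append((attempt, output, result))
        if result.code == PASS:
            return output, chain
        if result.code == FAIL_BLOCKING:
            raise RuntimeError(f"blocking failure: {result.feedback}")
        feedback = result.feedback              # specific reason feeds the retry
    raise RuntimeError(f"no convergence after {MAX_ITERATIONS} attempts; "
                       f"escalate with {len(chain)} evidence records")
```

The key design point is the `feedback` variable: it carries the gate's specific failure reason into the next executor call, which is what separates this from a blind retry.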

The Exit Code 2 Pattern

The quality gate communicates through exit codes. Exit code 0 means pass. Exit code 1 means fail, try again. Exit code 2 means blocking failure — stop iterating.

Exit code 2 is critical. Without it, the loop will keep hammering on a problem that can’t be solved by iteration. If the task requires an API that doesn’t exist, no amount of retrying will make it appear. If the test infrastructure is broken, the code isn’t the problem. Exit code 2 catches structural impossibilities and prevents wasted compute.

The distinction between “fail, try again” and “fail, stop” is what separates a smart loop from a dumb retry. Every retry loop needs a circuit breaker.

Connascence Analysis as a Quality Gate

Here’s where it gets concrete. One of my quality gates runs Connascence analysis on the output.

Connascence is a measure of coupling between software components. There are nine types, ranging from benign (connascence of name — two things share an identifier) to toxic (connascence of identity — two things must reference the same object instance). The analyzer scores the output and flags violations above a configurable threshold.

When the Ralph loop produces a code refactor, the Connascence gate checks whether the refactor introduced new coupling. Did the change create connascence of algorithm (two components that must use the same algorithm)? Did it introduce connascence of timing (two components that must execute in a specific order)? These are the kinds of structural problems that pass unit tests but create maintenance nightmares.

If the Connascence score exceeds the threshold, the gate returns exit code 1 with specific feedback: “Function A and Function B now share connascence of meaning — they both interpret the value 3 as ‘admin role.’ Extract this to a named constant.” The executor gets that feedback, fixes the specific issue, and resubmits.

The loop converges because each iteration has specific, actionable feedback. It’s not guessing. It’s debugging.

Council Voting Inside a Ralph Loop

For tasks that need judgment rather than measurement, I use an LLM council as the quality gate.

Three models — typically Claude, Gemini, and a local model via Ollama — each independently evaluate the output. They vote pass or fail with written reasoning. The quality gate requires consensus: if two out of three vote fail, the output fails. The dissenting opinions become the feedback for the next iteration.

This catches a class of problems that automated checks miss. Style issues. Architectural decisions that are technically correct but strategically wrong. Documentation that’s accurate but confusing. The council brings multiple perspectives, and disagreement between models is itself a signal that the output needs work.

The Ralph loop doesn’t care whether the quality gate is a linter, a test suite, a Connascence analyzer, or a panel of AI models. It just needs a gate that returns pass, fail-with-feedback, or blocking-fail. The gate is a plug-in. The loop is the structure.
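A council gate can be sketched in a few lines. The model calls are stubbed out as plain callables; the majority rule and the use of dissents as feedback follow the description above, while the names are hypothetical.

```python
# Sketch of a 2-of-3 council gate. Each voter is any callable that
# returns (passed, reasoning); real voters would wrap model calls.
from typing import Callable

Voter = Callable[[str], tuple[bool, str]]

def council_gate(output: str, voters: list[Voter]) -> tuple[int, str]:
    """Majority vote: a strict majority must pass; dissents become feedback."""
    votes = [voter(output) for voter in voters]
    dissents = [reason for passed, reason in votes if not passed]
    if len(dissents) * 2 < len(votes):     # e.g. at most 1 fail out of 3
        return 0, ""
    return 1, "\n".join(dissents)          # dissenting opinions feed the retry
```

Because the gate returns the same pass / fail-with-feedback shape as any other gate, it drops into the loop unchanged, which is the plug-in property described above.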

The Evidence Chain

Every iteration of a Ralph loop produces an evidence record: the input, the output, the gate result, and the feedback. When the loop completes — whether by passing the gate or hitting the safety limit — you have a complete refinement chain.

This chain is valuable even when the loop succeeds on iteration one. It proves that the output was evaluated. It records what criteria were checked. It provides a baseline for future tasks of the same type.

When the loop takes five iterations, the chain is diagnostic gold. You can see exactly where the AI struggled, what kinds of feedback it needed, and how it responded to correction. This data feeds back into prompt optimization — if the same type of feedback shows up repeatedly, the initial prompt needs improvement.

The chain also provides accountability. When someone asks “how do we know this code is good?”, the answer isn’t “Claude said so.” The answer is “it passed Connascence analysis, three-model council review, and the full test suite across four iterations of refinement. Here’s the evidence.”
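The per-iteration record described above can be as simple as a dataclass plus a serializer. The field names here are my own; the point is that the chain is structured data you can attach to an escalation or an audit, not loose logs.

```python
# Sketch of an evidence record and a serializer for the full chain.
# Field names are illustrative, not a fixed schema.
import json
from dataclasses import dataclass, asdict

@dataclass
class EvidenceRecord:
    iteration: int
    input_task: str
    output: str
    gate_result: str    # "pass" | "retry" | "blocking"
    feedback: str

def chain_to_json(chain: list[EvidenceRecord]) -> str:
    """Serialize the refinement chain for escalation, audit, or replay."""
    return json.dumps([asdict(record) for record in chain], indent=2)
```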

When to Use a Ralph Loop

Not everything needs iteration. Simple lookups, straightforward CRUD operations, well-defined transformations with clear test suites — these usually pass on the first try. Running them through five iterations of council review is waste.

Ralph loops earn their keep on tasks with subjective quality criteria, complex structural requirements, or high blast radius. Refactoring a core module. Writing a security policy. Generating a migration plan for a production database. These are tasks where “close enough” isn’t good enough, and where the cost of getting it wrong far exceeds the cost of a few extra iterations.

The safety limit of 50 iterations sounds high, but in practice most tasks converge in 3-7 iterations. The limit exists for the edge cases — the tasks that expose a fundamental misunderstanding that no amount of refinement will fix. When you hit 50, that’s not a failure of the loop. That’s the loop telling you the task needs to be reframed.

Ralph Wiggum doesn’t give up. But he does know when to ask for help.


I build AI quality systems that iterate until the evidence says done, not until the AI says done. If you want to see what structured refinement looks like in a production environment, let’s talk.

Book a call: https://cal.com/davidyoussef