Named Modes: audit/speed/research/robust/balanced
Pareto-optimal AI configurations discovered through multi-objective optimization. Real numbers, real trade-offs, automatic switching.
What if your AI could automatically switch between “move fast” and “be careful” based on what it’s doing? Not a vague instruction — a mathematically optimal configuration discovered through multi-objective optimization.
That’s not a hypothetical. I built it.
The Problem With One-Size-Fits-All
Most AI systems ship with a single configuration. One temperature setting. One set of validation rules. One retry policy. You get the same behavior whether you’re reformatting a README or reviewing a security-critical pull request.
This is obviously wrong. You don’t drive 25 mph on the highway. You don’t drive 80 mph in a school zone.
But the fix isn’t “just add some if-statements.” The fix is discovering which configurations are actually optimal — and proving it with math, not intuition.
What Makes a Mode “Named”
A named mode is a specific configuration of every tunable parameter in the system — context depth, validation strictness, retry budgets, tool selection weights, confidence thresholds — that sits on the Pareto frontier of accuracy versus efficiency.
Pareto frontier means: you cannot improve one objective without hurting the other. Every named mode represents a genuine trade-off, not a suboptimal point that could be improved in both dimensions simultaneously.
These aren’t hand-tuned presets. They’re discovered through the two-stage multi-objective optimization I described in the previous post. Stage one explores the parameter space with evolutionary search. Stage two refines the Pareto-optimal candidates with local gradient methods. What survives is a set of configurations where every point earns its place.
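The shape of that search can be sketched in a few lines. This is a toy stand-in, not the production optimizer: `evaluate` fakes a one-knob accuracy/efficiency trade-off, and a hill-climb comment stands in for the gradient stage, just to show how a Pareto front survives an evolutionary loop.

```python
import random

def dominates(a, b):
    # a dominates b if it is no worse on both objectives
    # (accuracy, efficiency) and strictly better on at least one.
    return a[0] >= b[0] and a[1] >= b[1] and a != b

def pareto_front(points):
    # Keep only the non-dominated points.
    return [p for p in points if not any(dominates(q, p) for q in points)]

def evaluate(strictness):
    # Toy objective: one knob ("validation strictness") trading accuracy
    # against efficiency. The real system scores full configurations
    # against measured benchmarks.
    return (0.70 + 0.28 * strictness, 0.96 - 0.22 * strictness)

# Stage one: evolutionary exploration of the parameter space.
population = [random.random() for _ in range(40)]
for _ in range(20):
    scored = {evaluate(s): s for s in population}
    survivors = [scored[p] for p in pareto_front(list(scored))]
    children = [min(1.0, max(0.0, s + random.gauss(0, 0.05)))
                for s in survivors]
    pool = survivors + children
    population = random.sample(pool, min(40, len(pool)))

# Stage two would refine each surviving candidate locally
# (gradient or hill-climb steps) before naming the final modes.
frontier = sorted(evaluate(s) for s in set(population))
```

Every tuple left in `frontier` is non-dominated: improving its accuracy means accepting worse efficiency, and vice versa.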
The Five Modes
Here are the five modes currently in production, with their measured accuracy and efficiency scores.
Audit (0.960 accuracy, 0.763 efficiency). Maximum scrutiny. Every claim gets verified. Every code path gets traced. Token cost is high and latency is noticeable. This is the mode that catches the bug hiding behind three layers of indirection. You pay for thoroughness in wall-clock time.
Speed (0.734 accuracy, 0.950 efficiency). Minimum viable validation. Fast responses, shallow checks, aggressive caching. Good enough for formatting fixes, dependency bumps, and boilerplate generation. The mode that doesn’t waste twenty seconds analyzing a one-line typo fix.
Research (0.980 accuracy, 0.824 efficiency). The highest accuracy mode in the set. Broader context loading, more aggressive cross-referencing, deeper exploration of alternative interpretations. Designed for situations where being wrong is expensive and being thorough is the whole point. Literature reviews. Architecture decisions. Threat modeling.
Robust (0.960 accuracy, 0.769 efficiency). Same accuracy ceiling as audit, but with different internals. Robust mode prioritizes consistency under adversarial or noisy inputs. More retries, stricter input validation, heavier fallback chains. Where audit catches subtle bugs, robust survives messy environments.
Balanced (0.882 accuracy, 0.928 efficiency). The workhorse. High enough accuracy for most production tasks, efficient enough to not burn your budget. This is the default mode — the one that handles 70% of real-world work without switching.
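The five modes reduce to plain data. Here they are as a registry, using the measured scores above (the dict layout itself is illustrative, not the production schema):

```python
# Measured (accuracy, efficiency) scores for the five production modes.
MODES = {
    "audit":    {"accuracy": 0.960, "efficiency": 0.763},
    "speed":    {"accuracy": 0.734, "efficiency": 0.950},
    "research": {"accuracy": 0.980, "efficiency": 0.824},
    "robust":   {"accuracy": 0.960, "efficiency": 0.769},
    "balanced": {"accuracy": 0.882, "efficiency": 0.928},
}

# Balanced handles roughly 70% of real-world work without switching.
DEFAULT_MODE = "balanced"
```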
Every Mode Sacrifices Something
This is the part people skip when they talk about optimization. There is no free lunch. Look at the numbers again.
Speed mode drops to 0.734 accuracy. That means roughly one in four complex decisions (1 − 0.734 ≈ 0.27) will be suboptimal. That’s fine for renaming a variable. It’s catastrophic for a compliance review.

Research mode burns 17.6% more resources than balanced. Run it on every task and your costs balloon for no reason. Run it only on architecture decisions and you get the highest accuracy in the system exactly where it matters.
Audit and robust look similar on paper — 0.960 accuracy each. But they solve different problems. Audit finds what’s wrong. Robust survives what’s hostile. The distinction matters when you’re choosing which mode to assign to a task.
The Pareto frontier is a curve, not a point. You must choose where to sit on it.
Runtime Mode Selection
Knowing the modes exist is step one. Automatically selecting the right mode for each task is where the real value lives.
Mode selection maps directly to risk tiers. In the GuardSpine framework, every code change gets classified into risk levels L0 through L4. The mapping is straightforward.
L0 (cosmetic changes, docs, formatting): speed mode. No reason to burn tokens on a whitespace fix.
L1 (low-risk logic changes, test updates): balanced mode. Enough validation to catch obvious mistakes, fast enough to stay out of the developer’s way.
L2 (moderate changes, new features, API modifications): balanced or robust, depending on input quality. Clean PR from a senior engineer? Balanced. Sprawling diff from an AI agent? Robust.
L3 (high-risk changes, auth flows, data handling): audit mode. Full trace, full verification, full evidence bundle.
L4 (critical infrastructure, crypto, access control): research mode. Maximum accuracy. Cross-reference everything. Miss nothing.
This mapping isn’t hardcoded. It’s a policy layer that organizations can customize. But the defaults are battle-tested.
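The default policy sketched above fits in one function. Treat this as an illustrative rendering of the mapping, not GuardSpine's actual API; the noisy-input flag is a simplification of the "input quality" judgment at L2.

```python
def select_mode(risk_tier: str, input_is_noisy: bool = False) -> str:
    """Map a GuardSpine risk tier (L0-L4) to a named mode."""
    mapping = {
        "L0": "speed",     # cosmetic changes, docs, formatting
        "L1": "balanced",  # low-risk logic changes, test updates
        "L3": "audit",     # auth flows, data handling
        "L4": "research",  # critical infrastructure, crypto
    }
    if risk_tier == "L2":
        # Moderate changes: input quality decides. Clean PR from a
        # senior engineer stays balanced; a sprawling AI-agent diff
        # gets robust mode's retries and stricter validation.
        return "robust" if input_is_noisy else "balanced"
    return mapping[risk_tier]
```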
Custom Mode Creation
Five modes cover most scenarios. But every domain has its own accuracy-efficiency curve.
A biotech team reviewing gene therapy protocols needs a mode that’s even more conservative than research — something closer to 0.995 accuracy with efficiency as an afterthought. A game studio pushing cosmetic asset updates needs something even faster than speed mode, where 0.650 accuracy is perfectly acceptable because the worst case is a misaligned texture that gets caught in playtesting.
Custom modes work the same way as the built-in five. Define your objective weights. Run the optimization. Extract the Pareto-optimal configurations. Name them something your team understands.
I’ve seen teams create modes called “ship-it” (aggressive speed, minimal checks), “fda-ready” (maximum accuracy with audit trails), and “friday-deploy” (robust mode with extra rollback safeguards). The naming doesn’t matter. The math does.
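A custom-mode definition is just objective weights handed to the same optimizer. The field names below are hypothetical, chosen to mirror the three team-invented modes mentioned above; the real optimizer's input format isn't shown here.

```python
# Hypothetical custom-mode specs: objective weights fed into the same
# two-stage optimization, then named by the team that owns them.
CUSTOM_MODES = {
    # Aggressive speed, minimal checks.
    "ship-it":       {"accuracy_weight": 0.20, "efficiency_weight": 0.80},
    # Maximum accuracy with audit trails.
    "fda-ready":     {"accuracy_weight": 0.95, "efficiency_weight": 0.05,
                      "require_audit_trail": True},
    # Robust mode with extra rollback safeguards.
    "friday-deploy": {"accuracy_weight": 0.60, "efficiency_weight": 0.40,
                      "extra_rollback_checks": True},
}
```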
The Trade-Off Visualization
Plot accuracy on the Y-axis and efficiency on the X-axis. The Pareto frontier forms a curve from the upper-left (research: high accuracy, moderate efficiency) to the lower-right (speed: moderate accuracy, high efficiency).
Every point below the curve is dominated: some configuration on the frontier is at least as good in both dimensions and strictly better in one. Delete it. Every point on the curve is non-dominated, genuinely optimal given its trade-off preference.
The five named modes are anchor points on this curve. They give teams a shared vocabulary for discussing trade-offs. “Run this in audit mode” is clearer than “increase the validation strictness parameter to 0.87 and set the retry budget to 4.”
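Behind each name sits a full parameter bundle. Here is what "audit mode" might expand to, using the two example values quoted above; every other field is a hypothetical placeholder for the kinds of parameters listed earlier in the post.

```python
# Illustrative expansion of a named mode into raw parameters.
# Only validation_strictness and retry_budget come from the example
# above; the rest are hypothetical field names.
AUDIT_PARAMS = {
    "validation_strictness": 0.87,
    "retry_budget": 4,
    "context_depth": "deep",
    "confidence_threshold": 0.90,
}
```

Nobody should have to say that paragraph of numbers out loud. "Run this in audit mode" carries the whole bundle.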
Shared vocabulary reduces coordination cost. Reduced coordination cost is how you scale.
What This Connects To
Named modes are the runtime expression of everything in this series so far. The cognitive architecture provides the structure. Memory systems provide continuity. Multi-objective optimization discovers the configurations. Named modes make those configurations usable by humans who don’t want to think about Pareto frontiers every time they push code.
The next post in this series covers how these modes compose — what happens when a single task requires audit-level accuracy on the security components but speed-mode efficiency on the boilerplate. Mode composition is where the architecture stops being a configuration system and starts being a reasoning framework.
But that’s next time.
Start Here
If you’re running AI systems with a single configuration, you’re either over-spending on easy tasks or under-validating hard ones. Probably both.
Named modes fix this. Not with heuristics. With optimization.
Need custom optimization modes for your domain? Let’s design them together: https://cal.com/davidyoussef