
Mode-Aware Context Adaptation

Execution vs. planning vs. brainstorming: different tasks need different memory retrieval parameters.

When you are debugging a production issue, you need 5 specific facts fast. When you are brainstorming product strategy, you need 25 loosely related ideas. Same memory system, completely different retrieval patterns.

Every RAG tutorial treats retrieval as a single operation with fixed parameters. That assumption degrades every downstream task.

Three Modes, Three Budgets

Execution Mode: 5,000 token budget, 500ms latency ceiling. 5 core results, zero extended context. Every token must be directly relevant.

Planning Mode: 10,000 token budget, 1,000ms ceiling. 5 core results plus 15 extended. Background that might inform decisions without being directly actionable.

Brainstorming Mode: 20,000 token budget, 2,000ms ceiling. 5 core plus 25 extended, with randomness injection to surface unexpected connections.
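The three budgets above can be captured in a single configuration table. This is a minimal sketch; the class and field names are illustrative, not the system's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModeBudget:
    token_budget: int        # max tokens of retrieved context
    latency_ms: int          # retrieval latency ceiling
    core_results: int        # directly relevant entries
    extended_results: int    # looser background entries
    inject_randomness: bool  # sample beyond the strict top-k

# Values taken from the budgets described above.
MODE_BUDGETS = {
    "execution":     ModeBudget(5_000,  500,   5, 0,  False),
    "planning":      ModeBudget(10_000, 1_000, 5, 15, False),
    "brainstorming": ModeBudget(20_000, 2_000, 5, 25, True),
}
```

Keeping the budgets in one frozen table makes the mode switch a lookup rather than scattered conditionals.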

The budgets emerged from six months of production data. Below 5K in execution mode, agents miss critical facts. Above 5K, they hedge instead of acting on the most relevant ones.

Pattern-Based Mode Detection

29 regex patterns organized into three groups detect the mode from input text in under 1 millisecond.

Execution patterns: “fix the bug in,” “deploy to production,” “update the config for.” Planning patterns: “design a system for,” “what are our options,” “compare approaches.” Brainstorming patterns: “what if we,” “explore ideas for,” “how might we.”

I chose regex over an LLM classifier deliberately. An LLM call adds 500-2,000ms of latency before retrieval even starts. The 29 patterns classify 94% of inputs correctly; the remaining 6% fall back to the planning default.
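The detector reduces to a first-match scan over compiled patterns. A sketch using only the example phrases quoted above (the full system uses 29 patterns):

```python
import re

# A small subset of the pattern groups; phrases are the examples above.
MODE_PATTERNS = {
    "execution": [r"\bfix the bug in\b", r"\bdeploy to production\b",
                  r"\bupdate the config for\b"],
    "planning": [r"\bdesign a system for\b", r"\bwhat are our options\b",
                 r"\bcompare approaches\b"],
    "brainstorming": [r"\bwhat if we\b", r"\bexplore ideas for\b",
                      r"\bhow might we\b"],
}

# Compile once at import time so detection stays sub-millisecond.
COMPILED = {mode: [re.compile(p, re.IGNORECASE) for p in pats]
            for mode, pats in MODE_PATTERNS.items()}

def detect_mode(text: str, default: str = "planning") -> str:
    """Return the first mode whose pattern group matches, else the default."""
    for mode, patterns in COMPILED.items():
        if any(p.search(text) for p in patterns):
            return mode
    return default
```

Unmatched inputs fall through to `default="planning"`, the middle-ground budget, which is the safest wrong answer.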

Verification Toggles

Execution mode: verification ON. Every claim must trace to a specific memory entry. No hallucination tolerance.

Planning mode: verification ON but relaxed. The agent can synthesize and draw inferences, but must cite sources.

Brainstorming mode: verification OFF. The agent is explicitly allowed to speculate and extrapolate. Verification in brainstorming mode kills creativity.

This toggle is the single highest-impact design decision. Without it, execution mode hallucinates and brainstorming mode self-censors.
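One way to implement the toggle is a per-mode policy enum that gets translated into prompt instructions at generation time. This is a hypothetical sketch of that idea, not the system's actual prompts:

```python
from enum import Enum

class Verification(Enum):
    STRICT = "strict"    # every claim must trace to a memory entry
    RELAXED = "relaxed"  # inference allowed, sources still cited
    OFF = "off"          # speculation explicitly permitted

# Mirrors the toggles described above.
VERIFICATION_POLICY = {
    "execution": Verification.STRICT,
    "planning": Verification.RELAXED,
    "brainstorming": Verification.OFF,
}

def verification_instructions(mode: str) -> str:
    """Hypothetical helper: render the policy as a prompt suffix."""
    policy = VERIFICATION_POLICY[mode]
    if policy is Verification.STRICT:
        return ("Cite a memory entry for every factual claim. "
                "If no entry supports a claim, say so instead of guessing.")
    if policy is Verification.RELAXED:
        return "You may synthesize across entries, but cite the entries you draw on."
    return "You may speculate freely beyond the retrieved context."
```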

Randomness Injection

Brainstorming mode samples from the top-50 results with probability proportional to score, rather than returning the strict top-25. Repeated queries on the same topic return different context sets, which produce different ideas.

Without randomness injection, brainstorming becomes deterministic. Ask the same question twice, get the same ideas. That is retrieval with extra steps.
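Score-proportional sampling without replacement is a short loop. A minimal sketch, assuming each result carries a positive relevance `score` (the function name and result shape are illustrative):

```python
import random

def sample_brainstorm_context(results, k=25, pool=50, seed=None):
    """Sample k results from the top-`pool` by score, with probability
    proportional to score, without replacement. Assumes positive scores."""
    rng = random.Random(seed)
    # Restrict to the top-`pool` candidates, best score first.
    candidates = sorted(results, key=lambda r: r["score"], reverse=True)[:pool]
    chosen = []
    for _ in range(min(k, len(candidates))):
        total = sum(r["score"] for r in candidates)
        pick = rng.uniform(0, total)
        acc = 0.0
        for i, r in enumerate(candidates):
            acc += r["score"]
            if pick <= acc:          # landed in this result's score interval
                chosen.append(candidates.pop(i))
                break
    return chosen
```

With a fresh seed per query, repeated runs draw different (but still score-weighted) context sets, which is exactly what keeps brainstorming non-deterministic.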

Mode Transitions

The system detects transitions in real-time. When the mode changes, the context window resizes immediately. Moving from brainstorming to execution drops from 20K to 5K tokens and triggers fresh retrieval with tighter thresholds.
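The transition logic amounts to re-running detection on each message and re-retrieving when the mode flips. A standalone sketch with stand-in stubs (all names are hypothetical):

```python
# Token budgets from the modes described earlier.
BUDGETS = {"execution": 5_000, "planning": 10_000, "brainstorming": 20_000}

class Session:
    def __init__(self):
        self.mode = "planning"   # default mode
        self.context = []

def detect(text):
    """Stand-in for the regex classifier; defaults to planning."""
    lowered = text.lower()
    if "fix the bug" in lowered:
        return "execution"
    if "what if we" in lowered:
        return "brainstorming"
    return "planning"

def retrieve(query, max_tokens):
    """Stand-in for the real memory retrieval call."""
    return [f"context for {query!r} within {max_tokens} tokens"]

def handle_message(text, session):
    new_mode = detect(text)
    if new_mode != session.mode:
        # Mode changed: resize the window and retrieve fresh context
        # under the new budget (e.g. 20K -> 5K on brainstorm -> execution).
        session.mode = new_mode
        session.context = retrieve(text, BUDGETS[new_mode])
    return session
```

Re-retrieving on transition, rather than trimming the old context, matters: the tighter execution thresholds change which entries qualify, not just how many fit.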

The Measurable Difference

Before mode adaptation: 3.2/5 quality ratings across all tasks. After: execution 4.1, planning 3.8, brainstorming 3.9.

Same memory system. Same models. Different context parameters per mode. That moved quality ratings by 20-25%.


Need adaptive AI context? Let us tune your mode detection.