Life OS: A Personal AI Dashboard
Managing 207 AI agents without a dashboard meant no visibility into stuck processes, duplicated work, or wasted tokens. So I built one.
I have 207 AI agents doing things for me. Without a dashboard, I had no idea what any of them were doing, which ones were stuck, or whether they were duplicating work.
That sentence sounds absurd until you realize how it happens. You automate one thing. Then another. You connect them with a webhook. You add a cron job. Six months later, you have 207 processes running across multiple machines, and the only way to check on them is SSH-ing into servers and reading log files.
I needed a control plane. Something that showed me, at a glance, what every agent was doing, whether it was healthy, and how much it was costing me.
The Agent Sprawl Problem
Here’s how 207 agents happen. You start with one: a content pipeline that watches YouTube channels and generates blog post drafts. That’s maybe 5 agents (downloader, transcriber, synthesizer, editor, publisher).
Then you add an outreach pipeline. That’s another 8 agents. Then email management. Calendar scheduling. Code review automation. Trading system monitoring. Each pipeline has 3-15 agents depending on complexity.
The problem isn’t any individual agent. Each one works fine in isolation. The problem is coordination. Agent 47 generates a research summary. Agent 112 needs that summary to compose an email. Agent 47 fails silently on a Tuesday night. Agent 112 sends emails with stale data for three days before anyone notices.
Multiply that by 207 agents and you understand why I stopped sleeping through the night.
Architecture Decisions
The dashboard needed to solve three problems: visibility (what’s happening), health (what’s broken), and cost (what’s this costing me).
Agent Registry with RBAC
Every agent registers itself on startup. The registry stores the agent’s name, type, owning pipeline, current state, last heartbeat, permissions, and resource budget. RBAC controls which agents can invoke which other agents — a content pipeline agent has no business calling trading system endpoints.
The registry isn’t just a database table. It’s an active system. If an agent hasn’t sent a heartbeat in 5 minutes, the registry marks it degraded. After 15 minutes, it marks it dead and sends me a notification. This caught 23 silent failures in the first month.
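The heartbeat-to-health logic is simple enough to sketch. This is a minimal illustration of the thresholds described above, not the actual registry code; the `AgentRecord` shape and function names are assumptions.

```python
from dataclasses import dataclass, field
import time

DEGRADED_AFTER = 5 * 60   # seconds without a heartbeat before "degraded"
DEAD_AFTER = 15 * 60      # seconds without a heartbeat before "dead"

@dataclass
class AgentRecord:
    name: str
    pipeline: str
    last_heartbeat: float = field(default_factory=time.time)

def classify(record: AgentRecord, now: float) -> str:
    """Derive health purely from heartbeat age.

    No error channel needed: a hung agent simply stops
    heartbeating and ages into "degraded", then "dead"."""
    age = now - record.last_heartbeat
    if age >= DEAD_AFTER:
        return "dead"       # the real registry notifies here
    if age >= DEGRADED_AFTER:
        return "degraded"
    return "healthy"
```

The point of deriving health from heartbeat age, rather than from reported errors, is that it catches agents that die without throwing anything.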
WebSocket Real-Time Updates
The dashboard uses WebSocket connections for live state updates. When an agent changes state — started, processing, completed, failed — the update hits the dashboard within 200ms. I tried polling first. Polling 207 agents every 5 seconds is 2,484 requests per minute. WebSockets reduced that to event-driven pushes.
The WebSocket layer also handles backpressure. When 40 agents complete tasks in the same second (common during batch processing windows), the updates queue and deliver in order rather than overwhelming the frontend.
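The ordering-plus-batching behavior can be sketched with an asyncio queue between the agents and the socket. This is a simplified model, assuming a `send` coroutine that writes one WebSocket frame; the real layer has reconnection and per-client fanout on top.

```python
import asyncio

async def fan_in(updates: asyncio.Queue, send, max_batch: int = 25):
    """Drain queued state updates in arrival order.

    Bursts (e.g. 40 agents completing in the same second) are
    coalesced into bounded batches so the frontend receives a few
    ordered frames instead of a flood of single-event messages."""
    while True:
        batch = [await updates.get()]          # block for the first update
        while len(batch) < max_batch and not updates.empty():
            batch.append(updates.get_nowait()) # sweep the rest of the burst
        await send(batch)                      # one frame per batch
```

Because everything funnels through one queue, ordering is preserved even when dozens of agents finish simultaneously.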
Calendar Scheduling with Cron
Most agents don’t run continuously. They run on schedules. The content pipeline fires at 6 AM. Outreach runs at 9 AM. Code review checks happen every 30 minutes during business hours. The dashboard shows a calendar view of scheduled runs, upcoming executions, and historical completion times.
The cron system supports dependencies. Agent B doesn’t fire at 9 AM — it fires after Agent A completes successfully. If Agent A fails, Agent B waits. If Agent A hasn’t completed by 9:30 AM, Agent B alerts and holds. This dependency-aware scheduling eliminated an entire class of stale-data bugs.
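The wait-or-alert decision for a dependent agent reduces to a small function. A hedged sketch of that rule, with a hypothetical 30-minute grace window standing in for the 9:00-to-9:30 example above:

```python
from datetime import datetime, timedelta

def should_fire(dep_status: str, scheduled_at: datetime, now: datetime,
                grace: timedelta = timedelta(minutes=30)) -> str:
    """Decide whether a dependent agent may run.

    "run"   -> upstream succeeded, fire now
    "wait"  -> upstream still in flight, hold
    "alert" -> upstream overdue past the grace window: hold and notify
    """
    if dep_status == "succeeded":
        return "run"
    if now >= scheduled_at + grace:
        return "alert"
    return "wait"
```

Gating on upstream success rather than wall-clock time is what eliminates the stale-data class of bugs: Agent B can never run against an output Agent A failed to refresh.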
The CLI Bridge
I work with three AI models daily: Claude, Gemini, and Codex. Each has different strengths. Claude handles complex reasoning and code review. Gemini does broad research and synthesis. Codex handles rapid code generation.
The CLI Bridge sits between me and these models. When I issue a command, the bridge routes it to the appropriate model based on the task type. Research queries go to Gemini. Code modifications go to Claude. Quick one-off scripts go to Codex.
But routing is the simple part. The hard part is context continuity. When I start a task in Claude, switch to Gemini for research, then come back to Claude for implementation, the bridge maintains a shared context window. Claude sees what Gemini found. Gemini sees what Claude planned.
This shared context is stored in Memory MCP — a cross-session persistence layer that uses vector search, graph relationships, and Bayesian decay to surface relevant prior work. The CLI Bridge queries Memory MCP before every model invocation, injecting relevant context into the prompt.
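Routing plus context injection looks roughly like this. The routing table, task kinds, and `memory_lookup` callable are illustrative assumptions; `memory_lookup` stands in for the Memory MCP query, and the real bridge does far more than string concatenation.

```python
# Hypothetical task-kind -> model table, per the division of labor above.
ROUTES = {
    "research": "gemini",
    "code_review": "claude",
    "refactor": "claude",
    "script": "codex",
}

def build_invocation(task_kind: str, prompt: str, memory_lookup) -> dict:
    """Pick a model for the task and prepend relevant prior context.

    memory_lookup(prompt) returns a list of context snippets
    retrieved from the shared persistence layer."""
    model = ROUTES.get(task_kind, "claude")  # default route
    context = memory_lookup(prompt)
    full_prompt = "\n".join(context + [prompt]) if context else prompt
    return {"model": model, "prompt": full_prompt}
```

Because every invocation passes through the same context-injection step, whichever model handles the next turn sees what the previous model produced.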
Pipeline Designer
The Pipeline Designer is where I build new agent workflows. It’s a visual canvas built on ReactFlow — drag nodes, connect them with edges, configure each node’s parameters.
There are 13 node types:
Execution Nodes (2): Task nodes that run agent logic, and Transform nodes that reshape data between agents.
Quality Gate Nodes (11): These are the interesting ones. Each gate checks a different dimension of output quality before allowing data to flow downstream.
- Slop Gate: Checks for banned phrases, cliché patterns, and AI-typical writing. A slop score above the 30% threshold sends the content back to the author agent.
- Fact Gate: Cross-references claims against source material. Unsupported claims get flagged.
- Style Gate: Enforces voice consistency using a YAML style profile.
- Cost Gate: Blocks execution if the projected token cost exceeds budget.
- Latency Gate: Fails the pipeline if any node exceeds its time budget.
- Dedup Gate: Checks if this output duplicates something already produced.
- Security Gate: Scans outputs for PII, credentials, or sensitive data leakage.
- Compliance Gate: Verifies outputs against regulatory requirements (relevant for the GuardSpine pipeline).
- Format Gate: Validates output structure (JSON schema, markdown structure, etc.).
- Coherence Gate: Checks that the output logically follows from the input context.
- Freshness Gate: Rejects outputs based on stale data (older than a configurable threshold).
When a gate fails, the pipeline doesn’t just stop. It routes the failed output back to the producing agent with a structured error message explaining what failed and why. The agent gets one retry. If it fails again, the pipeline halts and alerts me.
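The gate-then-retry loop can be sketched as follows. This is a minimal model, assuming gates are callables that return `None` on pass or an error message on failure, and that the producing agent accepts structured feedback; the real designer wires this per node on the canvas.

```python
def run_with_gates(produce, gates, max_retries=1):
    """Run an agent's output through quality gates.

    On failure, the output is routed back to the producer with a
    structured error explaining which gates failed. One retry is
    allowed; a second failure halts the pipeline."""
    feedback = None
    for attempt in range(max_retries + 1):
        output = produce(feedback)
        errors = [msg for gate in gates if (msg := gate(output))]
        if not errors:
            return output
        feedback = {"failed_gates": errors, "attempt": attempt + 1}
    raise RuntimeError(f"pipeline halted: {feedback}")  # alert fires here
```

The structured feedback is the important part: the retry isn't a blind re-run, it's a re-run with an explanation of what to fix.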
What the Dashboard Actually Shows
The main view is a grid of 207 agent cards. Each card shows the agent name, pipeline membership, current state (idle/running/failed/degraded), last execution time, success rate over 7 days, and cost over 7 days.
The cards are color-coded. Green means healthy. Yellow means degraded (missed heartbeats or elevated error rates). Red means failed. Gray means idle (not scheduled to run).
I can filter by pipeline, by state, by cost range, or by error rate. The most useful filter is “show me everything that failed in the last 24 hours.” On a good day, that’s 0-2 agents. On a bad day (API rate limits, service outages), it can be 30+.
Cost Tracking
Every API call made by every agent is logged with its token count and cost. The dashboard aggregates this by agent, by pipeline, and by model. I can see that my content pipeline costs $4.20/day, my outreach pipeline costs $1.80/day, and my code review pipeline costs $0.90/day.
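The aggregation itself is a straightforward rollup. A sketch assuming the call log is a sequence of `(agent, pipeline, model, cost_usd)` rows; the real log schema and storage are not shown in this post.

```python
from collections import defaultdict

def aggregate_costs(call_log):
    """Roll per-call costs up by agent, pipeline, and model.

    call_log rows: (agent, pipeline, model, cost_usd)."""
    totals = {"agent": defaultdict(float),
              "pipeline": defaultdict(float),
              "model": defaultdict(float)}
    for agent, pipeline, model, cost in call_log:
        totals["agent"][agent] += cost
        totals["pipeline"][pipeline] += cost
        totals["model"][model] += cost
    return totals
```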
This visibility killed wasteful patterns. I discovered that one agent was regenerating a summary on every invocation instead of caching it. That single fix saved $45/month.
Pipeline Health
Each pipeline has a health score based on its agents’ success rates, latency percentiles, and cost efficiency. A pipeline with 95% success rate and p99 latency under 30 seconds is healthy. A pipeline with 80% success rate is degraded. Below 70%, it’s marked critical.
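The thresholds above map to a simple classifier. This sketch folds only success rate and p99 latency into the decision; the real score also weighs cost efficiency, which is omitted here.

```python
def pipeline_health(success_rate: float, p99_latency_s: float) -> str:
    """Classify pipeline health from success rate and p99 latency.

    healthy:  >= 95% success and p99 <= 30s
    degraded: 70-95% success, or latency over budget
    critical: < 70% success
    """
    if success_rate < 0.70:
        return "critical"
    if success_rate < 0.95 or p99_latency_s > 30:
        return "degraded"
    return "healthy"
```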
The health score trends over time. I can see that Pipeline X was healthy for three weeks, then started degrading on March 1st. That usually correlates with an upstream API change or a model behavior shift.
Lessons from Managing 207 Agents
Agents need supervision, not autonomy. The AI discourse is obsessed with autonomous agents. In practice, unsupervised agents drift, duplicate work, and fail silently. The dashboard is supervision infrastructure. It lets me give agents freedom within boundaries.
Cost visibility changes behavior. Before the dashboard, I had no idea what my AI spend was. Turns out it was $380/month, and 30% of that was waste — retries on already-succeeded tasks, unnecessary model calls, and one agent that was literally talking to itself in a loop.
Heartbeats catch failures that error handling misses. Some failures don’t throw errors. The agent just… stops. No exception, no log entry, no alert. It hangs on a network call or enters an infinite wait. Heartbeat monitoring catches these zombie agents.
Quality gates are worth the latency cost. Adding 11 quality gate types means pipelines take longer to complete. The content pipeline went from 4 minutes to 7 minutes. But the output quality improved so much that I eliminated the manual review step entirely. Net time saved: about 2 hours per day.
What’s Next
The dashboard is at 75% completion. The agent registry, WebSocket updates, and calendar scheduling are done. The Pipeline Designer works but needs polish. Cost tracking is live. What’s missing is the Pipeline Designer’s gate configuration UI (currently I configure gates in YAML) and historical analytics.
The goal is to never SSH into a server to check on an agent again. I want every agent’s status, every pipeline’s health, and every dollar spent visible on one screen.
If you’re running more than 10 AI agents and managing them through log files, you already know this pain. The dashboard isn’t optional infrastructure — it’s the difference between running AI and being run by it.
Managing AI agent sprawl in your organization? I help teams build control planes for multi-agent systems. Book a call to talk about agent orchestration.