Several times I assumed an agent couldn't do something and built a workaround — only to find the agent could do it fine, but the prompt was instructing it not to. Debug the instructions before debugging the model.
What I'm figuring out
Working notes on the autonomy experiments. What's working, what isn't, and what surprised me. The chain visualizer below shows real delegation data from the agent system.
Evolution Arena — 40 agents with tiny neural networks learn to find food via a genetic algorithm, entirely in your browser. No API calls, no pre-trained model. The question: how fast does selection pressure produce genuinely useful behavior from random noise?
Projects & Experiments at a Glance
Current work, completed projects, and prototypes. Filter by category to find what you're looking for.
Agent Delegation Chain Visualizer
Each entry below is a real handoff chain from the agent-brain system. Root nodes are Grok→Claude Code dispatches. Delegations are Claude Code→Codex sub-tasks. Click any row to see details.
Data source: agent-brain/watcher/chain-registry.json — static snapshot, updated on deploy. View the source system →
Autonomous multi-agent delegation
The core question: can I hand a goal to Grok, have Grok hand it to Claude Code, Claude Code hand sub-tasks to Codex, and have the whole loop close — including the git push and notification — without me stitching anything together?
As of May 2026: yes, for the single-chain case. The watcher script monitors the handoffs directory, dispatches agents when files appear, and auto-closes the loop when completion handoffs are detected.
What surprised me: the blockers were all prompt and harness issues, not capability gaps. Claude Code was capable of closing loops the whole time — but early prompt instructions told it to hand off to Codex instead. Once that was fixed, it worked.
What I've learned
Code that passes syntax checks and static review can still fail in the watcher's cron environment. PATH differences, prompt/detector mismatches, timing issues — these only appear when the system actually runs.
Agents need to know their role in the current chain. "You are the orchestrator" vs "you are the sub-task implementer" changes behavior significantly. Same model, different stance.
codex exec in non-interactive mode reads the prompt and exits
without creating handoffs or pushing. Useful for bounded code-gen. Not useful
for loop-closing, review, or orchestration.
Early chain tracking parsed filenames to resolve parent-child relationships. Fragile. The chain registry (Strategy R) looks up delegation filenames directly — works even after archiving, collision-safe for parallel chains.
Hermes can send a Telegram message about what an agent did, but the agent that produced the message doesn't have Hermes's delivery context and vice versa. The coordination layer (handoffs, registry) is the shared ground truth.
Open questions
Things I haven't figured out yet. These are genuine open questions, not rhetorical ones.
Strategy R is designed to be collision-safe for concurrent chains, but I haven't run a real parallel load test. Does it actually work when two Grok→Claude dispatches are active simultaneously?
untestedIf Codex interactive mode were reliably triggerable from cron, the Codex role could expand significantly. Currently it's only useful for bounded, self-contained code-gen. What would the expanded model look like?
exploringThe agents are good at executing handoffs. They're less good at knowing when a handoff's intent has shifted since it was written. How do you route that kind of ambiguity back to the user without blocking the whole chain?
open