Current live experiment

Evolution Arena — 40 agents with tiny neural networks learn to find food via a genetic algorithm, entirely in your browser. No API calls, no pre-trained model. The question: how fast does selection pressure produce genuinely useful behavior from random noise?
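
The generation loop behind an experiment like this can be sketched in a few lines of Python. This is a minimal illustration, not the Arena's actual code: the genome length, elite count, mutation scale, crossover scheme, and the toy fitness function are all assumptions chosen for readability. Only the population size of 40 comes from the description above.

```python
import random

POP_SIZE = 40        # matches the 40 agents mentioned above
GENOME_LEN = 16      # assumed: number of network weights per agent
ELITE = 8            # assumed: how many top agents survive unchanged
MUTATION_STD = 0.1   # assumed: Gaussian mutation scale


def random_genome(rng):
    """A genome is a flat list of network weights, starting as random noise."""
    return [rng.uniform(-1.0, 1.0) for _ in range(GENOME_LEN)]


def crossover(a, b, rng):
    """Uniform crossover: each weight is taken from one parent at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(genome, rng):
    """Add small Gaussian noise to every weight."""
    return [w + rng.gauss(0.0, MUTATION_STD) for w in genome]


def next_generation(population, fitness, rng):
    """Keep the fittest agents, then refill with mutated children of them."""
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:ELITE]
    children = [
        mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
        for _ in range(len(population) - ELITE)
    ]
    return parents + children


def toy_fitness(genome):
    """Stand-in for 'food eaten' (assumption): higher when weights are near 0.5."""
    return -sum((w - 0.5) ** 2 for w in genome)


def evolve(generations, seed=0):
    """Run the loop and record the best fitness seen in each generation."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(POP_SIZE)]
    history = []
    for _ in range(generations):
        history.append(max(toy_fitness(g) for g in population))
        population = next_generation(population, toy_fitness, rng)
    return population, history
```

Because the elite agents are carried over unchanged, the best fitness in `history` can never drop from one generation to the next. How quickly it rises is the "how fast" question in miniature.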

Projects & Experiments at a Glance

Current work, completed projects, and prototypes. Filter by category to find what you're looking for.


Agent Delegation Chain Visualizer

Each entry below is a real handoff chain from the agent-brain system. Root nodes are Grok→Claude Code dispatches. Delegations are Claude Code→Codex sub-tasks. Click any row to see details.


Data source: agent-brain/watcher/chain-registry.json (static snapshot, updated on deploy).


Autonomous multi-agent delegation

The core question: can I hand a goal to Grok, have Grok hand it to Claude Code, Claude Code hand sub-tasks to Codex, and have the whole loop close — including the git push and notification — without me stitching anything together?

As of May 2026: yes, for the single-chain case. The watcher script monitors the handoffs directory, dispatches agents when files appear, and auto-closes the loop when completion handoffs are detected.
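
One watcher cycle of that kind might look roughly like the sketch below. The real watcher is a shell script (handoff-watcher.sh); this Python version only illustrates the control flow. The `.request.md` and `.complete.md` suffixes, the `seen` set, and the `dispatch` and `close_loop` callbacks are hypothetical names introduced here, not the system's actual conventions.

```python
from pathlib import Path


def scan_handoffs(handoffs_dir, seen):
    """Return (to_dispatch, to_close) for files not seen on earlier cycles.

    Hypothetical naming convention: '<chain>.request.md' is a new handoff,
    '<chain>.complete.md' is a completion handoff. Other files are ignored.
    """
    to_dispatch, to_close = [], []
    for path in sorted(Path(handoffs_dir).glob("*.md")):
        if path.name in seen:
            continue
        seen.add(path.name)
        if path.name.endswith(".complete.md"):
            to_close.append(path)
        elif path.name.endswith(".request.md"):
            to_dispatch.append(path)
    return to_dispatch, to_close


def run_cycle(handoffs_dir, seen, dispatch, close_loop):
    """One watcher cycle: dispatch new handoffs, then close finished chains."""
    to_dispatch, to_close = scan_handoffs(handoffs_dir, seen)
    for path in to_dispatch:
        dispatch(path)      # e.g. start an agent on this handoff
    for path in to_close:
        close_loop(path)    # e.g. git push and send a notification
    return len(to_dispatch), len(to_close)
```

Calling `run_cycle` on a schedule with the same `seen` set means each file triggers at most one action, which is what lets the loop close without manual stitching.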

What surprised me: the blockers were all prompt and harness issues, not capability gaps. Claude Code was capable of closing loops the whole time — but early prompt instructions told it to hand off to Codex instead. Once that was fixed, it worked.

Early May 2026
First handoff experiment
Manual Grok→Claude Code handoff. Claude Code closes with a commit and completion file. Manual stitching.
Mid-May 2026
Watcher automates dispatch
handoff-watcher.sh dispatches Claude Code automatically when files appear in handoffs/.
May 30, 2026
Full chain closes without a human
Grok→Claude Code→Codex→Claude Code→Grok completes end-to-end. Chain registry + Strategy R.
May 31, 2026
Session limit detection
Watcher detects usage limit hits, sends Telegram alert, auto-retries next cycle.
May 31, 2026
Website as live test
Claude Code builds this site as orchestrator. Phase 1 (foundation), Phase 2 (visual polish), Phase 3 (living artifact) all closed autonomously.
May 31, 2026
Living artifact pattern
Phase 3: chains.json refresh script, filterable projects browser, update ritual. Site maintainable in <20 min without re-engaging the full agent system.
Jun 10, 2026 — now
Neuroevolution arena
Galaga replaced with a zero-cost AI experiment: 40 neural-net agents evolved by genetic algorithm, competition mechanics, interactive food placement.

What I've learned

Capability gaps are usually prompt gaps

Several times I assumed an agent couldn't do something and built a workaround — only to find the agent could do it fine, but the prompt was instructing it not to. Debug the instructions before debugging the model.

Live tests find what static review misses

Code that passes syntax checks and static review can still fail in the watcher's cron environment. PATH differences, prompt/detector mismatches, timing issues — these only appear when the system actually runs.
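
A PATH difference is the easiest of these to check ahead of time. The sketch below assumes you know (or can log) the PATH string that cron actually provides; `missing_tools` is a hypothetical helper, not part of the watcher.

```python
import shutil


def missing_tools(tools, path):
    """Return the tools that cannot be resolved on the given PATH string."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]
```

For example, `missing_tools(["git", "codex"], "/usr/bin:/bin")` lists whichever of those commands would not be found if cron ran with only those two directories on its PATH. Running the check at the top of a cron job and logging the result turns a silent failure into a readable one.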

Declare the role in the prompt

Agents need to know their role in the current chain. "You are the orchestrator" vs "you are the sub-task implementer" changes behavior significantly. Same model, different stance.
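
One way to make the role explicit is to build every prompt from a role preamble plus the task. The sketch below is illustrative only: the two opening sentences echo the roles quoted above, but the rest of the preamble wording and the `build_prompt` helper are assumptions, not the prompts the system actually uses.

```python
# Illustrative wording (assumption): only the role names come from the text above.
ROLE_PREAMBLES = {
    "orchestrator": (
        "You are the orchestrator. You own this chain: delegate sub-tasks, "
        "review results, and close the loop yourself."
    ),
    "implementer": (
        "You are the sub-task implementer. Complete only the bounded task "
        "below and report back; do not delegate further."
    ),
}


def build_prompt(role, task):
    """Prefix the task with an explicit role declaration."""
    if role not in ROLE_PREAMBLES:
        raise ValueError(f"unknown role: {role}")
    return f"{ROLE_PREAMBLES[role]}\n\nTask:\n{task}"
```

Rejecting unknown roles keeps a typo in the dispatcher from silently producing a prompt with no role at all.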

Codex non-interactive is not the closer

codex exec in non-interactive mode reads the prompt and exits without creating handoffs or pushing. Useful for bounded code-gen. Not useful for loop-closing, review, or orchestration.

Registry beats filename parsing

Early chain tracking parsed filenames to resolve parent-child relationships, which was fragile. The chain registry (Strategy R) looks up delegation filenames directly, so resolution still works after files are archived, and it is designed to be collision-safe for parallel chains.
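
The difference can be sketched as a lookup table keyed by the delegation's bare filename. This is an assumption-heavy illustration: the real schema of chain-registry.json is not shown here, and the function names, the `"parent"` field, and the choice to reject duplicate filenames are all hypothetical.

```python
from pathlib import PurePosixPath


def register_delegation(registry, delegation_path, parent_chain_id):
    """Record which chain a delegation file belongs to, keyed by bare filename."""
    key = PurePosixPath(delegation_path).name
    if key in registry:
        # Assumed collision policy: refuse to overwrite an existing entry.
        raise ValueError(f"delegation filename already registered: {key}")
    registry[key] = {"parent": parent_chain_id}
    return key


def resolve_parent(registry, delegation_path):
    """Look up the parent chain; the directory part of the path is ignored."""
    entry = registry.get(PurePosixPath(delegation_path).name)
    return entry["parent"] if entry else None
```

Because the directory is ignored, moving a file from handoffs/ into an archive folder does not change what `resolve_parent` returns, and nothing about the parent has to be encoded in the filename itself.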

Agent output is not the same as agent context

Hermes can deliver a Telegram message about what an agent did, but the agent that produced that output doesn't have Hermes's delivery context, and Hermes doesn't have the agent's working context. The coordination layer (handoffs, registry) is the shared ground truth.


Open questions

Things I haven't figured out yet. These are genuine open questions, not rhetorical ones.

Parallel chains

Strategy R is designed to be collision-safe for concurrent chains, but I haven't run a real parallel load test. Does it actually work when two Grok→Claude dispatches are active simultaneously?

Status: untested
Codex interactive mode

If Codex interactive mode were reliably triggerable from cron, the Codex role could expand significantly. Currently it's only useful for bounded, self-contained code-gen. What would the expanded model look like?

Status: exploring
User intent vs agent output

The agents are good at executing handoffs. They're less good at knowing when a handoff's intent has shifted since it was written. How do you route that kind of ambiguity back to the user without blocking the whole chain?

Status: open