Experiments — griffithee

Current live experiment

Evolution Arena — 40 agents with tiny neural networks learn to find food via a genetic algorithm, entirely in your browser. No API calls, no pre-trained model. The question: how fast does selection pressure produce genuinely useful behavior from random noise?

Overview · All work

Projects & Experiments at a Glance

Current work, completed projects, and prototypes. Filter by category to find what you're looking for.

Interactive · Live data

Agent Delegation Chain Visualizer

Each entry below is a real handoff chain from the agent-brain system. Root nodes are Grok→Claude Code dispatches. Delegations are Claude Code→Codex sub-tasks. Click any row to see details.

Data source: agent-brain/watcher/chain-registry.json (static snapshot, updated on deploy).

Experiment 1

Autonomous multi-agent delegation

The core question: can I hand a goal to Grok, have Grok hand it to Claude Code, Claude Code hand sub-tasks to Codex, and have the whole loop close — including the git push and notification — without me stitching anything together?

As of May 2026: yes, for the single-chain case. The watcher script monitors the handoffs directory, dispatches agents when files appear, and auto-closes the loop when completion handoffs are detected.

What surprised me: the blockers were all prompt and harness issues, not capability gaps. Claude Code was capable of closing loops the whole time — but early prompt instructions told it to hand off to Codex instead. Once that was fixed, it worked.
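The watcher loop described above can be sketched roughly as follows. This is a Python illustration only: the real watcher is a shell script, and the function names, the injected `dispatch` callable, and the rule that a completion handoff has "completion" in its filename are all assumptions made for the example.

```python
from pathlib import Path


def find_new_handoffs(handoffs_dir: Path, seen: set) -> list:
    """Return files in the handoffs directory that have not been handled yet."""
    candidates = sorted(handoffs_dir.iterdir(), key=lambda p: p.name)
    return [p for p in candidates if p.is_file() and p.name not in seen]


def is_completion(path: Path) -> bool:
    # Assumption: completion handoffs carry "completion" in the filename.
    return "completion" in path.name


def run_cycle(handoffs_dir: Path, seen: set, dispatch) -> list:
    """One watcher pass: dispatch new work and close loops on completions."""
    actions = []
    for path in find_new_handoffs(handoffs_dir, seen):
        seen.add(path.name)
        if is_completion(path):
            # A completion handoff closes the loop; nothing is dispatched.
            actions.append(f"close:{path.name}")
        else:
            # `dispatch` is a hypothetical callable that starts an agent.
            dispatch(path)
            actions.append(f"dispatch:{path.name}")
    return actions
```

Calling `run_cycle` on a schedule (for example from cron) gives the "dispatch when files appear" behavior. The `seen` set keeps a file from being dispatched twice in one process, though a real watcher would need to persist that state between runs.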

Early May 2026

First handoff experiment

Manual Grok→Claude Code handoff. Claude Code closes with a commit and a completion file. Every other step is stitched together by hand.

Mid-May 2026

Watcher automates dispatch

handoff-watcher.sh dispatches Claude Code automatically when files appear in handoffs/.

May 30, 2026

Full chain closes without a human

Grok→Claude Code→Codex→Claude Code→Grok completes end-to-end, tracked by the chain registry (Strategy R).

May 31, 2026

Session limit detection

Watcher detects usage-limit hits, sends a Telegram alert, and retries automatically on the next cycle.

May 31, 2026

Website as live test

Claude Code builds this site as orchestrator. Phase 1 (foundation), Phase 2 (visual polish), Phase 3 (living artifact) all closed autonomously.

May 31, 2026

Living artifact pattern

Phase 3: chains.json refresh script, filterable projects browser, update ritual. The site can be maintained in under 20 minutes without re-engaging the full agent system.

Jun 10, 2026 — now

Neuroevolution arena

Galaga replaced with a zero-cost AI experiment (no API calls, no pre-trained model): 40 neural-net agents evolved by a genetic algorithm, with competition mechanics and interactive food placement.
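For readers unfamiliar with neuroevolution, here is a minimal sketch of the idea behind the arena: a population of 40 tiny networks, scored on how well they move toward food, with the best genomes copied and mutated each generation. It is not the arena's actual code (which runs in the browser). The layer sizes, the fitness function, the elite count, and the mutation scale are all assumptions chosen for illustration.

```python
import math
import random

N_AGENTS = 40                              # population size from the text
N_INPUTS, N_HIDDEN, N_OUTPUTS = 2, 4, 2    # assumed layer sizes
GENOME_LEN = N_INPUTS * N_HIDDEN + N_HIDDEN * N_OUTPUTS


def random_genome(rng):
    """A genome is just the flat list of network weights."""
    return [rng.uniform(-1.0, 1.0) for _ in range(GENOME_LEN)]


def forward(genome, inputs):
    """Tiny feed-forward net: inputs -> tanh hidden layer -> tanh outputs."""
    split = N_INPUTS * N_HIDDEN
    w1, w2 = genome[:split], genome[split:]
    hidden = [
        math.tanh(sum(inputs[i] * w1[i * N_HIDDEN + h] for i in range(N_INPUTS)))
        for h in range(N_HIDDEN)
    ]
    return [
        math.tanh(sum(hidden[h] * w2[h * N_OUTPUTS + o] for h in range(N_HIDDEN)))
        for o in range(N_OUTPUTS)
    ]


def fitness(genome, rng, trials=8):
    """Assumed fitness: reward output steps that point toward the food."""
    score = 0.0
    for _ in range(trials):
        dx, dy = rng.uniform(-1, 1), rng.uniform(-1, 1)  # offset to food
        vx, vy = forward(genome, [dx, dy])
        score += dx * vx + dy * vy  # dot product of step and food direction
    return score / trials


def next_generation(population, scores, rng, elite=8, sigma=0.1):
    """Keep the top `elite` genomes and fill the rest with mutated copies."""
    ranked = sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)
    parents = [genome for _, genome in ranked[:elite]]
    children = [
        [w + rng.gauss(0.0, sigma) for w in rng.choice(parents)]
        for _ in range(len(population) - elite)
    ]
    return parents + children


def evolve(generations=30, seed=0):
    """Run the loop and return the final population plus best score per generation."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(N_AGENTS)]
    history = []
    for _ in range(generations):
        scores = [fitness(g, rng) for g in population]
        history.append(max(scores))
        population = next_generation(population, scores, rng)
    return population, history
```

Plotting `history` from `evolve()` is one way to look at the "how fast" question: it shows how many generations pass before the best score stops looking like noise.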

Key findings

What I've learned

Capability gaps are usually prompt gaps

Several times I assumed an agent couldn't do something and built a workaround — only to find the agent could do it fine, but the prompt was instructing it not to. Debug the instructions before debugging the model.

Live tests find what static review misses

Code that passes syntax checks and static review can still fail in the watcher's cron environment. PATH differences, prompt/detector mismatches, timing issues — these only appear when the system actually runs.
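One cheap way to catch the PATH class of failure before a live run is a preflight check that resolves each required command against a deliberately minimal PATH. The sketch below is an illustration, not part of the watcher. The PATH value is an assumption about what a cron environment typically provides, and the command names in the usage note are placeholders.

```python
import shutil

# Assumption: a minimal PATH similar to what many cron daemons provide.
CRON_LIKE_PATH = "/usr/bin:/bin"


def missing_commands(commands, path=CRON_LIKE_PATH):
    """Return the commands that cannot be resolved on the given PATH."""
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]
```

For example, `missing_commands(["git", "claude", "codex"])` would list any of those CLIs that resolve in an interactive shell but not under the restricted PATH. It does not cover prompt/detector mismatches or timing issues, which still need a live run.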

Declare the role in the prompt

Agents need to know their role in the current chain. "You are the orchestrator" vs "you are the sub-task implementer" changes behavior significantly. Same model, different stance.
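A small sketch of what declaring the role could look like in a prompt builder. Only the two quoted role statements come from the text above. The extra guidance sentences, the role keys, and the `build_prompt` helper are hypothetical.

```python
# Hypothetical role preambles; only the opening sentence of each is from the text.
ROLE_PREAMBLES = {
    "orchestrator": (
        "You are the orchestrator. "
        "Close the loop yourself when the work is done."
    ),
    "implementer": (
        "You are the sub-task implementer. "
        "Complete only the bounded task described below."
    ),
}


def build_prompt(role: str, task: str) -> str:
    """Prefix the task with an explicit statement of the agent's role."""
    if role not in ROLE_PREAMBLES:
        raise ValueError(f"unknown role: {role!r}")
    return f"{ROLE_PREAMBLES[role]}\n\nTask:\n{task}"
```

Keeping the role text in one table makes it easy to confirm that the same task goes to the same model with a different stance, and nothing else changed.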

Codex non-interactive is not the closer

codex exec in non-interactive mode reads the prompt and exits without creating handoffs or pushing. Useful for bounded code-gen. Not useful for loop-closing, review, or orchestration.

Registry beats filename parsing

Early chain tracking parsed filenames to resolve parent-child relationships, which was fragile. The chain registry (Strategy R) looks up delegation filenames directly, so it works even after archiving and is designed to be collision-safe for parallel chains.
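The difference between the two approaches can be illustrated with a minimal registry sketch: parent-child links are stored as explicit entries keyed by delegation filename, so resolving a parent never depends on how a file is named or where it has been moved. The JSON schema, the function names, and the field names here are assumptions for illustration, not the actual format of chain-registry.json.

```python
import json
import os
import tempfile
from pathlib import Path


def load_registry(path: Path) -> dict:
    """Read the registry, treating a missing file as an empty registry."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def register_delegation(path: Path, delegation: str, parent: str, chain_id: str) -> None:
    """Record which handoff a delegation file belongs to (hypothetical schema)."""
    registry = load_registry(path)
    if delegation in registry:
        raise ValueError(f"delegation already registered: {delegation}")
    registry[delegation] = {"parent": parent, "chain_id": chain_id}
    # Write to a temp file and rename so readers never see a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(registry, tmp, indent=2)
    os.replace(tmp_name, path)


def find_parent(path: Path, delegation: str):
    """Look up the parent by exact filename; no filename parsing involved."""
    entry = load_registry(path).get(delegation)
    return entry["parent"] if entry else None
```

The lookup keeps working after a delegation file is archived, because the registry entry does not depend on the file's location. This sketch does not handle two writers updating the registry at the same moment, which would need a lock or similar.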

Agent output is not the same as agent context

Hermes can send a Telegram message about what an agent did, but the agent whose output is being reported doesn't have Hermes's delivery context, and Hermes doesn't have the agent's working context. The coordination layer (handoffs, registry) is the shared ground truth.

What's next

Open questions

Things I haven't figured out yet. These are genuine open questions, not rhetorical ones.

Parallel chains

Strategy R is designed to be collision-safe for concurrent chains, but I haven't run a real parallel load test. Does it actually work when two Grok→Claude dispatches are active simultaneously?

untested

Codex interactive mode

If Codex interactive mode were reliably triggerable from cron, the Codex role could expand significantly. Currently it's only useful for bounded, self-contained code-gen. What would the expanded model look like?

exploring

User intent vs agent output

The agents are good at executing handoffs. They're less good at knowing when a handoff's intent has shifted since it was written. How do you route that kind of ambiguity back to the user without blocking the whole chain?

open