I spent a billion tokens and all I got was this repo
March 2, 2026
#engineering
Overview
You know that feeling when you leave your coding agent running overnight and wake up to a massive bill?
Token usage screenshot
As you can see, I used a total of zero web searches since I, much like Richard Stallman, strongly distrust the internet. You might also notice a big spike out of nowhere at the end of the month; that's firebird.
If you'd like to learn about what Elixir and WebAssembly are or why even bring the two together, feel free to check out this post where I go into more detail! Otherwise, stick around if you'd instead like to learn how I corralled coding agents to make that chart happen.
Tokens go brrrr
By being here, you're either interested in the safeguards needed to prevent your billing from looking terrifying or, alternatively, in how to blast an effective "code cannon" at a given problem. Since the latter is more interesting, I'll assume that's you and that you really want to crunch billions of tokens programming.
Before jumping into the deep end of the pool without prior knowledge, which is an option if that's what you'd like, it may be helpful to observe a particular evolution of technologies:
First, models were simply chatted with to see whether their output could be depended on at all
Following that, a flurry of tools for enriching the capabilities of models rolled out, from MCP to skills, all tackling the question of enabling an agent to 'do more'
Now that a coding agent can do meaningful things, we needed to validate complex changes as well as perhaps formally verify them
Where we are today, everyone and their grandmother has some opinionated agent harness that can do smart home automation or follow some scripted subagents. But, rather than shove in some new techno-gizmo and proclaim it a discovery, let's instead reframe that evolution of technologies in terms of what it'd look like if a person were leveling up through the same capacities:
First, a person is talked to in order to see whether their output could be depended on for the job
Next, a person is given access to a work email, company accounts, and so forth
Following being granted access to tools, the company may have internal plugins or integrations which alleviate manual aspects of day-to-day work
Now that this person can do meaningful things and contribute, they may work under a mentor or as part of a team with existing practices
Narrowing in on software engineering, it can be observed that all "advancements" in the space of coding agents are really just codification of the Software Development Life Cycle. Similar to how codification of mathematical proofs is broadly good, it's good that the patterns and practices that exist in engineering orgs are finally being translated over to the next generation of development. But can we please not treat it like discovering Jesus' lost toenail?
Basic SDLC
I get it; I used to loathe the concept of an acronym that encompasses all the important and religious parts of software development. But then I couldn't help noticing that the broad label helps generalize the different approaches taken by different engineering orgs.
There's the central code repository: it used to be Subversion, then some Finnish guy gave us a better tool. That's been mostly it, with the exception of new UIs such as GitHub or GitLab (yes, I'm excluding cool stuff like jj since we're sticking to industry standards).
In other words, if you were setting up a new codebase with the intent to collaborate on it with other people, you'd likely set up formatters to prevent ugly merge conflicts, opinionated linters to prevent repetitive PR conversations, and so forth. A 'real' way of describing the SDLC would be that it shaves away reasons for concern before clicking "approve" on a PR.
While a mechanism that prevents code from simply being shipped sounds contrary to our goal, it's actually the very crux of how we'll go about blasting out code. Just as it can be said most work is already "remote work" since people have to coordinate with others across floors, any coding work involving multiple people is going to have its own form of SDLC, whether that's the one guy who reviews every PR or having enough green dots for people to comment "LGTM".
With appropriate guardrails and tests, you can rest at peace regardless of whether there's one coding agent or a thousand, provided the only changes being checked in meet requirements.
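As a concrete example of such a guardrail, here's a sketch of a git pre-push hook that refuses direct pushes to master; the function name and wording are mine for illustration, not something from the repo.

```shell
# Hypothetical guardrail: a pre-push hook that blocks direct pushes to master,
# forcing every change through a PR branch. The check is factored into a
# function so the hook body stays trivial.
is_protected_ref() {
  # $1 is the remote ref being pushed, e.g. refs/heads/master
  [ "$1" = "refs/heads/master" ]
}

# Hook body (install as .git/hooks/pre-push): git feeds one line per pushed ref.
# while read -r local_ref local_sha remote_ref remote_sha; do
#   if is_protected_ref "$remote_ref"; then
#     echo "Refusing direct push to master; open a PR branch instead." >&2
#     exit 1
#   fi
# done
```

Combined with required CI checks on the repo itself, a gate like this means an always-on agent can only ever land changes that pass the review pipeline.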
Loops
Avoiding the philosophical debate of what an AI is intrinsically motivated by, it's obvious that any AI process needs to be perpetually acting in the direction of some goal. For humans, that's food and water. For AI, that can be accomplished with something as simple as:
```bash
while true; do
  pi -p "$PROMPT"
done
```
That's it; that's all you need to let a coding agent like pi run perpetually based on some prompt. You'll notice I'm referencing a $PROMPT variable since I'd like the agent to follow a precise prompt over and over. This could equivalently be done with AGENTS.md or sub-agents, but sometimes you don't want to be checking in all the managerial stuff, especially if what you want to walk away with is a codebase rather than an org breakdown.
Sorry, I lied, there's one thing missing:
```bash
while true; do
  pi -p "$PROMPT"
  pi -p "Don't push to master, check out a new branch and iterate on that PR"
done
```
The PROMPT string should list "taking active PRs across the finish line" as an explicit first priority over further feature work, and the follow-on prompt helps whether the task was one-shotted or is being iteratively worked on.
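Putting it together, the whole overnight script is roughly the following; the PROMPT wording here is illustrative, not my exact prompt.

```shell
# Sketch of the full loop; assumes pi is on PATH and the repo is checked out.
PROMPT='First priority: take active PRs across the finish line. Only then pick up further feature work from plan.md.'

run_iteration() {
  pi -p "$PROMPT"
  pi -p "Don't push to master, check out a new branch and iterate on that PR"
}

# Run forever: each pass either advances an open PR or starts new work.
# while true; do run_iteration; done
```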
Now you have the minimal productive agent that can begin to act like a recently onboarded human engineer. Much like a dev freshly onboarded to a GitHub organization who can start pushing, that's what we have here!
Organizing
Like with political goals, an effective engineering pursuit boils down to its organization. Actually kicking off this project entailed two files: an env.sh containing API key exports, and a plan.md outlining the rough plan.
Further below are the files I originally used to spin up the project and run it overnight. Every other script or resource was provisioned entirely by a coding agent.
At the root of the organizing principle behind running agents inside the environments they're operating in is simply the breakdown of a REPL:
```
Read → Eval → Print → Loop
  ↑                     ↓
  └─────────────────────┘
```
Between Loop and Read is a massive source of latency if you always have to "accept edits" or explicitly approve each individual tool call. Likewise, the network latency of an agent controlling an environment remotely (i.e. over CDP or SSH) can quickly add up.
This was something I found to be true empirically when recreating the ChatGPT Agent product, which consisted of computer-use agents as well as agent-controlled web browsers: results were much more reliable when the agents were inside the environments they were intended to be operating. Navy SEALs are trained to be effective once dropped into an environment, not to be remote drone operators.
Starting up
Below are the only two files I used to establish the coding patterns and blast the original billion-ish tokens at the problem.
env.sh
The three environment variables I wanted consistent.
```bash
export VERS_API_KEY='...'
export ANTHROPIC_API_KEY='...'
export GITHUB_API_KEY='<scoped personal access token>'
```
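To give a feel for how these get onto each worker, here's a sketch of a bootstrap step. The host name worker-vm and the file paths are made up for illustration; in practice the agents provisioned the VMs themselves.

```shell
# Hypothetical bootstrap: copy the shared env to a fresh VM, then start the
# loop on the VM itself (not via per-command ssh), detached with nohup, so
# quitting the local session doesn't kill the worker.
bootstrap_worker() {
  vm="$1"
  scp env.sh "$vm:/root/env.sh"
  ssh "$vm" 'source /root/env.sh && nohup sh -c "while true; do pi -p \"\$(cat /root/plan.md)\"; done" >/root/pi.log 2>&1 &'
}

# Usage: bootstrap_worker worker-vm
```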
plan.md
And the 15 lines that laid the groundwork for the code cannon.
The goal is to make a modern elixir/webassembly solution for both running wasm-in-elixir as well as building elixir-to-wasm
ALL SYSTEMS AND AGENTS MUST use this github -> https://github.com/hdresearch/firebird.git
For each of the below goals, create a VM and run code like the following
```bash
while true do
pi -run "GOAL"
end
```
NOTE - pi is running on the VM itself rather than running on the host machine and then ssh'ing commands. This should be done so I can quit this pi session
So agents are just infinitely running since there is always something to improve in a piece of software. Include pi-vers extension and copy over env.sh so each infinite loop can provision further VMs or agents.
- elixir developer experience (making sure usage is as simple as possible, should not be five minutes of updating an existing elixir project to get it to work)
- running wasm from elixir (ie build go wasm project and run an exported add() function)
- enabling a mix target to compile an elixir project to wasm
- getting a phoenix project to compile to wasm
Be sure to instruct each goal loop to include 100% test coverage as well as performance comparisons between elixir programs and the webassembly equivalents (such rewriting a math function in rust->wasm to then use or compiling an elixir app to wasm to benchmark the number of handleable processes). There may be cases where BEAM wins but my suspicion is there is a case where elixir-to-wasm can be more performant or memory efficient than using standard elixir
When I had to step in
From the original two files, it spawned multiple sandboxes with pi instances running in them, and each one had a fuller shell script than the "while true do" example shared. It even found that the correct CLI arg to use was pi -p "PROMPT" instead of -run.
For cases where I wanted to clarify the scope (such as reprioritizing active PRs over feature work) or introduce new workers (such as cleaning up failing PR checks or extending benchmarking), I simply spun up pi locally again and each instruction was a prompt away. Much like being able to send a Slack DM to your "10x engineer" and knowing you can follow up with further messages if the original project description doesn't encapsulate all the relevant details.
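For the record, a step-in was literally just another prompt; something like the following, where the wording is illustrative rather than my exact message.

```shell
# Hypothetical steering prompt for a local pi session; sent whenever scope
# changed, exactly like a Slack DM to a very fast engineer.
STEER='Reprioritize: get the failing PR checks green before starting any new benchmark work. Only stop once you can tell me it is safe to quit this session.'
# pi -p "$STEER"
```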
One thing pi sometimes did by default, but which was harmless to reiterate in these follow-on step-ins, was to only stop once it could tell me it was safe to quit the active pi session (the implication being that a headless coding agent would keep running on a separate machine).
This vibe coding workflow could be seen as closer to vim than emacs, in that I was opening and exiting pi repeatedly instead of operating the entire project from a single CLI session. The best feeling was walking out to grab a coffee and watching new commits roll in from my GitHub mobile app.