Automate web dev with headless browser testing

March 21, 2026

#engineering

Starting something new

You already have the idea for a new app. Knuckles are cracking over the keyboard. One monitor's curved and another's in a vertical orientation. But then you remember the worst part about making a new app: actually testing it.

You know there's nothing wrong with clicking around your app to make sure it does what you say it does, but where's the fun in verifying code when you could simply have a working application?!

Agents and the new web

We used to believe that the reason for less intelligent people in the world was a lack of access to information. Now with humanity's knowledge available from a supercomputer that fits into our pockets, there are still folks who'll proclaim that four out of three struggle with fractions.

Cynicism aside, it's undeniable that the troves of information available online make it possible for determined individuals to self-study new topics. Likewise, even though we're promised that AI will replace human labor and yet keep finding ways to include "humans in the loop", agents can absolutely multiply a person's proficiency at completing tasks.

So why not continue that trend and let agents automate the testing and QA of an application? If you'd rather read about a more frivolous use case for headless browsers running in the cloud, I have another post where I spawn a fleet of them to scrape URLs from a subreddit.

Elixir from the ashes

For a brief "Elixir resume" to show I'm not unfamiliar with the language:

Elixir is a fantastic functional programming language with an awesome framework, Phoenix, but I'm not going to spend this post convincing you of its advantages. All I assume you already know is that there are web applications that can be accessed from a web browser, and that there are web browsers that can be programmatically controlled to interact with a page like a human would.

What we'll be doing today is developing a poker game with Phoenix, then having an agent drive a web browser to use the app and provide UX feedback, which then gets applied.

Reviewing Layout with Helpful Feedback

An "aha" moment with agents and other applications of LLMs came when we discovered ways of prepending useful context to the prompt. This ranges from few-shot prompting to inserting the latest compiler output so the model can fix the issues it reports. While often confined to tight iterative feedback loops like REPLs or SQL queries, the same idea extends to "fuller" iteration loops like QA'ing a web application.
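To make the "prepend useful context" idea concrete, here is a minimal sketch in Python (used only for illustration; the app in this post is Elixir). The `ContextBlock` type, the `build_prompt` helper, and the section labels are all hypothetical names of mine, not part of any agent framework or of the project's code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ContextBlock:
    """One labeled piece of context (hypothetical structure for this sketch)."""
    label: str
    body: str


def build_prompt(task: str, context: list[ContextBlock]) -> str:
    """Place every non-empty context block ahead of the task text."""
    sections = [
        f"## {block.label}\n{block.body.strip()}"
        for block in context
        if block.body.strip()  # skip blocks that carry no information
    ]
    sections.append(f"## Task\n{task.strip()}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    # The compiler message below is made up for the example.
    prompt = build_prompt(
        "Fix the compile error.",
        [ContextBlock("Latest compiler output", "error: undefined function deal/1")],
    )
    print(prompt)
```

The same helper would work whether the context is a few-shot example, a compiler error, or notes from a browser session; only the label and body change.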

Agent flows combined

Prompt-wise, this was a matter of telling my agent to make the app as I normally would, then following up with a browser agent to actually try the app, provide UX feedback, and apply it.
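The two-step flow described above could be sketched roughly like this, again in Python purely for illustration. Everything here is an assumption on my part: the `Agent` callable type, the `build_then_review` function, the prompt wording, and the choice to hand the feedback back to the building agent to apply. The real agents are stood in for by plain functions so the control flow is visible.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical agent interface: takes a prompt string, returns the agent's reply.
Agent = Callable[[str], str]


def build_then_review(spec: str, builder: Agent, reviewer: Agent, rounds: int = 1) -> list[str]:
    """Build once, then alternate browser review and feedback application."""
    transcript = [builder(f"Build this app:\n{spec}")]
    for _ in range(rounds):
        # The reviewer stands in for an agent driving a headless browser.
        feedback = reviewer("Use the running app in a browser and list UX problems.")
        transcript.append(feedback)
        # Feed the review back in as context for the next build step.
        transcript.append(builder(f"Apply this UX feedback:\n{feedback}"))
    return transcript


if __name__ == "__main__":
    # Stub agents with canned replies, so the sketch runs without any LLM.
    def fake_builder(prompt: str) -> str:
        return f"builder handled: {prompt.splitlines()[0]}"

    def fake_reviewer(prompt: str) -> str:
        return "The winner banner disappears too quickly."

    for step in build_then_review("A multiplayer poker game", fake_builder, fake_reviewer):
        print(step)
```

Passing the agents in as callables keeps the loop independent of whichever model or browser tooling sits behind them.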

The result

Much to my surprise, the feedback and improvements were extensive, even picking out things I doubt I would have found myself! Here's a recording of a browser agent testing the app before feedback:

And here's a recording of a browser agent testing the app after feedback:

Some of the changes that came from feedback include:

 1. Persistent winner banner — stays visible through phase transition to waiting
 2. Dealer button — "D" chip badge on the dealer's seat
 3. Turn indicator — "⚡ YOUR TURN" or "Waiting for Bob..."
 4. Player count badge — "👥 4/8" in header
 5. Phase-colored badges — blue PRE-FLOP, green FLOP, amber TURN, red RIVER, gold SHOWDOWN
 6. Chip bet indicators — 🪙 with animated drop-in
 7. Color-coded game log — red fold, green check, blue call, gold winner entries
 8. Improved card design — rank + suit stacked vertically in player hand
 9. Dashed empty card slots — replaces solid dash placeholders
 10. Quick-raise buttons — Min, ½ Pot, Pot shortcuts + slider with min/max labels
 11. Card flip animations — on community card deal and showdown reveal
 12. Stale card cleanup — no card-backs shown in waiting state

To see all the screenshots and recordings, check out this folder. For the final codebase, here's the GitHub repo: https://github.com/hdresearch/self-improving-phoenix/

Hopefully this gives some inspiration for a way of incorporating agents into your SDLC and, to close this off, hack the planet!