You already have the idea for a new app. Knuckles are cracking over the keyboard. One monitor's curved and another's in a vertical orientation. But then you remember the worst part about making a new app: actually testing it.
You know it's not wrong to click around your app and make sure it does what you say it does, but where's the fun in verifying code when you can simply have a working application?!
Agents and the new web
We used to believe that the reason there were less intelligent people in the world was a lack of access to information. Now, with humanity's knowledge available from a supercomputer that fits in our pockets, there are still folks who'll proclaim that four out of three people struggle with fractions.
Cynicism aside, it's undeniable that the troves of information available online make it possible for determined individuals to self-study new topics. Likewise, while we're promised that AI will replace human labor, we keep finding ways to include "humans in the loop", and agents can absolutely multiply our proficiency at completing tasks.
So why don't we continue that and let agents take care of automating the testing and QA of an application? If you'd instead like to read about a more frivolous use case of headless browsers running in the cloud, I have another post where I spawn a fleet to scrape URLs from a subreddit.
Elixir from the ashes
For a brief "Elixir resume" to show I'm not unfamiliar with the language:
Years ago I made a real-time editor inspired by Darklang that provided a LISP for defining web UIs
Elixir is a fantastic functional programming language with an awesome framework, Phoenix, but I'm not going to spend this post convincing you of its advantages. All I assume you already know is that there are web applications that can be accessed by a web browser, and that web browsers can be programmatically controlled to interact with a page like a human would.
What we'll be doing today is developing a poker game with Phoenix and having an agent drive a web browser to use the app and provide UX feedback to be applied.
Reviewing Layout with Helpful Feedback
An "aha" moment with agents and other applications of LLMs was when we discovered ways of prepending the prompt with useful context. This ranges from few-shot prompting to inserting the latest compiler output so the model can fix the issues it reports. While this is often confined to tight feedback loops like REPLs or SQL queries, it can also be extended to "fuller" iteration loops like QA'ing a web application.
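To make "prepending the prompt with useful context" concrete, here is a minimal Python sketch. It is not the tooling used for this post: the function name `build_prompt`, the section labels, and the file path `lib/poker/deck.ex` are all made up for illustration.

```python
def build_prompt(task, examples=(), tool_output=None):
    """Assemble a prompt with context placed ahead of the task itself."""
    parts = []
    # Few-shot examples come first so the model sees the expected shape.
    for question, answer in examples:
        parts.append(f"Example input:\n{question}\nExample output:\n{answer}")
    # Fresh tool output (here: a compiler error) is inserted next.
    if tool_output:
        parts.append(f"Latest compiler output:\n{tool_output}")
    # The actual request goes last.
    parts.append(f"Task:\n{task}")
    return "\n\n".join(parts)


# Hypothetical task and error message, purely for illustration.
prompt = build_prompt(
    task="Fix the compile error in lib/poker/deck.ex.",
    examples=[("undefined function shuffle/1", "Call Enum.shuffle/1 instead.")],
    tool_output="** (CompileError) lib/poker/deck.ex:12: undefined function shuffle/1",
)
print(prompt)
```

The same shape works when the "tool output" is a browser agent's notes on the running app instead of a compiler error.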
Agent flows combined
Prompt-wise, this was a matter of telling my agent to make the app as I normally would, and then following up with a browser agent that actually tries the app, provides UX feedback, and applies it.
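I drove this flow through prompts rather than code, but its shape can be sketched in a few lines of Python. Everything here is an assumption for illustration: `build_then_review`, `QARun`, the prompt wording, and the two stand-in agents (which return canned values so the sketch runs without a model or a browser).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QARun:
    """Record of what the coding agent was asked and what the reviewer found."""
    prompts: List[str] = field(default_factory=list)
    feedback: List[str] = field(default_factory=list)


def build_then_review(
    app_spec: str,
    coding_agent: Callable[[str], str],
    browser_agent: Callable[[str], List[str]],
    rounds: int = 1,
) -> QARun:
    """Build the app, then alternate browser review and fixes for `rounds` passes."""
    run = QARun()
    build_prompt = f"Build this app: {app_spec}"
    run.prompts.append(build_prompt)
    coding_agent(build_prompt)
    for _ in range(rounds):
        findings = browser_agent("Use the running app in a browser and list UX issues.")
        if not findings:
            break  # nothing left to fix
        run.feedback.extend(findings)
        # The browser agent's findings become context for the next coding prompt.
        fix_prompt = "Apply this UX feedback:\n" + "\n".join(f"- {f}" for f in findings)
        run.prompts.append(fix_prompt)
        coding_agent(fix_prompt)
    return run


# Stand-in agents (hypothetical): a real setup would call a model and a browser.
def fake_coding_agent(prompt: str) -> str:
    return "done"


def fake_browser_agent(prompt: str) -> List[str]:
    return ["Winner banner disappears too quickly", "No indicator for whose turn it is"]


run = build_then_review("a poker game in Phoenix", fake_coding_agent, fake_browser_agent)
print(run.prompts[-1])
```

Passing the agents in as plain callables keeps the loop independent of whichever model or browser automation tool sits behind them.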
The result
Much to my surprise, the feedback and improvements were extensive, even picking out things I doubt I would have found myself! Here's a recording of a browser agent testing the app before feedback:
And here's a recording of a browser agent testing the app after feedback:
Some of the changes that came from the feedback include:
1. Persistent winner banner — stays visible through phase transition to waiting
2. Dealer button — "D" chip badge on the dealer's seat
3. Turn indicator — "⚡ YOUR TURN" or "Waiting for Bob..."
4. Player count badge — "👥 4/8" in header
5. Phase-colored badges — blue PRE-FLOP, green FLOP, amber TURN, red RIVER, gold SHOWDOWN
6. Chip bet indicators — 🪙 with animated drop-in
7. Color-coded game log — red fold, green check, blue call, gold winner entries
8. Improved card design — rank + suit stacked vertically in player hand
9. Dashed empty card slots — replaces solid dash placeholders
10. Quick-raise buttons — Min, ½ Pot, Pot shortcuts + slider with min/max labels
11. Card flip animations — on community card deal and showdown reveal
12. Stale card cleanup — no card-backs shown in waiting state