App UI Testing (any URL — no deploy)
OpenFactory can drive a managed desktop tester VM (a vanilla Ubuntu desktop with a browser) to test the user interface of any web app over MCP — clicking, typing, and visually verifying like a person would.
You point it at a URL. You do not deploy your app with OpenFactory to test it. The tester VM opens whatever URL you give it (a local dev server; a preview or production deployment on Vercel, AWS, Netlify, Render, Fly, or any other host; or any public site), as long as that URL is reachable from the VM. Your app keeps running wherever it already runs.
Want OpenFactory to host the app too? You can also deploy a Git repo and get a public preview URL to test against — see App Deployment.
This is for GUI / UX workflow testing (login flows, forms, navigation, “does the new button actually work”). Lower-level checks (image contents, packages, systemd units) belong in build test suites and assertions.
How it works
- Get a tester VM: ensure_tester_vm returns your persistent, reusable desktop VM (created on first use, reused after).
- Describe the flow once: create_app_scenario stores a reusable scenario, an ordered list of plain-language steps against your app’s URL.
- Run it (and re-run it): run_app_scenario executes the scenario in the VM and records screenshots and a pass/fail verdict.
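
As a rough illustration of the three steps above, the sketch below builds the JSON-RPC requests an MCP client would send. The `tools/call` method and its `name` / `arguments` parameters come from the MCP specification; the argument names passed to each OpenFactory tool (`name`, `steps`) are hypothetical placeholders, so check each tool's published input schema before relying on them.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names below are assumptions for illustration only.
requests = [
    tool_call("ensure_tester_vm", {}),
    tool_call("create_app_scenario", {
        "name": "smoke-login",
        "steps": [{"action": "open_url", "value": "${APP_URL}"}],
    }),
    tool_call("run_app_scenario", {"name": "smoke-login"}),
]

for request in requests:
    print(json.dumps(request))
```

In practice a coding agent or MCP client library issues these calls for you; the point is only that the workflow is three tool calls in a fixed order.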
Reusable, self-hardening scenarios
A scenario is described in plain language, not pixel coordinates or CSS selectors:
[
{ "action": "open_url", "value": "${APP_URL}" },
{ "action": "type", "target": "email field", "value": "${EMAIL}" },
{ "action": "type", "target": "password field", "value": "${PASSWORD}" },
{ "action": "click", "target": "Sign In button" },
{ "action": "type", "target": "verification code", "value": "${totp:OTP_SECRET}" },
{ "action": "click", "target": "Verify button", "expect": "Dashboard" }
]
Step actions: open_url, click, type, key, assert_text, wait. For
click / type, target is matched on screen by the visual model, so it keeps
working when markup changes.
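
Before saving a scenario, it can help to check the step list locally. The following is a minimal sketch, assuming the six actions listed above are the complete set and that click and type steps always need a target (the second point is an assumption, not something this page states). `validate_steps` is a hypothetical helper, not part of OpenFactory.

```python
ALLOWED_ACTIONS = {"open_url", "click", "type", "key", "assert_text", "wait"}
NEEDS_TARGET = {"click", "type"}  # assumption: these actions require a target


def validate_steps(steps: list[dict]) -> list[str]:
    """Return a list of human-readable problems; empty means the steps look fine."""
    problems = []
    for index, step in enumerate(steps, start=1):
        action = step.get("action")
        if action not in ALLOWED_ACTIONS:
            problems.append(f"step {index}: unknown action {action!r}")
        elif action in NEEDS_TARGET and not step.get("target"):
            problems.append(f"step {index}: {action} needs a target")
    return problems


steps = [
    {"action": "open_url", "value": "${APP_URL}"},
    {"action": "type", "target": "email field", "value": "${EMAIL}"},
    {"action": "click", "target": "Sign In button"},
]
print(validate_steps(steps))  # prints []
```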
Hardening (fast, resilient re-runs). The first run resolves each element with the visual model (slower) and remembers where it was. Later runs replay from that memory and skip the expensive full-screen analysis — re-resolving (and re-learning) only the steps whose UI actually moved or changed. So a suite gets faster on the second run and self-heals small UI changes instead of breaking.
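
The replay-then-fall-back idea can be pictured with a small conceptual sketch. This is not OpenFactory's implementation: `resolve_with_cache`, `still_valid`, and `slow_resolve` are hypothetical names, and the real system's validity check and cache format are not described here.

```python
from typing import Callable

Point = tuple[int, int]


def resolve_with_cache(
    target: str,
    cache: dict[str, Point],
    still_valid: Callable[[str, Point], bool],
    slow_resolve: Callable[[str], Point],
) -> tuple[Point, str]:
    """Return (position, source), where source is 'cache' or 'resolved'."""
    remembered = cache.get(target)
    if remembered is not None and still_valid(target, remembered):
        return remembered, "cache"      # fast path: replay from memory
    position = slow_resolve(target)     # slow path: full visual analysis
    cache[target] = position            # re-learn for the next run
    return position, "resolved"


# Simulated screen: the button sits at (100, 200), then moves.
screen = {"Sign In button": (100, 200)}
cache: dict[str, Point] = {}
check = lambda target, point: screen[target] == point
lookup = lambda target: screen[target]

first = resolve_with_cache("Sign In button", cache, check, lookup)
second = resolve_with_cache("Sign In button", cache, check, lookup)
screen["Sign In button"] = (120, 260)   # the UI changed
third = resolve_with_cache("Sign In button", cache, check, lookup)
print(first[1], second[1], third[1])    # prints: resolved cache resolved
```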
Environment variables and secrets
Steps reference variables as ${VAR}. Put non-secret defaults (app URL, test
email) on the scenario; pass secrets (passwords, tokens) at run time — they
are used for substitution and are never stored or written into the recorded
run.
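
To make the substitution model concrete, here is a minimal sketch of ${VAR} expansion. It assumes that run-time values override stored defaults and that an undefined variable is an error; both are assumptions about behavior this page does not spell out, and `substitute` is a hypothetical helper.

```python
import re

# Matches ${NAME}; ${totp:NAME} contains a colon and is deliberately not matched.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(text: str, defaults: dict[str, str], secrets: dict[str, str]) -> str:
    """Expand ${VAR} references from stored defaults plus run-time secrets."""
    values = {**defaults, **secrets}  # assumption: run-time values win

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"undefined variable: {name}")
        return values[name]

    return _VAR.sub(replace, text)


defaults = {"APP_URL": "https://staging.example.com", "EMAIL": "qa@example.com"}
secrets = {"PASSWORD": "s3cret"}  # supplied at run time, never saved

print(substitute("${APP_URL}/login", defaults, secrets))
# prints https://staging.example.com/login
```

Keeping the stored step as the template (`${PASSWORD}`) and expanding it only at execution time is one way the secret value can stay out of both the scenario and the recorded run.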
Two-factor authentication (2FA)
If signing in requires a TOTP code, use ${totp:VAR} where VAR holds the
account’s base32 TOTP secret (the same seed your authenticator app uses).
OpenFactory computes the current 6-digit code (RFC 6238) and types it. Provide
the seed as a run-time secret, never in the stored scenario.
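
For reference, the code that ${totp:VAR} produces can be reproduced with the standard RFC 6238 algorithm. The sketch below assumes the common defaults (HMAC-SHA1, 30-second period, 6 digits); this page confirms only the 6-digit RFC 6238 code, so the other parameters are assumptions. The `totp` function name is illustrative.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(base32_secret: str, at: Optional[float] = None,
         digits: int = 6, period: int = 30) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) from a base32 secret."""
    cleaned = base32_secret.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)          # restore base32 padding
    key = base64.b32decode(cleaned)
    counter = int((time.time() if at is None else at) // period)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                    # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC 6238 test secret "12345678901234567890", base32-encoded.
RFC_SECRET = "GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ"
print(totp(RFC_SECRET, at=59))  # prints 287082
```

Comparing this output with your authenticator app for the same seed is a quick way to confirm that the secret you pass at run time is the right one.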
Highlighting UI elements
annotate_screenshot draws labeled boxes on a screenshot — useful for a coding
agent to box the element it just built (e.g. “new: Submit button”) for review
or evidence. Coordinates are pixel-space, so a box drawn at an element’s reported
position lands exactly on it.
Ad-hoc recorded runs
If you’d rather drive the VM step by step yourself (instead of a stored
scenario), use start_app_test → record_app_test_step → finish_app_test.
Each run is saved with per-step screenshots and a standalone HTML report.
MCP tools
| Tool | Use |
|---|---|
| ensure_tester_vm | Get or create your persistent desktop tester VM |
| create_app_scenario | Save a reusable GUI test scenario for an app URL |
| list_app_scenarios / get_app_scenario | Browse a scenario and its hardened cache |
| run_app_scenario | Run a scenario (pass run-time secrets here) and record the result |
| start_app_test / record_app_test_step / finish_app_test | Drive and record an ad-hoc run yourself |
| list_app_test_runs / get_app_test_run | Review run history, screenshots, and reports |
| annotate_screenshot | Draw labeled highlight boxes on a screenshot |
| desktop_screenshot / desktop_click / desktop_type / … | Drive the VM directly |
Example prompts
Create a smoke-login scenario for my app at https://my-app.vercel.app: open it,
sign in with the email/password I'll provide at run time, handle the 2FA code,
and verify the dashboard loads. Then run it.

Run the smoke-login scenario again and tell me which steps were served from
cache vs. re-resolved.

Open https://staging.example.com in the tester VM, screenshot it, and box the
primary call-to-action button labeled "new CTA".