Autonomous App Walker — find bugs the prompt never thought to test
Point the walker at any app URL on a tester VM and it explores the UI for you: breadth-first across reachable routes, clicking real DOM elements via the Chromium DevTools Protocol, recording console errors and failed network requests, and emitting runnable scenarios for every blocker or major finding it traps. Use it on a freshly vibecoded app, a staging build, or a production URL to surface the failure modes a hand-written test plan would miss.
What this is for. Catching the bugs nobody scripted a test for: dead buttons, console errors on a route the prompt never named, broken redirects, a TypeError two clicks deep. For a curated login or checkout flow you already know about, write a scenario instead.
Today (M1 through M5 are live, end-to-end real). DevTools primitives talk
to live CDP at :9222 on tester VMs, the BFS explorer drives the browser via
the VM Interaction API, walks and findings persist under
/TEST_STORAGE_DIR/app-walks/, the REST surface is authenticated, and the
Walks tab in the Apps UI renders against real walk records. There is no
stub mode — every walk you start runs against a real browser.
How it works
- Pick a tester VM and a target URL: ensure_tester_vm gives you a persistent desktop VM with Chromium running under CDP. Any reachable URL works (deployed app, preview, local dev server).
- Start a walk: walk_app returns a walk_id and a dashboard_url immediately. The explorer runs in the background.
- The explorer crawls the app: at each state it snapshots the page, enumerates real clickable elements via CDP, collects browser diagnostics, filters destructive actions, and pushes unvisited children into the BFS frontier (bounded by max_states and max_depth).
- Findings get deduped and ranked: console errors, failed requests, dead controls, and missing assertions are persisted with a sha256 dedup key over (category, route_template, selector, message[:80]) and tagged blocker | major | minor | info.
- Scenarios get emitted: for each blocker/major finding the walker traces back through parent_state_id and synthesizes a ScenarioStep sequence (open_url → clicks → assert_no_error → assert_text) and POSTs it to /api/tests/app-scenarios, ready to replay.
- Review the report: get_app_walk returns the full record; the server also renders a standalone report.html with a Mermaid state graph, severity-grouped findings, and the emitted scenarios.
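The crawl and trace-back steps can be illustrated with a small, self-contained sketch. This is not the walker's implementation: the toy route graph, the function names explore and trace_steps, and the shape of the visited map are assumptions made for illustration. Only max_states, max_depth, the parent-pointer idea, and the step field names (action_type, value, target_label) come from this page.

```python
from collections import deque

# Hypothetical toy app: state -> list of (click_label, next_state).
TOY_APP = {
    "/": [("Dashboard", "/dashboard"), ("Settings", "/settings")],
    "/dashboard": [("Reports", "/reports"), ("Home", "/")],
    "/settings": [("Profile", "/profile")],
    "/reports": [],
    "/profile": [],
}

def explore(root, max_states, max_depth):
    """Breadth-first walk. Returns {state: (parent_state, click_label, depth)}."""
    visited = {root: (None, None, 0)}
    frontier = deque([root])
    while frontier:
        state = frontier.popleft()
        depth = visited[state][2]
        if depth >= max_depth:
            continue  # depth bound reached: record the state but do not expand it
        for label, child in TOY_APP.get(state, []):
            if child in visited:
                continue  # only unvisited children enter the frontier
            if len(visited) >= max_states:
                return visited  # state budget exhausted
            visited[child] = (state, label, depth + 1)
            frontier.append(child)
    return visited

def trace_steps(visited, state_id):
    """Follow parent pointers back to the root and build a replayable step list."""
    clicks = []
    current = state_id
    while visited[current][0] is not None:
        parent, label, _ = visited[current]
        clicks.append({"action_type": "click", "target_label": label})
        current = parent
    steps = [{"action_type": "open_url", "value": current}]
    steps.extend(reversed(clicks))  # clicks were collected leaf-to-root
    steps.append({"action_type": "assert_no_error"})
    return steps

walk = explore("/", max_states=20, max_depth=4)
for step in trace_steps(walk, "/reports"):
    print(step)
```

Because BFS reaches each state by a shortest click path first, the parent pointers recorded at discovery time yield a shortest reproduction path for any finding attached to that state.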
Walker modes
- mechanical: pure BFS over enumerated DOM. Deterministic, exhaustive within bounds, no LLM cost.
- ai: at each state-seed the LLM ranks candidate actions against ai_goal (“Find broken buttons and flows”); the explorer follows the ranked indices first.
- hybrid: LLM ranks the top candidates per state, mechanical BFS sweeps the remainder. Default for app-shaped targets.
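As a rough illustration of the hybrid ordering, the sketch below puts the LLM-ranked candidates first and lets the remaining candidates follow in their original enumeration order. The function name hybrid_order and the handling of duplicate or out-of-range indices are assumptions, not documented behavior.

```python
def hybrid_order(candidates, ranked_indices):
    """Ranked candidates first, then the unranked remainder in DOM order."""
    seen = set()
    ranked = []
    for i in ranked_indices:
        # Assumption: ignore indices that repeat or fall outside the candidate list.
        if 0 <= i < len(candidates) and i not in seen:
            seen.add(i)
            ranked.append(candidates[i])
    remainder = [c for i, c in enumerate(candidates) if i not in seen]
    return ranked + remainder

candidates = ["Home", "Save", "Export", "Help"]
print(hybrid_order(candidates, ranked_indices=[2, 1, 2, 9]))
# ['Export', 'Save', 'Home', 'Help']
```

With an empty ranking the result is the plain enumeration order, which matches the mechanical mode.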
Destructive actions
By default the walker refuses to click anything that looks destructive
(Delete, Remove, Clear, Unsubscribe, Pay, etc.). Opt back in with
destructive_allowed=True and a destructive_label_allowlist of literal
labels or re: regex patterns. Anything not on the allowlist stays
blocked even when destructive actions are otherwise permitted.
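The filter logic described here could look roughly like the following. The keyword tuple is an assumption built from the examples named above (the real detection rules are not documented on this page), and the helper names looks_destructive, on_allowlist, and may_click are hypothetical. The destructive_allowed flag, the destructive_label_allowlist parameter, and the re: prefix come from the text.

```python
import re

# Assumed keyword list, taken from the examples named in the text.
DESTRUCTIVE_WORDS = ("delete", "remove", "clear", "unsubscribe", "pay")

def looks_destructive(label):
    """True when any whole word in the label is a destructive keyword."""
    words = re.findall(r"[a-z]+", label.lower())
    return any(word in DESTRUCTIVE_WORDS for word in words)

def on_allowlist(label, allowlist):
    """Entries are literal labels, or regex patterns when prefixed with 're:'."""
    for entry in allowlist:
        if entry.startswith("re:"):
            if re.search(entry[3:], label):
                return True
        elif entry == label:
            return True
    return False

def may_click(label, destructive_allowed=False, destructive_label_allowlist=()):
    if not looks_destructive(label):
        return True   # harmless labels are always clickable
    if not destructive_allowed:
        return False  # default: refuse anything destructive
    return on_allowlist(label, destructive_label_allowlist)

allow = ["Remove from cart", "re:^Clear "]
for label in ["Submit", "Delete account", "Remove from cart", "Clear filters"]:
    print(label, may_click(label, True, allow))
```

Note that "Delete account" stays blocked even with destructive_allowed=True, because it matches no allowlist entry.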
MCP tools
walk_app
Start an autonomous walk over a target URL on a tester VM with configurable BFS bounds, mode, and destructive-action filters.
walk_app(app_name='walker-canary', app_url='https://example.com/app',
vm_name='of-tester-xyz', max_states=20, max_depth=4,
mode='hybrid', ai_goal='Find broken buttons and flows',
destructive_allowed=True,
destructive_label_allowlist=['Remove from cart', 're:^Clear '])
→ {
"walk_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "running",
"vm_name": "of-tester-xyz",
"app_url": "https://example.com/app",
"config": {"mode": "hybrid", "max_states": 20, "max_depth": 4,
"destructive_allowed": true},
"dashboard_url": "http://libvirt-backend/api/tests/app-walks/550e8400-...",
"report_url": "http://libvirt-backend/api/tests/app-walks/550e8400-.../report.html",
"started_at": "2026-06-26T10:15:23.456Z"
}
get_app_walk
Fetch the full walk record: status, state graph, transitions, findings with evidence, and emitted scenario IDs.
get_app_walk(walk_id='550e8400-e29b-41d4-a716-446655440000',
include_states=True, include_transitions=True, include_findings=True)
→ {
"walk_id": "550e8400-...",
"status": "completed",
"app_name": "walker-canary",
"totals": {"states": 8, "transitions": 12, "scenarios_emitted": 3,
"findings_by_severity": {"blocker": 1, "major": 2,
"minor": 5, "info": 3}},
"states": [{"state_id": "a1b2c3d4...", "url": "...", "title": "Dashboard"}],
"findings": [{"finding_id": "f123", "category": "console_error",
"severity": "blocker",
"message": "TypeError: Cannot read property 'map'",
"state_id": "a1b2c3d4..."}]
}
list_app_walks
List recent walks filtered by app name, project, or status (running,
completed, stopped, error).
list_app_walks(app_name='walker-canary', status='completed', limit=20)
→ {
"walks": [{
"walk_id": "550e8400-...", "app_name": "walker-canary",
"status": "completed", "started_at": "2026-06-26T10:15:23Z",
"totals": {"states": 8, "transitions": 12, "scenarios_emitted": 3,
"findings_by_severity": {"blocker": 1, "major": 2,
"minor": 5, "info": 3}}
}],
"count": 1
}
stop_app_walk
Request cooperative cancellation of a running walk; the explorer terminates within one poll cycle.
stop_app_walk(walk_id='550e8400-e29b-41d4-a716-446655440000')
→ { "walk_id": "550e8400-...", "status": "stopped",
"cancel_requested": true, "finished_at": "2026-06-26T10:22:15Z" }
enumerate_dom_links
List every clickable or typable DOM element on the active browser tab via CDP, with selector, accessible name, href, bbox, and state flags. Useful for debugging what the walker actually sees on a given page.
enumerate_dom_links(vm_name='of-tester-xyz',
tab_filter='https://example.com',
include_perf=True)
→ [
{"type": "text", "elements": [
{"idx": 0, "selector": "button.submit", "tag": "button",
"role": "button", "accessible_name": "Submit",
"bbox": [100, 200, 200, 230],
"in_viewport": true, "visible": true, "disabled": false}
]},
{"type": "text", "text": "1 interactive element",
"perf": {"screenshot_ms": 120, "cdp_enumerate_ms": 180}}
]
get_browser_diagnostics
Capture browser console errors, failed network requests, page violations,
and performance metrics via CDP. Snapshot mode by default; pass
collect_duration_ms for live collection.
get_browser_diagnostics(vm_name='of-tester-xyz',
collect_duration_ms=1500,
tab_filter='https://example.com')
→ [
{"type": "text", "text": "Browser diagnostics for of-tester-xyz"},
{"type": "text", "text": "Snapshot: tab=https://example.com, collected 0ms"},
{"type": "text", "text": "Console errors: 2\n - TypeError: Cannot read property 'map' of undefined (at app.js:145:12)\n - ReferenceError: globalThis is not defined"},
{"type": "text", "text": "Failed requests: 1\n - GET /api/user/profile (404 Not Found)"}
]
get_pending_checkpoint
Long-poll (0 to 25 s) for the oldest unresolved harness checkpoint visible to the caller. Used by external harnesses to gate mining-time scenario verification or replay-time step verification.
get_pending_checkpoint(walk_id='550e8400-e29b-41d4-a716-446655440000',
wait_seconds=25)
→ {
"checkpoint": {
"checkpoint_id": "cp-abc123def456",
"walk_id": "550e8400-...",
"kind": "mine",
"status": "pending",
"prompt": "Verify proposed scenario captures the bug",
"context": {
"scenario_steps": [
{"action_type": "open_url", "value": "https://example.com"},
{"action_type": "click", "target_label": "Submit"}
],
"finding": {"finding_id": "f123", "category": "console_error",
"severity": "blocker",
"message": "TypeError: Cannot read property 'map'"}
},
"screenshot_url": "http://libvirt-backend/api/tests/app-walks/550e8400-.../states/a1b2c3d4.../screenshot"
},
"pending_count": 1
}
submit_scenario_checkpoint
Submit a harness verdict (pass, fail, investigate, skip, timeout)
on a paused mining or replay checkpoint to resume the walker or runner.
submit_scenario_checkpoint(checkpoint_id='cp-abc123def456', verdict='pass',
notes='Scenario correctly reproduced the TypeError')
→ { "checkpoint_id": "cp-abc123def456", "status": "resolved",
"verdict": "pass",
"verdict_notes": "Scenario correctly reproduced the TypeError",
"resolved_at": "2026-06-26T10:25:10Z" }
get_checkpoint
Fetch a single checkpoint record (pending or resolved) by ID to read its verdict and audit trail.
get_checkpoint(checkpoint_id='cp-abc123def456')
→ { "checkpoint_id": "cp-abc123def456",
"walk_id": "550e8400-...", "kind": "mine",
"status": "resolved", "verdict": "pass",
"created_at": "2026-06-26T10:20:00Z",
"resolved_at": "2026-06-26T10:25:10Z" }
REST surface
The MCP tools are thin wrappers over an authenticated REST API. All routes
require Authorization: Bearer <token> plus X-User-Id (or X-User-Email
for loopback). Callers see only their own walks unless they are admin.
| Method | Path | Purpose |
|---|---|---|
| POST | /api/tests/app-walks | Create a new walk record |
| GET | /api/tests/app-walks | List walks (app_name, project_id, status, limit) |
| GET | /api/tests/app-walks/{walk_id} | Fetch full record (include_states, include_transitions, include_findings) |
| PATCH | /api/tests/app-walks/{walk_id} | Update status, totals, error, ai_state, emitted scenario IDs |
| POST | /api/tests/app-walks/{walk_id}/stop | Request cooperative cancellation |
| POST | /api/tests/app-walks/{walk_id}/login | Run an ephemeral login scenario as auth preamble |
| POST | /api/tests/app-walks/{walk_id}/states | Append an ExplorationState (BFS node) |
| POST | /api/tests/app-walks/{walk_id}/transitions | Append an ExplorationTransition (BFS edge) |
| POST | /api/tests/app-walks/{walk_id}/findings | Append a Finding (deduped by sha256) |
| POST | /api/tests/app-walks/{walk_id}/screenshots/{state_id} | Upload PNG for a state |
| GET | /api/tests/app-walks/{walk_id}/states/{state_id}/screenshot | Retrieve PNG for a state |
| GET | /api/tests/app-walks/{walk_id}/report.html | Server-rendered standalone HTML report |
| POST | /api/tests/checkpoints | Walker/runner creates a mining (kind=mine) or replay (kind=verify) checkpoint |
| GET | /api/tests/checkpoints | Long-poll oldest pending checkpoint scoped to walk_id or run_id |
| GET | /api/tests/checkpoints/{checkpoint_id} | Fetch a single checkpoint record |
| POST | /api/tests/checkpoints/{checkpoint_id}/verdict | Harness submits verdict, sets status=resolved, notifies waiter |
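For callers that talk to the REST surface directly, a request might be assembled as in this sketch. It uses only the Python standard library and builds the request without sending it. The helper name build_request, the JSON body encoding, and the placeholder token and user ID are assumptions; the base host is the one shown in the sample responses, and the two auth headers are the ones named above.

```python
import json
import urllib.request

BASE_URL = "http://libvirt-backend"  # host as it appears in the sample responses

def build_request(method, path, token, user_id, body=None):
    """Build an authenticated request for one of the routes in the table."""
    headers = {
        "Authorization": f"Bearer {token}",
        "X-User-Id": user_id,
    }
    data = None
    if body is not None:
        # Assumption: request bodies are JSON.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )

walk_id = "550e8400-e29b-41d4-a716-446655440000"
req = build_request(
    "POST", f"/api/tests/app-walks/{walk_id}/stop",
    token="<token>", user_id="user-123",  # placeholder credentials
)
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would send it; that call is left out here so the sketch runs without a backend.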
Putting it together
- Operator calls walk_app(app_name='myapp', app_url='https://myapp.dev', vm_name='of-tester-1', mode='hybrid', ai_goal='Find broken buttons'). MCP returns walk_id, status='running', and dashboard_url immediately.
- The background explorer opens the URL in the VM’s browser, snapshots the page, calls enumerate_dom_links via CDP to read the true clickable elements, filters out destructive actions, and pushes the root state into the BFS frontier.
- For each queued state the explorer calls get_browser_diagnostics to collect any errors or failed requests, takes a screenshot, persists the state via POST /api/tests/app-walks/{walk_id}/states, and POSTs every click action as a transition.
- In hybrid or AI mode, on state-seeds the explorer ranks candidates by calling the LLM with the candidate brief plus the current screenshot, then prioritizes the LLM’s ranked indices and continues mechanical BFS for the remainder.
- When a click or page load triggers a console error, network failure, dead control, or missing assertion, the explorer computes a dedup key (category, route template, selector, and normalized message), POSTs a Finding, and if severity is blocker or major marks that branch for scenario emission.
- At finalize, the walker traces back through parent_state_id for each blocker or major finding, synthesizes a ScenarioStep list (open_url → clicks → assert_no_error with extracted error phrases → assert_text), and POSTs to /api/tests/app-scenarios. With mining_verification='polling', each proposed scenario creates a checkpoint, pauses until the harness submits a verdict, then persists or drops the scenario.
- The operator calls get_app_walk(walk_id) for the final record, or polls list_app_walks while it runs.
- In the UI, the Walks tab polls listAppWalks every 10 s and renders the walk list with status badges and severity-color-coded finding counts. A row click opens the detail dialog: Mermaid state graph, findings table, emitted scenarios, AI decisions when ai_state is non-null, and the server-rendered report.html embedded at the bottom.
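The dedup and severity gating in the flow above can be sketched as follows. The key fields (category, route_template, selector, message[:80]) and the sha256 hash are stated under How it works; how the fields are joined before hashing is not documented, so the unit-separator join here is an assumption, as are the function names.

```python
import hashlib

def dedup_key(category, route_template, selector, message):
    """sha256 over (category, route_template, selector, message[:80])."""
    parts = (category, route_template, selector, message[:80])
    joined = "\x1f".join(parts)  # assumed separator; the real encoding may differ
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def should_emit_scenario(severity):
    """Only blocker and major findings are traced back into scenarios."""
    return severity in ("blocker", "major")

seen = set()
findings = [
    ("console_error", "/users/{id}", "button.submit", "TypeError: x is undefined", "blocker"),
    ("console_error", "/users/{id}", "button.submit", "TypeError: x is undefined", "blocker"),
    ("failed_request", "/users/{id}", "a.profile", "GET /api/user/profile (404)", "minor"),
]
for category, route, selector, message, severity in findings:
    key = dedup_key(category, route, selector, message)
    if key in seen:
        continue  # repeat of an already-recorded finding
    seen.add(key)
    print(category, severity, "emit" if should_emit_scenario(severity) else "skip")
```

Hashing on the route template rather than the concrete URL means the same error on /users/1 and /users/2 collapses into one finding, and truncating the message to 80 characters keeps long, variable tails from splitting duplicates.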
Example prompts
Walk https://walker-canary.apps.openfactory.tech on my tester VM in hybrid
mode with goal "find broken buttons and unhandled errors". Cap at 25 states
and depth 4. When it's done, show me the blocker findings and the
scenarios it emitted.

List my last 10 completed walks for app "checkout-staging" and tell me
which ones produced new blocker findings vs. only repeats.

Stop walk 550e8400-... it's stuck looping the modal. Then enumerate the
DOM on the tester VM so I can see what selectors it thought were
clickable on the current page.

Related
- App UI Testing — author and replay curated GUI scenarios. The walker emits these directly from findings.
- App Deployment — deploy a Git repo to a public preview URL the walker can crawl.
- MCP Integration — set up the OpenFactory MCP server.