Autonomous App Walker — find bugs the prompt never thought to test

Point the walker at any app URL on a tester VM and it explores the UI for you: breadth-first across reachable routes, clicking real DOM elements via the Chromium DevTools Protocol, recording console errors and failed network requests, and emitting runnable scenarios for every blocker or major finding it traps. Use it on a freshly vibecoded app, a staging build, or a production URL to surface the failure modes a hand-written test plan would miss.

What this is for. Catching the bugs nobody scripted a test for — dead buttons, console errors on a route the prompt never named, broken redirects, a TypeError two clicks deep. For a curated login or checkout flow you already know about, write a scenario instead.

Today (M1 through M5 are live end to end). DevTools primitives talk to live CDP at :9222 on tester VMs, the BFS explorer drives the browser via the VM Interaction API, walks and findings persist under /TEST_STORAGE_DIR/app-walks/, the REST surface is authenticated, and the Walks tab in the Apps UI renders against real walk records. There is no stub mode: every walk you start runs against a real browser.

How it works

Pick a tester VM and a target URL — ensure_tester_vm gives you a persistent desktop VM with Chromium running under CDP. Any reachable URL works (deployed app, preview, local dev server).
Start a walk — walk_app returns a walk_id and a dashboard_url immediately. The explorer runs in the background.
The explorer crawls the app — at each state it snapshots the page, enumerates real clickable elements via CDP, collects browser diagnostics, filters destructive actions, and pushes unvisited children into the BFS frontier (bounded by max_states and max_depth).
Findings get deduped and ranked — console errors, failed requests, dead controls, and missing assertions are persisted with a sha256 dedup key over (category, route_template, selector, message[:80]) and tagged blocker | major | minor | info.
Scenarios get emitted — for each blocker/major finding the walker traces back through parent_state_id and synthesizes a ScenarioStep sequence (open_url → clicks → assert_no_error → assert_text) and POSTs it to /api/tests/app-scenarios, ready to replay.
Review the report — get_app_walk returns the full record; the server also renders a standalone report.html with a Mermaid state graph, severity-grouped findings, and the emitted scenarios.
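
The crawl step above is a bounded breadth-first search. The following is a minimal sketch of that idea only, not the walker's actual code: `bfs_walk`, `children_of`, and the toy `GRAPH` are hypothetical names, and it assumes the root sits at depth 0 and that a state at `max_depth` is visited but not expanded.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def bfs_walk(
    root: str,
    children_of: Callable[[str], Iterable[str]],
    max_states: int = 20,
    max_depth: int = 4,
) -> Tuple[List[str], Dict[str, Optional[str]]]:
    """Visit states breadth-first, bounded by max_states and max_depth.

    Returns the visit order and a child -> parent map (the root maps to None).
    """
    parent: Dict[str, Optional[str]] = {root: None}
    order: List[str] = []
    frontier = deque([(root, 0)])  # (state, depth)

    while frontier and len(order) < max_states:
        state, depth = frontier.popleft()
        order.append(state)
        if depth >= max_depth:
            continue  # assumed semantics: visit, but do not expand further
        for child in children_of(state):
            if child not in parent:  # only unseen children join the frontier
                parent[child] = state
                frontier.append((child, depth + 1))
    return order, parent


# Hypothetical route graph standing in for a real app.
GRAPH = {
    "/": ["/login", "/cart"],
    "/login": ["/", "/reset"],
    "/cart": ["/checkout"],
    "/reset": [],
    "/checkout": ["/done"],
    "/done": [],
}

order, parent = bfs_walk("/", lambda s: GRAPH.get(s, []), max_states=4, max_depth=2)
print(order)  # ['/', '/login', '/cart', '/reset']
```

The parent map is what makes the later trace-back through parent_state_id possible: every discovered state remembers which state it was reached from.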

Walker modes

mechanical — pure BFS over enumerated DOM. Deterministic, exhaustive within bounds, no LLM cost.
ai — at each state-seed the LLM ranks candidate actions against ai_goal (“Find broken buttons and flows”); the explorer follows the ranked indices first.
hybrid — LLM ranks the top candidates per state, mechanical BFS sweeps the remainder. Default for app-shaped targets.
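
One way to picture the difference between the three modes is as an ordering over the candidate actions of a single state. This is an illustrative sketch under assumptions: `order_candidates` and `top_k` are hypothetical, and the exact number of LLM-ranked candidates that hybrid mode honors is not stated in this document.

```python
from typing import List, Sequence


def order_candidates(
    candidates: Sequence[str],
    ranked_indices: Sequence[int],
    mode: str = "hybrid",
    top_k: int = 3,
) -> List[str]:
    """Return candidates in the order an explorer might try them."""
    if mode == "mechanical":
        return list(candidates)  # plain DOM order, no LLM input
    if mode not in ("ai", "hybrid"):
        raise ValueError(f"unknown mode: {mode!r}")

    # Keep only valid, unique indices, preserving the LLM's ranking order.
    ranked: List[int] = []
    for i in ranked_indices:
        if 0 <= i < len(candidates) and i not in ranked:
            ranked.append(i)
    if mode == "hybrid":
        ranked = ranked[:top_k]  # assumed cut-off for "top candidates"

    # Everything the LLM did not pick is swept afterwards in DOM order.
    rest = [i for i in range(len(candidates)) if i not in ranked]
    return [candidates[i] for i in ranked + rest]


labels = ["Home", "Cart", "Help", "Pay"]
print(order_candidates(labels, [3, 1, 9, 1, 0], mode="hybrid", top_k=1))
# ['Pay', 'Home', 'Cart', 'Help']
print(order_candidates(labels, [3, 1, 9, 1, 0], mode="ai"))
# ['Pay', 'Cart', 'Home', 'Help']
```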

Destructive actions

By default the walker refuses to click anything that looks destructive (Delete, Remove, Clear, Unsubscribe, Pay, etc.). Opt back in with destructive_allowed=True and a destructive_label_allowlist of literal labels or re: regex patterns. Anything not on the allowlist stays blocked even when destructive actions are otherwise permitted.
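
The allowlist rule can be sketched as a small predicate. The keyword list and the whole-word matching below are assumptions made for illustration (the document only gives examples of destructive labels), and `may_click` is a hypothetical helper, not the walker's real filter.

```python
import re
from typing import Iterable

# Assumed keyword list, taken from the examples named in the text above.
DESTRUCTIVE_WORDS = ("delete", "remove", "clear", "unsubscribe", "pay")
_DESTRUCTIVE_RE = re.compile(
    r"\b(" + "|".join(DESTRUCTIVE_WORDS) + r")\b", re.IGNORECASE
)


def looks_destructive(label: str) -> bool:
    """True when the label contains one of the assumed destructive words."""
    return bool(_DESTRUCTIVE_RE.search(label))


def on_allowlist(label: str, allowlist: Iterable[str]) -> bool:
    """Literal entries must equal the label; 're:' entries are regex searches."""
    for entry in allowlist:
        if entry.startswith("re:"):
            if re.search(entry[3:], label):
                return True
        elif entry == label:
            return True
    return False


def may_click(
    label: str,
    destructive_allowed: bool = False,
    allowlist: Iterable[str] = (),
) -> bool:
    if not looks_destructive(label):
        return True
    # Destructive labels need both the opt-in flag and an allowlist match.
    return destructive_allowed and on_allowlist(label, allowlist)


allow = ["Remove from cart", "re:^Clear "]
print(may_click("Submit"))                        # True
print(may_click("Delete account"))                # False
print(may_click("Remove from cart", True, allow)) # True
print(may_click("Clear filters", True, allow))    # True
print(may_click("Delete account", True, allow))   # False
```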

MCP tools

`walk_app`

Start an autonomous walk over a target URL on a tester VM with configurable BFS bounds, mode, and destructive-action filters.


walk_app(app_name='walker-canary', app_url='https://example.com/app',
         vm_name='of-tester-xyz', max_states=20, max_depth=4,
         mode='hybrid', ai_goal='Find broken buttons and flows',
         destructive_allowed=True,
         destructive_label_allowlist=['Remove from cart', 're:^Clear '])
  → {
      "walk_id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "running",
      "vm_name": "of-tester-xyz",
      "app_url": "https://example.com/app",
      "config": {"mode": "hybrid", "max_states": 20, "max_depth": 4,
                 "destructive_allowed": true},
      "dashboard_url": "http://libvirt-backend/api/tests/app-walks/550e8400-...",
      "report_url":    "http://libvirt-backend/api/tests/app-walks/550e8400-.../report.html",
      "started_at": "2026-06-26T10:15:23.456Z"
    }

`get_app_walk`

Fetch the full walk record: status, state graph, transitions, findings with evidence, and emitted scenario IDs.


get_app_walk(walk_id='550e8400-e29b-41d4-a716-446655440000',
             include_states=True, include_transitions=True, include_findings=True)
  → {
      "walk_id": "550e8400-...",
      "status":  "completed",
      "app_name": "walker-canary",
      "totals": {"states": 8, "transitions": 12, "scenarios_emitted": 3,
                 "findings_by_severity": {"blocker": 1, "major": 2,
                                          "minor": 5, "info": 3}},
      "states":   [{"state_id": "a1b2c3d4...", "url": "...", "title": "Dashboard"}],
      "findings": [{"finding_id": "f123", "category": "console_error",
                    "severity": "blocker",
                    "message": "TypeError: Cannot read property 'map'",
                    "state_id": "a1b2c3d4..."}]
    }
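
Because walk_app returns immediately, a caller usually polls get_app_walk until the walk leaves the running state. The sketch below assumes the tool is reachable as a Python callable passed in by the caller; `wait_for_walk`, `blockers`, and the fake client are hypothetical, and the status names are the ones listed for list_app_walks.

```python
import time
from typing import Any, Callable, Dict, List

TERMINAL_STATUSES = {"completed", "stopped", "error"}


def wait_for_walk(
    get_app_walk: Callable[..., Dict[str, Any]],
    walk_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the walk reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        record = get_app_walk(walk_id=walk_id, include_findings=True)
        if record["status"] in TERMINAL_STATUSES:
            return record
        if time.monotonic() >= deadline:
            raise TimeoutError(f"walk {walk_id} is still {record['status']!r}")
        sleep(poll_seconds)


def blockers(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pick the blocker findings out of a walk record."""
    return [f for f in record.get("findings", []) if f["severity"] == "blocker"]


# Fake client for demonstration: first reply is running, second is completed.
_replies = iter([
    {"status": "running"},
    {"status": "completed", "findings": [
        {"finding_id": "f123", "severity": "blocker"},
        {"finding_id": "f124", "severity": "minor"},
    ]},
])


def fake_get_app_walk(walk_id: str, **kwargs: Any) -> Dict[str, Any]:
    return next(_replies)


record = wait_for_walk(fake_get_app_walk, "demo-walk", sleep=lambda s: None)
print(record["status"], [f["finding_id"] for f in blockers(record)])
# completed ['f123']
```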

`list_app_walks`

List recent walks filtered by app name, project, or status (running, completed, stopped, error).


list_app_walks(app_name='walker-canary', status='completed', limit=20)
  → {
      "walks": [{
        "walk_id": "550e8400-...", "app_name": "walker-canary",
        "status":  "completed",   "started_at": "2026-06-26T10:15:23Z",
        "totals": {"states": 8, "transitions": 12, "scenarios_emitted": 3,
                   "findings_by_severity": {"blocker": 1, "major": 2,
                                            "minor": 5, "info": 3}}
      }],
      "count": 1
    }

`stop_app_walk`

Request cooperative cancellation of a running walk; the explorer terminates within one poll cycle.


stop_app_walk(walk_id='550e8400-e29b-41d4-a716-446655440000')
  → { "walk_id": "550e8400-...", "status": "stopped",
      "cancel_requested": true, "finished_at": "2026-06-26T10:22:15Z" }

`enumerate_dom_links`

List every clickable or typable DOM element on the active browser tab via CDP, with selector, accessible name, href, bbox, and state flags. Useful for debugging what the walker actually sees on a given page.


enumerate_dom_links(vm_name='of-tester-xyz',
                    tab_filter='https://example.com',
                    include_perf=True)
  → [
      {"type": "text", "elements": [
         {"idx": 0, "selector": "button.submit", "tag": "button",
          "role": "button", "accessible_name": "Submit",
          "bbox": [100, 200, 200, 230],
          "in_viewport": true, "visible": true, "disabled": false}
      ]},
      {"type": "text", "text": "1 interactive element",
       "perf": {"screenshot_ms": 120, "cdp_enumerate_ms": 180}}
    ]
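
When debugging what the walker sees, it can help to reduce this output to the elements that could actually be clicked. The sketch below assumes the response shape shown in the example above (a list of content items, some carrying an `elements` array); `actionable` is a hypothetical helper and the second element is invented sample data.

```python
from typing import Any, Dict, List


def actionable(content: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep elements that are visible and not disabled."""
    kept = []
    for item in content:
        for element in item.get("elements", []):
            if element.get("visible") and not element.get("disabled"):
                kept.append(element)
    return kept


# Sample shaped like the example output, plus one disabled control.
sample = [
    {"type": "text", "elements": [
        {"idx": 0, "selector": "button.submit", "accessible_name": "Submit",
         "in_viewport": True, "visible": True, "disabled": False},
        {"idx": 1, "selector": "button.save", "accessible_name": "Save",
         "in_viewport": True, "visible": True, "disabled": True},
    ]},
    {"type": "text", "text": "2 interactive elements"},
]

print([e["selector"] for e in actionable(sample)])  # ['button.submit']
```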

`get_browser_diagnostics`

Capture browser console errors, failed network requests, page violations, and performance metrics via CDP. Snapshot mode by default; pass collect_duration_ms for live collection.


get_browser_diagnostics(vm_name='of-tester-xyz',
                        collect_duration_ms=1500,
                        tab_filter='https://example.com')
  → [
      {"type": "text", "text": "Browser diagnostics for of-tester-xyz"},
      {"type": "text", "text": "Snapshot: tab=https://example.com, collected 0ms"},
      {"type": "text", "text": "Console errors: 2\n  - TypeError: Cannot read property 'map' of undefined (at app.js:145:12)\n  - ReferenceError: globalThis is not defined"},
      {"type": "text", "text": "Failed requests: 1\n  - GET /api/user/profile (404 Not Found)"}
    ]

`get_pending_checkpoint`

Long-poll (0 to 25 s) for the oldest unresolved harness checkpoint visible to the caller. Used by external harnesses to gate mining-time scenario verification or replay-time step verification.


get_pending_checkpoint(walk_id='550e8400-e29b-41d4-a716-446655440000',
                       wait_seconds=25)
  → {
      "checkpoint": {
        "checkpoint_id": "cp-abc123def456",
        "walk_id": "550e8400-...",
        "kind":   "mine",
        "status": "pending",
        "prompt": "Verify proposed scenario captures the bug",
        "context": {
          "scenario_steps": [
            {"action_type": "open_url", "value": "https://example.com"},
            {"action_type": "click",    "target_label": "Submit"}
          ],
          "finding": {"finding_id": "f123", "category": "console_error",
                      "severity": "blocker",
                      "message": "TypeError: Cannot read property 'map'"}
        },
        "screenshot_url": "http://libvirt-backend/api/tests/app-walks/550e8400-.../states/a1b2c3d4.../screenshot"
      },
      "pending_count": 1
    }

`submit_scenario_checkpoint`

Submit a harness verdict (pass, fail, investigate, skip, timeout) on a paused mining or replay checkpoint to resume the walker or runner.


submit_scenario_checkpoint(checkpoint_id='cp-abc123def456', verdict='pass',
                           notes='Scenario correctly reproduced the TypeError')
  → { "checkpoint_id": "cp-abc123def456", "status": "resolved",
      "verdict": "pass",
      "verdict_notes": "Scenario correctly reproduced the TypeError",
      "resolved_at": "2026-06-26T10:25:10Z" }

`get_checkpoint`

Fetch a single checkpoint record (pending or resolved) by ID to read its verdict and audit trail.


get_checkpoint(checkpoint_id='cp-abc123def456')
  → { "checkpoint_id": "cp-abc123def456",
      "walk_id": "550e8400-...", "kind": "mine",
      "status": "resolved", "verdict": "pass",
      "created_at":  "2026-06-26T10:20:00Z",
      "resolved_at": "2026-06-26T10:25:10Z" }

REST surface

The MCP tools are thin wrappers over an authenticated REST API. All routes require Authorization: Bearer <token> plus X-User-Id (or X-User-Email for loopback). Callers see only their own walks unless they are admin.
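
For callers that skip MCP and hit the REST API directly, the headers described above can be attached as in this sketch. It only builds the request and does not send it. The host `http://libvirt-backend` is taken from the example URLs in this document, while `TOKEN`, `user-1`, and the `build_request` helper are placeholders; passing list filters as a query string is an assumption.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def build_request(
    base_url: str,
    path: str,
    token: str,
    user_id: str,
    method: str = "GET",
    query: Optional[Dict[str, Any]] = None,
    body: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build an authenticated request for the walker REST API."""
    url = base_url.rstrip("/") + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {token}", "X-User-Id": user_id}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


req = build_request(
    "http://libvirt-backend", "/api/tests/app-walks",
    token="TOKEN", user_id="user-1",
    query={"app_name": "walker-canary", "status": "completed", "limit": 20},
)
print(req.get_method(), req.full_url)
# GET http://libvirt-backend/api/tests/app-walks?app_name=walker-canary&status=completed&limit=20
```

Sending the request is then a matter of `urllib.request.urlopen(req)` against a reachable backend.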

Method	Path	Purpose
`POST`	`/api/tests/app-walks`	Create a new walk record
`GET`	`/api/tests/app-walks`	List walks (`app_name`, `project_id`, `status`, `limit`)
`GET`	`/api/tests/app-walks/{walk_id}`	Fetch full record (`include_states`, `include_transitions`, `include_findings`)
`PATCH`	`/api/tests/app-walks/{walk_id}`	Update status, totals, error, `ai_state`, emitted scenario IDs
`POST`	`/api/tests/app-walks/{walk_id}/stop`	Request cooperative cancellation
`POST`	`/api/tests/app-walks/{walk_id}/login`	Run an ephemeral login scenario as auth preamble
`POST`	`/api/tests/app-walks/{walk_id}/states`	Append an `ExplorationState` (BFS node)
`POST`	`/api/tests/app-walks/{walk_id}/transitions`	Append an `ExplorationTransition` (BFS edge)
`POST`	`/api/tests/app-walks/{walk_id}/findings`	Append a `Finding` (deduped by sha256)
`POST`	`/api/tests/app-walks/{walk_id}/screenshots/{state_id}`	Upload PNG for a state
`GET`	`/api/tests/app-walks/{walk_id}/states/{state_id}/screenshot`	Retrieve PNG for a state
`GET`	`/api/tests/app-walks/{walk_id}/report.html`	Server-rendered standalone HTML report
`POST`	`/api/tests/checkpoints`	Walker/runner creates a mining (`kind=mine`) or replay (`kind=verify`) checkpoint
`GET`	`/api/tests/checkpoints`	Long-poll oldest pending checkpoint scoped to `walk_id` or `run_id`
`GET`	`/api/tests/checkpoints/{checkpoint_id}`	Fetch a single checkpoint record
`POST`	`/api/tests/checkpoints/{checkpoint_id}/verdict`	Harness submits verdict, sets `status=resolved`, notifies waiter
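
The findings route dedupes by a sha256 key over (category, route_template, selector, message[:80]). A minimal sketch of such a key follows. The field order comes from this document, but the whitespace normalization, the separator byte, and the `dedup_key` name are assumptions, so keys produced here will not necessarily match the server's.

```python
import hashlib


def dedup_key(category: str, route_template: str, selector: str, message: str) -> str:
    """Hash the identifying fields of a finding into a stable hex digest."""
    normalized = " ".join(message.split())  # assumed: collapse whitespace
    parts = (category, route_template, selector, normalized[:80])
    # Assumed serialization: join with a unit separator so fields cannot blur.
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


prefix = "TypeError: " + "x" * 80  # longer than 80 characters on its own
key_a = dedup_key("console_error", "/orders/:id", "button.submit", prefix + " tail A")
key_b = dedup_key("console_error", "/orders/:id", "button.submit", prefix + " tail B")
key_c = dedup_key("console_error", "/orders/:id", "button.cancel", prefix + " tail A")

print(key_a == key_b)  # True: messages differ only after character 80
print(key_a == key_c)  # False: different selector
```

Truncating the message keeps noisy suffixes such as timestamps or request IDs from splitting one bug into many findings.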

Putting it together

Operator calls walk_app(app_name='myapp', app_url='https://myapp.dev', vm_name='of-tester-1', mode='hybrid', ai_goal='Find broken buttons'). MCP returns walk_id, status='running', and dashboard_url immediately.
The background explorer opens the URL in the VM’s browser, snapshots the page, calls enumerate_dom_links via CDP to read the true clickable elements, filters out destructive actions, and pushes the root state into the BFS frontier.
For each queued state the explorer calls get_browser_diagnostics to collect any errors or failed requests, takes a screenshot, persists the state via POST /api/tests/app-walks/{walk_id}/states, and POSTs every click action as a transition.
In hybrid or ai mode, on state-seeds the explorer ranks candidates by calling the LLM with the candidate brief plus the current screenshot, then prioritizes the LLM’s ranked indices and continues mechanical BFS for the remainder.
When a click or page load triggers a console error, network failure, dead control, or missing assertion, the explorer computes a dedup key (category, route template, selector, and the normalized message truncated to 80 characters), POSTs a Finding, and if severity is blocker or major marks that branch for scenario emission.
At finalize, the walker traces back through parent_state_id for each blocker or major finding, synthesizes a ScenarioStep list (open_url → clicks → assert_no_error with extracted error phrases → assert_text), and POSTs to /api/tests/app-scenarios. With mining_verification='polling', each proposed scenario creates a checkpoint, pauses until the harness submits a verdict, then persists or drops the scenario.
The operator calls get_app_walk(walk_id) for the final record, or polls list_app_walks while it runs.
In the UI, the Walks tab polls listAppWalks every 10 s and renders the walk list with status badges and severity-color-coded finding counts. Clicking a row opens the detail dialog: Mermaid state graph, findings table, emitted scenarios, AI decisions when ai_state is non-null, and the server-rendered report.html embedded at the bottom.
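
The finalize step's trace-back through parent_state_id can be sketched as follows. The step field names (`action_type`, `value`, `target_label`) follow the checkpoint example earlier in this document. The `via_label` field, the use of the final state's title for assert_text, and both helper functions are hypothetical choices made for illustration.

```python
from typing import Any, Dict, List


def trace_path(states: Dict[str, Dict[str, Any]], state_id: str) -> List[Dict[str, Any]]:
    """Follow parent_state_id links back to the root; return root-first order."""
    path, seen, current = [], set(), state_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parent links at {current}")
        seen.add(current)
        state = states[current]
        path.append(state)
        current = state.get("parent_state_id")
    path.reverse()
    return path


def synthesize_steps(states: Dict[str, Dict[str, Any]],
                     finding: Dict[str, Any]) -> List[Dict[str, Any]]:
    """open_url -> clicks -> assert_no_error -> assert_text."""
    path = trace_path(states, finding["state_id"])
    steps = [{"action_type": "open_url", "value": path[0]["url"]}]
    for state in path[1:]:
        # via_label (hypothetical): label of the click that led into this state.
        steps.append({"action_type": "click", "target_label": state["via_label"]})
    steps.append({"action_type": "assert_no_error", "value": finding["message"]})
    # Assumption: assert on the title of the state where the finding occurred.
    steps.append({"action_type": "assert_text", "value": path[-1]["title"]})
    return steps


states = {
    "s0": {"url": "https://example.com", "title": "Home", "parent_state_id": None},
    "s1": {"url": "https://example.com/orders", "title": "Orders",
           "parent_state_id": "s0", "via_label": "Orders"},
    "s2": {"url": "https://example.com/orders/1", "title": "Order detail",
           "parent_state_id": "s1", "via_label": "View"},
}
finding = {"finding_id": "f123", "state_id": "s2",
           "message": "TypeError: Cannot read property 'map'"}

steps = synthesize_steps(states, finding)
print([s["action_type"] for s in steps])
# ['open_url', 'click', 'click', 'assert_no_error', 'assert_text']
```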

Example prompts


Walk https://walker-canary.apps.openfactory.tech on my tester VM in hybrid
mode with goal "find broken buttons and unhandled errors". Cap at 25 states
and depth 4. When it's done, show me the blocker findings and the
scenarios it emitted.


List my last 10 completed walks for app "checkout-staging" and tell me
which ones produced new blocker findings vs. only repeats.


Stop walk 550e8400-... because it's stuck looping the modal. Then enumerate the
DOM on the tester VM so I can see what selectors it thought were
clickable on the current page.

Related

App UI Testing: author and replay curated GUI scenarios. The walker emits these directly from findings.
App Deployment: deploy a Git repo to a public preview URL the walker can crawl.
MCP Integration: set up the OpenFactory MCP server.
