Walker Baselines, Diffs, and Tickets — close the regression loop

After every code change, walk the app again and diff against your baseline to see what regressed, what got fixed, and what’s new. No manual tab-flipping between two walk reports, no jq over two JSON dumps, no eyeballing screenshot grids. Pin a known-green walk, re-walk after each edit, and export the new findings straight to Linear, Jira, or GitHub.

Just want to see the latest walk? Skip this page and read Autonomous Walker — that’s the upstream feature that produces the walks this page diffs and exports from. Baselines, diffs, and ticket export only make sense once you have at least one walk to pin.

Status today: M1 is live end to end. Baselines persist under /TEST_STORAGE_DIR/app-walks/_baselines/ and cached diffs land in /TEST_STORAGE_DIR/app-walks/_diffs/. Diffs are computed by joining states on the content-addressed state_id and findings on dedup_key. Ticket export serializes to ready-to-POST JSON payloads for Linear’s GraphQL issueCreate, Jira’s REST /issue, and GitHub’s REST /repos/{owner}/{repo}/issues. M1 returns the payloads for operator copy-paste; M2 lands direct POST via stored webhook tokens.

How it works

Pin a baseline — after a clean walk (zero blocker/major findings on a deployed build you trust) call set_walk_baseline with the walk_id and an app_name. One baseline per (user_id, app_name) pair; setting a new one replaces the old.
Re-walk after each change — run walk_app against the same URL on the same tester VM. You get a fresh walk_id with its own state graph and findings.
Diff the two walks — diff_walks(base_walk_id=..., new_walk_id=...) joins states by state_id (a content hash over route + DOM signature) and findings by dedup_key, returning summary counts and per-row deltas. First call computes and caches; later calls hit the cache.
Triage in a tree — walk_recursive_report(walk_id, baseline_walk_id) renders the new walk as a BFS tree rooted at root_state_id, annotating each node NEW | UNCHANGED | REMOVED | CHANGED. The subtree where the regression actually lives lights up.
Export findings as tickets — walk_findings_to_tickets(walk_id, format='linear'|'jira'|'github', baseline_walk_id=..., only_new_since_baseline=true) returns one ticket payload per finding ready to POST to the target tracker. Filter by severity, dedupe against the baseline, drop the payload into your webhook.
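The finding join in step 3 can be pictured as plain set arithmetic over dedup_key. The sketch below is only an illustration of that idea, not the service's implementation; classify_findings is a hypothetical helper name and the input rows are assumed to be dicts carrying a dedup_key field.

```python
from typing import Dict, List


def classify_findings(base: List[dict], new: List[dict]) -> Dict[str, List[str]]:
    """Bucket findings from two walks by joining on dedup_key.

    Hypothetical helper: 'new' keys appear only in the new walk,
    'resolved' only in the base walk, 'persistent' in both.
    """
    base_keys = {f["dedup_key"] for f in base}
    new_keys = {f["dedup_key"] for f in new}
    return {
        "new": sorted(new_keys - base_keys),
        "resolved": sorted(base_keys - new_keys),
        "persistent": sorted(base_keys & new_keys),
    }
```

The same join applies to states keyed on state_id, with the extra step of comparing fields on states present in both walks.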

State and finding identity

The diff machinery only works because identity is content-addressed:

state_id — the first 16 hex chars of sha256(route_template | dom_signature), where dom_signature is itself a 16-hex hash over visible-and-enabled interactive-element fingerprints. Two walks that hit the same route with the same DOM produce the same state_id; a real change (new button, removed link) changes the underlying dom_signature and therefore the state_id.
dedup_key — the first 24 hex chars of sha256(category | url | normalized_message[:80]), where normalized_message collapses volatile tokens (timestamps, UUIDs, line:col, cache-busters). Two walks that surface the same TypeError at /reports produce the same key, so persistent findings stay persistent across walks and only genuinely new regressions show up as status='new'.

State changes report a changed_fields list (title, content_phash, or interactive_elements_count) so you can see what shifted without re-rendering both walks.
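To make the two identity rules concrete, here is a minimal sketch of how such keys could be derived. The "|" separator, the exact normalization patterns, and the placeholder tokens are assumptions made for illustration; the walker's real normalization rules may differ, so keys computed this way should not be expected to match the service's.

```python
import hashlib
import re

# Assumed volatile-token patterns; order matters (timestamps before line:col).
_VOLATILE = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I), "<uuid>"),
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"), "<ts>"),
    (re.compile(r":\d+:\d+"), ":<line>:<col>"),
]


def state_id(route_template: str, dom_signature: str) -> str:
    """First 16 hex chars of sha256 over route template and DOM signature."""
    digest = hashlib.sha256(f"{route_template}|{dom_signature}".encode("utf-8"))
    return digest.hexdigest()[:16]


def normalize_message(message: str) -> str:
    """Collapse volatile tokens so equivalent errors hash identically."""
    for pattern, token in _VOLATILE:
        message = pattern.sub(token, message)
    return message


def dedup_key(category: str, url: str, message: str) -> str:
    """First 24 hex chars of sha256 over category, url, and the first
    80 characters of the normalized message."""
    normalized = normalize_message(message)[:80]
    digest = hashlib.sha256(f"{category}|{url}|{normalized}".encode("utf-8"))
    return digest.hexdigest()[:24]
```

With this sketch, two messages that differ only in their line:col suffix produce the same dedup_key, while a different error text at the same URL produces a different one.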

MCP tools

`set_walk_baseline`

Pin a walk as the current baseline for an app. One baseline per (user_id, app_name); setting a new one replaces the old.


set_walk_baseline(walk_id='walk-20250626-xyz', app_name='demo-app',
                  notes='green after analytics refactor')
  → {
      "user_id": "user-123",
      "app_name": "demo-app",
      "walk_id":  "walk-20250626-xyz",
      "set_at":   "2026-06-26T14:32:11.123Z",
      "walk_started_at": "2026-06-26T14:15:00Z",
      "walk_app_url":    "https://app.example.com",
      "notes": "green after analytics refactor",
      "session_token": "user-123"
    }

`get_walk_baseline`

Fetch the current baseline walk pinned for an app. Returns {baseline: null} if nothing is pinned. Scoped to your own pins only.


get_walk_baseline(app_name='demo-app')
  → {
      "baseline": {
        "user_id": "user-123",
        "app_name": "demo-app",
        "walk_id":  "walk-20250626-xyz",
        "set_at":   "2026-06-26T14:32:11.123Z",
        "walk_started_at": "2026-06-26T14:15:00Z",
        "walk_app_url":    "https://app.example.com",
        "notes": "green after analytics refactor"
      },
      "session_token": "user-123"
    }

`list_walk_baselines`

List every baseline pin across all your apps. Useful for a dashboard or for sanity-checking which apps you’re tracking regressions on.


list_walk_baselines()
  → {
      "baselines": [
        {"user_id": "user-123", "app_name": "demo-app",
         "walk_id": "walk-20250626-xyz",
         "set_at":  "2026-06-26T14:32:11.123Z"},
        {"user_id": "user-123", "app_name": "analytics-app",
         "walk_id": "walk-20250620-abc",
         "set_at":  "2026-06-20T09:15:00Z"}
      ],
      "session_token": "user-123"
    }

`clear_walk_baseline`

Clear the baseline pin for an app. Idempotent (returns {cleared: true} whether or not a baseline was set).


clear_walk_baseline(app_name='demo-app')
  → { "cleared": true, "session_token": "user-123" }
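The pin semantics described above (one baseline per (user_id, app_name), replace on set, idempotent clear) can be modelled with a small in-memory store. This is a sketch for reasoning about the behaviour only; BaselineStore is a hypothetical class, and the real service persists pins on disk and returns more fields.

```python
from typing import Dict, Tuple


class BaselineStore:
    """Hypothetical in-memory model of the baseline pin rules."""

    def __init__(self) -> None:
        self._pins: Dict[Tuple[str, str], dict] = {}

    def set(self, user_id: str, app_name: str, walk_id: str, notes: str = "") -> dict:
        # Setting a new baseline replaces any existing pin for the pair.
        record = {"user_id": user_id, "app_name": app_name,
                  "walk_id": walk_id, "notes": notes}
        self._pins[(user_id, app_name)] = record
        return record

    def get(self, user_id: str, app_name: str) -> dict:
        # Mirrors the {baseline: null} shape when nothing is pinned.
        return {"baseline": self._pins.get((user_id, app_name))}

    def list(self, user_id: str) -> dict:
        # Scoped to the caller's own pins.
        return {"baselines": [r for (uid, _), r in self._pins.items() if uid == user_id]}

    def clear(self, user_id: str, app_name: str) -> dict:
        # Idempotent: succeeds whether or not a pin existed.
        self._pins.pop((user_id, app_name), None)
        return {"cleared": True}
```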

`diff_walks`

Compute the diff between two walks: added/removed/changed states plus new/resolved/persistent findings, with summary counts. First call computes; later calls return the cached record.


diff_walks(base_walk_id='walk-20250620-abc',
           new_walk_id='walk-20250626-xyz', use_cache=true)
  → {
      "base_walk_id": "walk-20250620-abc",
      "new_walk_id":  "walk-20250626-xyz",
      "computed_at":  "2026-06-26T14:35:22.456Z",
      "summary": {
        "states_added": 2, "states_removed": 0,
        "states_changed": 1, "states_unchanged": 5,
        "findings_new": 3, "findings_resolved": 1,
        "findings_persistent": 2
      },
      "state_changes": [
        {"state_id": "s-abc123", "route_template": "/reports/new",
         "status": "new", "changed_fields": [], "depth": 2},
        {"state_id": "s-def456", "route_template": "/settings",
         "status": "changed", "changed_fields": ["title"], "depth": 1}
      ],
      "finding_changes": [
        {"dedup_key": "f-new123", "severity": "blocker",
         "category": "console_error", "status": "new",
         "finding_id": "f-xyz789",
         "message":  "Uncaught TypeError: Cannot read...",
         "route_template": "/reports", "state_id": "s-abc123"}
      ],
      "session_token": "user-123"
    }
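A typical consumer of this response only cares about new findings at or above some severity. The sketch below shows one way to pull those out of the response shape above; new_findings and has_regression are hypothetical helper names, and only the blocker and major severity names come from this page.

```python
from typing import Iterable, List


def new_findings(diff: dict, severities: Iterable[str] = ("blocker", "major")) -> List[dict]:
    """Return finding_changes rows that are new and match the given severities."""
    wanted = set(severities)
    return [
        row for row in diff.get("finding_changes", [])
        if row.get("status") == "new" and row.get("severity") in wanted
    ]


def has_regression(diff: dict) -> bool:
    """True when the diff contains at least one new blocker or major finding."""
    return bool(new_findings(diff))
```

A CI step could, for example, fail the build when has_regression returns True and print the route_template of each row returned by new_findings.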

`walk_recursive_report`

Render a walk as a BFS tree rooted at root_state_id. When a baseline_walk_id is supplied, each node carries a status field (NEW | UNCHANGED | REMOVED | CHANGED) and a changed_fields list.


walk_recursive_report(walk_id='walk-20250626-xyz',
                      baseline_walk_id='walk-20250620-abc')
  → {
      "walk_id":          "walk-20250626-xyz",
      "baseline_walk_id": "walk-20250620-abc",
      "root_state_id":    "s-root",
      "total_states":     8,
      "tree": {
        "state_id": "s-root", "route_template": "/",
        "url":      "https://app.example.com",
        "title":    "Dashboard",
        "depth": 0, "parent_state_id": null,
        "status": "UNCHANGED", "changed_fields": [],
        "finding_count_by_severity": {"major": 1},
        "finding_ids": ["f-root-001"],
        "children": [
          {"state_id": "s-settings", "route_template": "/settings",
           "status": "CHANGED", "changed_fields": ["title"]}
        ]
      },
      "orphans": [],
      "session_token": "user-123"
    }
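To find where a regression lives, the tree can be walked breadth-first and filtered on status. This is a minimal sketch against the response shape shown above; flagged_nodes is a hypothetical helper, and it assumes every node carries state_id and route_template while status, changed_fields, and children may be absent.

```python
from collections import deque
from typing import Iterable, List, Tuple


def flagged_nodes(tree: dict,
                  statuses: Iterable[str] = ("NEW", "CHANGED")) -> List[Tuple[str, str, str, list]]:
    """Breadth-first walk of a recursive report tree.

    Returns (state_id, route_template, status, changed_fields) for each
    node whose status is in `statuses`, in BFS order.
    """
    wanted = set(statuses)
    hits = []
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if node.get("status") in wanted:
            hits.append((node["state_id"], node["route_template"],
                         node["status"], node.get("changed_fields", [])))
        queue.extend(node.get("children", []))
    return hits
```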

`walk_findings_to_tickets`

Serialize walk findings into Linear, Jira, or GitHub ticket payloads. M1 returns JSON ready for operator copy-paste; M2 will POST directly using stored webhook tokens. Filter by severity_filter and combine with baseline_walk_id + only_new_since_baseline=true to export only regressions.


walk_findings_to_tickets(walk_id='walk-20250626-xyz', format='linear',
                         severity_filter='major',
                         baseline_walk_id='walk-20250620-abc',
                         only_new_since_baseline=true,
                         team_id='linear-team-uuid')
  → {
      "walk_id": "walk-20250626-xyz",
      "format":  "linear",
      "count":   2,
      "skipped": 1,
      "options_used": {"team_id": "linear-team-uuid"},
      "tickets": [{
        "finding_id": "f-xyz789",
        "dedup_key":  "f-new123",
        "severity":   "blocker",
        "category":   "console_error",
        "ticket": {
          "title": "[BLOCKER] console_error at /reports: Uncaught TypeError",
          "body_markdown": "**Severity:** blocker   **Category:** `console_error` ...",
          "labels": ["walker", "severity:blocker",
                     "category:console_error", "app:demo-app"]
        },
        "payload": {
          "teamId": "linear-team-uuid",
          "title":  "[BLOCKER] console_error at /reports: Uncaught TypeError",
          "description": "**Severity:** blocker   **Category:** `console_error`...",
          "priority": 1, "labelIds": [],
          "projectId": null, "assigneeId": null
        }
      }],
      "next": "Each entry in tickets[] has a 'payload' dict ready to POST.",
      "session_token": "user-123"
    }

`run_all_app_scenarios`

Kick off a batch run of every saved scenario for an app. Thin wrapper over run_app_test_group with scope='app'. Use this right after a green walk to confirm no scripted scenario regressed either.


run_all_app_scenarios(app_name='demo-app', vm_name=null)
  → {
      "batch_run_id": "batch-20250626-123",
      "app_name":     "demo-app",
      "total":  12,
      "status": "queued",
      "dashboard_url": "https://cto-gui.example.com/app/demo-app/batch/batch-20250626-123",
      "session_token": "user-123"
    }

Unified Ticket model and per-tracker payloads

Internally every finding becomes the same Ticket shape:


Ticket = {
  finding_id, walk_id, severity, category,
  title:          "[<SEV>] <category> at <route>: <truncated message>",
  body_markdown:  "<severity> + <category> + <walk_id> + <app_name>
                   ### Message\n```\n<full message>\n```
                   ### Evidence\n- URL: <state url>\n- Screenshot: <api path>",
  labels: ["walker", "severity:<sev>", "category:<cat>", "app:<name>"],
  options: <format-specific>
}
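As an illustration of the title and labels templates in that shape, the sketch below builds both from a finding's fields. The helper names and the 60-character truncation limit are assumptions; the page does not state where the real serializer cuts the message.

```python
from typing import List


def ticket_title(severity: str, category: str, route: str,
                 message: str, max_len: int = 60) -> str:
    """Build "[<SEV>] <category> at <route>: <truncated message>".

    max_len is an assumed truncation limit, not a documented value.
    """
    first_line = (message.splitlines() or [""])[0]
    if len(first_line) > max_len:
        first_line = first_line[:max_len].rstrip() + "..."
    return f"[{severity.upper()}] {category} at {route}: {first_line}"


def ticket_labels(severity: str, category: str, app_name: str) -> List[str]:
    """Build the label list in the order shown in the Ticket shape."""
    return ["walker", f"severity:{severity}", f"category:{category}", f"app:{app_name}"]
```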

The serializer then projects that Ticket into the wire shape the target tracker expects:

Linear (GraphQL issueCreate): payload is the IssueCreateInput — {teamId, title, description, priority, labelIds, projectId, assigneeId}. Post with mutation IssueCreate($input: IssueCreateInput!) { issueCreate(input: $input) { issue { id url } } }.
Jira (REST /rest/api/3/issue): payload is the request body — {fields: {project: {key}, summary, description, issuetype: {name}, priority: {name}, labels}}. Post to your Jira base.
GitHub (REST /repos/{owner}/{repo}/issues): payload is {title, body, labels, assignees}. Post with a PAT or app token.

The next field in the response tells you exactly which endpoint and method to use for the format you asked for.
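Since M1 leaves the POST to the operator, the following sketch shows one way to wrap a returned payload into an HTTP request for Linear or GitHub using only the standard library. The function names are hypothetical, the requests are built but not sent, and the authorization value is passed through as-is because the right header format depends on the kind of credential you hold.

```python
import json
import urllib.request

# Mutation text as given on this page.
LINEAR_MUTATION = (
    "mutation IssueCreate($input: IssueCreateInput!) "
    "{ issueCreate(input: $input) { issue { id url } } }"
)


def linear_request(payload: dict, authorization: str) -> urllib.request.Request:
    """Wrap one Linear payload as the $input variable of issueCreate."""
    body = json.dumps({"query": LINEAR_MUTATION,
                       "variables": {"input": payload}}).encode("utf-8")
    return urllib.request.Request(
        "https://api.linear.app/graphql", data=body, method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": authorization},
    )


def github_request(owner: str, repo: str, payload: dict, token: str) -> urllib.request.Request:
    """Wrap one GitHub payload as the body of POST /repos/{owner}/{repo}/issues."""
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"), method="POST",
        headers={"Accept": "application/vnd.github+json",
                 "Authorization": f"Bearer {token}"},
    )
```

Passing the returned object to urllib.request.urlopen would send it; each entry in tickets[] needs its own request.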

REST surface

The MCP tools are thin wrappers over an authenticated REST API. All routes require Authorization: Bearer <token>. Callers see only their own baselines and walks unless they are an admin.

REST responses are the raw service shapes (see the MCP tool examples above for the canonical fields). The MCP layer additionally adds session_token and a next hint to every response; the REST endpoints return neither.

| Method | Path | Purpose |
| --- | --- | --- |
| `POST` | `/api/tests/app-walks/{walk_id}/set-baseline` | Pin a walk as the baseline for an app; optional `notes` |
| `GET` | `/api/tests/app-walks/baselines` | List every baseline pin for the current user |
| `GET` | `/api/tests/app-walks/baseline?app_name=X` | Fetch the baseline for one app (returns `{baseline: null}` if unset) |
| `DELETE` | `/api/tests/app-walks/baseline?app_name=X` | Clear the baseline for an app (idempotent) |
| `GET` | `/api/tests/app-walks/diff?base=X&new=Y&use_cache=true` | Compute or retrieve the cached diff between two walks |
| `GET` | `/api/tests/app-walks/{walk_id}/recursive-report?baseline_walk_id=Z` | Render a walk as a BFS tree, optionally annotated with diff status |
| `GET` | `/api/tests/app-walks/{walk_id}/tickets?format=linear&severity_filter=major&baseline_walk_id=Z&only_new_since_baseline=true&team_id=UUID` | Serialize findings to Linear, Jira, or GitHub ticket payloads |
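For callers that go to REST directly, the sketch below assembles an authenticated request for any of the routes above. api_request is a hypothetical helper and the base URL in the usage example is a placeholder; the request is constructed but not sent.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode


def api_request(base_url: str, path: str, token: str,
                params: Optional[dict] = None,
                method: str = "GET") -> urllib.request.Request:
    """Build a Bearer-authenticated request for a walker REST route.

    Booleans are rendered as lowercase true/false to match the query
    strings in the route table; None values are dropped.
    """
    query = {}
    for key, value in (params or {}).items():
        if value is None:
            continue
        query[key] = str(value).lower() if isinstance(value, bool) else str(value)
    url = base_url.rstrip("/") + path
    if query:
        url += "?" + urlencode(query)
    return urllib.request.Request(
        url, method=method, headers={"Authorization": f"Bearer {token}"}
    )


# Usage: a diff request against a placeholder host.
diff_req = api_request(
    "https://host.example", "/api/tests/app-walks/diff", "my-token",
    {"base": "walk-abc123", "new": "walk-def456", "use_cache": True},
)
```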

Putting it together

Baseline pin. After a successful deploy and clean walk (zero blocker/major findings), operator calls set_walk_baseline(walk_id='walk-abc123', app_name='demo-app', notes='post-v1.2 release') to mark it as the golden reference.
Edit and re-walk. Developer modifies an analytics component and re-runs walk_app, generating a new walk walk-def456.
Regression detection. Operator calls diff_walks(base_walk_id='walk-abc123', new_walk_id='walk-def456'), which joins states by content-addressed state_id and findings by dedup_key, returning {summary: {states_added: 0, states_changed: 1, findings_new: 2, findings_persistent: 1}}. The two new findings are the signal that the edit introduced issues.
Tree-view triage. Operator calls walk_recursive_report(walk_id='walk-def456', baseline_walk_id='walk-abc123') to see a hierarchical view with the /analytics subtree marked CHANGED and the two new findings pinpointed to the state where DOM content diverged.
One-click export. Operator calls walk_findings_to_tickets(walk_id='walk-def456', format='linear', severity_filter='major', baseline_walk_id='walk-abc123', only_new_since_baseline=true, team_id='uuid-abc'), receiving JSON payloads for the 2 major-or-higher findings. They then POST each payload to Linear’s GraphQL endpoint as the input of the issueCreate mutation.
Batch regression. Operator calls run_all_app_scenarios(app_name='demo-app') to exercise every saved scenario against the newly deployed build, queuing a batch_run_id pollable via get_app_test_group(batch_run_id).
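The diff and export steps of this workflow can be chained in a few lines. The sketch below assumes a hypothetical tools object that exposes the MCP tools as methods with the keyword arguments shown on this page; how you obtain such an object depends on your MCP client.

```python
from typing import List


def export_regressions(tools, app_name: str, new_walk_id: str,
                       fmt: str = "linear", **options) -> List[dict]:
    """Return ticket payloads for findings that are new since the baseline.

    `tools` is a hypothetical client object; returns an empty list when
    no baseline is pinned or the diff reports no new findings.
    """
    pinned = tools.get_walk_baseline(app_name=app_name)["baseline"]
    if pinned is None:
        return []
    base_id = pinned["walk_id"]

    diff = tools.diff_walks(base_walk_id=base_id, new_walk_id=new_walk_id)
    if diff["summary"]["findings_new"] == 0:
        return []

    result = tools.walk_findings_to_tickets(
        walk_id=new_walk_id, format=fmt, severity_filter="major",
        baseline_walk_id=base_id, only_new_since_baseline=True, **options,
    )
    return [entry["payload"] for entry in result["tickets"]]
```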

Example prompts


Pin walk 550e8400-... as the baseline for app "demo-app" with note
"green after the analytics refactor". Then list every baseline I have
pinned.


Diff my last walk against the demo-app baseline and show me only the
new findings (not the persistent ones). For each new blocker, give me
the route and the parent state so I know where in the BFS tree it
showed up.


Export every major-or-blocker finding from walk-def456 that's new
since the demo-app baseline as Linear tickets for team
linear-team-uuid. Give me the payload array so I can POST them to
the issueCreate mutation.

Autonomous Walker — upstream feature that emits the walks, findings, and scenarios this page diffs and exports.
App UI Testing — replay the curated scenarios run_all_app_scenarios batch-runs.
App Deployment — deploy a Git repo to a public preview URL the walker can crawl; M3 will auto-pin a baseline on green deploys.
MCP Integration — set up the OpenFactory MCP server.
