Walker Baselines, Diffs, and Tickets — close the regression loop
After every code change, walk the app again and diff against your
baseline to see what regressed, what got fixed, and what’s new. No
manual tab-flipping between two walk reports, no jq over two JSON
dumps, no eyeballing screenshot grids. Pin a known-green walk, re-walk
after each edit, and export the new findings straight to Linear, Jira,
or GitHub.
Just want to see the latest walk? Skip this page and read Autonomous Walker — that’s the upstream feature that produces the walks this page diffs and exports from. Baselines, diffs, and ticket export only make sense once you have at least one walk to pin.
Status today: M1 is live and works end to end. Baselines persist under
/TEST_STORAGE_DIR/app-walks/_baselines/ (cached diffs land in
/TEST_STORAGE_DIR/app-walks/_diffs/), diffs are computed by joining
states on content-addressed state_id and findings on dedup_key, and
ticket export serializes to ready-to-POST JSON payloads for Linear’s
GraphQL issueCreate, Jira’s REST /issue, and GitHub’s REST
/repos/{owner}/{repo}/issues. M1 returns the payloads for operator
copy-paste; M2 lands direct POST via stored webhook tokens.
How it works
- Pin a baseline: after a clean walk (zero blocker/major findings on a deployed build you trust), call set_walk_baseline with the walk_id and an app_name. One baseline per (user_id, app_name) pair; setting a new one replaces the old.
- Re-walk after each change: run walk_app against the same URL on the same tester VM. You get a fresh walk_id with its own state graph and findings.
- Diff the two walks: diff_walks(base_walk_id=..., new_walk_id=...) joins states by state_id (a content hash over route + DOM signature) and findings by dedup_key, returning summary counts and per-row deltas. First call computes and caches; later calls hit the cache.
- Triage in a tree: walk_recursive_report(walk_id, baseline_walk_id) renders the new walk as a BFS tree rooted at root_state_id, annotating each node NEW | UNCHANGED | REMOVED | CHANGED. The subtree where the regression actually lives lights up.
- Export findings as tickets: walk_findings_to_tickets(walk_id, format='linear'|'jira'|'github', baseline_walk_id=..., only_new_since_baseline=true) returns one ticket payload per finding, ready to POST to the target tracker. Filter by severity, dedupe against the baseline, drop the payload into your webhook.
State and finding identity
The diff machinery only works because identity is content-addressed:
- state_id: the first 16 hex chars of sha256(route_template | dom_signature), where dom_signature is itself a 16-hex hash over visible-and-enabled interactive-element fingerprints. Two walks that hit the same route with the same DOM produce the same state_id; a real change (new button, removed link) changes the underlying dom_signature and therefore the state_id.
- dedup_key: the first 24 hex chars of sha256(category | url | normalized_message[:80]), where normalized_message collapses volatile tokens (timestamps, UUIDs, line:col, cache-busters). Two walks that surface the same TypeError at /reports produce the same key, so persistent findings stay persistent across walks and only genuinely new regressions show up as status='new'.
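The two identity hashes described above can be sketched in a few lines of Python. This is a minimal illustration, not the walker's actual code: it assumes the `|` in the formulas is a literal separator character, and the normalization patterns in `_VOLATILE_PATTERNS` are hypothetical stand-ins for whatever rules the walker really applies.

```python
import hashlib
import re

# Hypothetical normalization rules (assumption): the walker's real patterns
# for timestamps, UUIDs, line:col, and cache-busters are not documented here.
_VOLATILE_PATTERNS = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I), "<uuid>"),
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"), "<ts>"),
    (re.compile(r":\d+:\d+"), ":<line>:<col>"),
    (re.compile(r"[?&]v=[0-9a-zA-Z]+"), ""),
]


def normalize_message(message: str) -> str:
    """Collapse volatile tokens so the same error hashes the same across walks."""
    for pattern, replacement in _VOLATILE_PATTERNS:
        message = pattern.sub(replacement, message)
    return message.strip()


def state_id(route_template: str, dom_signature: str) -> str:
    """First 16 hex chars of sha256 over route template and DOM signature."""
    raw = f"{route_template}|{dom_signature}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]


def dedup_key(category: str, url: str, message: str) -> str:
    """First 24 hex chars of sha256 over category, URL, and normalized message."""
    normalized = normalize_message(message)[:80]
    raw = f"{category}|{url}|{normalized}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:24]
```

Under these assumed rules, two messages that differ only in a line:col pair or a timestamp hash to the same dedup_key, which is what keeps a persistent finding from reappearing as new.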
State changes report a changed_fields list (title, content_phash,
or interactive_elements_count) so you can see what shifted without
re-rendering both walks.
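To make the join concrete, here is a small Python sketch of how a diff over content-addressed identities could work. It is an illustration under assumptions: the input shapes (a dict of states keyed by state_id, and plain collections of dedup_key strings) and the function names are hypothetical, and the lowercase "removed" and "unchanged" labels are inferred by analogy with the "new" and "changed" values shown in the diff_walks example.

```python
# Fields compared for states that exist in both walks (from the text above).
TRACKED_FIELDS = ("title", "content_phash", "interactive_elements_count")


def diff_states(base_states: dict, new_states: dict) -> list:
    """Join two walks' states on state_id and classify each one."""
    changes = []
    for sid in sorted(set(base_states) | set(new_states)):
        if sid not in base_states:
            status, changed = "new", []
        elif sid not in new_states:
            status, changed = "removed", []
        else:
            changed = [
                field for field in TRACKED_FIELDS
                if base_states[sid].get(field) != new_states[sid].get(field)
            ]
            status = "changed" if changed else "unchanged"
        changes.append({"state_id": sid, "status": status, "changed_fields": changed})
    return changes


def diff_findings(base_keys, new_keys) -> dict:
    """Join findings on dedup_key using plain set arithmetic."""
    base_keys, new_keys = set(base_keys), set(new_keys)
    return {
        "new": sorted(new_keys - base_keys),
        "resolved": sorted(base_keys - new_keys),
        "persistent": sorted(base_keys & new_keys),
    }
```

Because both joins are keyed on hashes of content rather than on per-walk IDs, neither function needs to know anything about crawl order or timing.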
MCP tools
set_walk_baseline
Pin a walk as the current baseline for an app. One baseline per
(user_id, app_name); setting a new one replaces the old.
set_walk_baseline(walk_id='walk-20250626-xyz', app_name='demo-app',
notes='green after analytics refactor')
→ {
"user_id": "user-123",
"app_name": "demo-app",
"walk_id": "walk-20250626-xyz",
"set_at": "2026-06-26T14:32:11.123Z",
"walk_started_at": "2026-06-26T14:15:00Z",
"walk_app_url": "https://app.example.com",
"notes": "green after analytics refactor",
"session_token": "user-123"
}
get_walk_baseline
Fetch the current baseline walk pinned for an app. Returns
{baseline: null} if nothing is pinned. Scoped to your own pins only.
get_walk_baseline(app_name='demo-app')
→ {
"baseline": {
"user_id": "user-123",
"app_name": "demo-app",
"walk_id": "walk-20250626-xyz",
"set_at": "2026-06-26T14:32:11.123Z",
"walk_started_at": "2026-06-26T14:15:00Z",
"walk_app_url": "https://app.example.com",
"notes": "green after analytics refactor"
},
"session_token": "user-123"
}
list_walk_baselines
List every baseline pin across all your apps. Useful for a dashboard or for sanity-checking which apps you’re tracking regressions on.
list_walk_baselines()
→ {
"baselines": [
{"user_id": "user-123", "app_name": "demo-app",
"walk_id": "walk-20250626-xyz",
"set_at": "2026-06-26T14:32:11.123Z"},
{"user_id": "user-123", "app_name": "analytics-app",
"walk_id": "walk-20250620-abc",
"set_at": "2026-06-20T09:15:00Z"}
],
"session_token": "user-123"
}
clear_walk_baseline
Clear the baseline pin for an app. Idempotent (returns
{cleared: true} whether or not a baseline was set).
clear_walk_baseline(app_name='demo-app')
→ { "cleared": true, "session_token": "user-123" }
diff_walks
Compute the diff between two walks: added/removed/changed states plus new/resolved/persistent findings, with summary counts. First call computes; later calls return the cached record.
diff_walks(base_walk_id='walk-20250620-abc',
new_walk_id='walk-20250626-xyz', use_cache=true)
→ {
"base_walk_id": "walk-20250620-abc",
"new_walk_id": "walk-20250626-xyz",
"computed_at": "2026-06-26T14:35:22.456Z",
"summary": {
"states_added": 2, "states_removed": 0,
"states_changed": 1, "states_unchanged": 5,
"findings_new": 3, "findings_resolved": 1,
"findings_persistent": 2
},
"state_changes": [
{"state_id": "s-abc123", "route_template": "/reports/new",
"status": "new", "changed_fields": [], "depth": 2},
{"state_id": "s-def456", "route_template": "/settings",
"status": "changed", "changed_fields": ["title"], "depth": 1}
],
"finding_changes": [
{"dedup_key": "f-new123", "severity": "blocker",
"category": "console_error", "status": "new",
"finding_id": "f-xyz789",
"message": "Uncaught TypeError: Cannot read...",
"route_template": "/reports", "state_id": "s-abc123"}
],
"session_token": "user-123"
}
walk_recursive_report
Render a walk as a BFS tree rooted at root_state_id. When a
baseline_walk_id is supplied, each node carries a status field
(NEW | UNCHANGED | REMOVED | CHANGED) and a changed_fields list.
walk_recursive_report(walk_id='walk-20250626-xyz',
baseline_walk_id='walk-20250620-abc')
→ {
"walk_id": "walk-20250626-xyz",
"baseline_walk_id": "walk-20250620-abc",
"root_state_id": "s-root",
"total_states": 8,
"tree": {
"state_id": "s-root", "route_template": "/",
"url": "https://app.example.com",
"title": "Dashboard",
"depth": 0, "parent_state_id": null,
"status": "UNCHANGED", "changed_fields": [],
"finding_count_by_severity": {"major": 1},
"finding_ids": ["f-root-001"],
"children": [
{"state_id": "s-settings", "route_template": "/settings",
"status": "CHANGED", "changed_fields": ["title"]}
]
},
"orphans": [],
"session_token": "user-123"
}
walk_findings_to_tickets
Serialize walk findings into Linear, Jira, or GitHub ticket payloads.
M1 returns JSON ready for operator copy-paste; M2 will POST directly
using stored webhook tokens. severity_filter sets a minimum severity
(severity_filter='major' returns major and blocker findings); combine it
with baseline_walk_id and only_new_since_baseline=true to export
only regressions.
walk_findings_to_tickets(walk_id='walk-20250626-xyz', format='linear',
severity_filter='major',
baseline_walk_id='walk-20250620-abc',
only_new_since_baseline=true,
team_id='linear-team-uuid')
→ {
"walk_id": "walk-20250626-xyz",
"format": "linear",
"count": 2,
"skipped": 1,
"options_used": {"team_id": "linear-team-uuid"},
"tickets": [{
"finding_id": "f-xyz789",
"dedup_key": "f-new123",
"severity": "blocker",
"category": "console_error",
"ticket": {
"title": "[BLOCKER] console_error at /reports: Uncaught TypeError",
"body_markdown": "**Severity:** blocker **Category:** `console_error` ...",
"labels": ["walker", "severity:blocker",
"category:console_error", "app:demo-app"]
},
"payload": {
"teamId": "linear-team-uuid",
"title": "[BLOCKER] console_error at /reports: Uncaught TypeError",
"description": "**Severity:** blocker **Category:** `console_error`...",
"priority": 1, "labelIds": [],
"projectId": null, "assigneeId": null
}
}],
"next": "Each entry in tickets[] has a 'payload' dict ready to POST.",
"session_token": "user-123"
}
run_all_app_scenarios
Kick off a batch run of every saved scenario for an app. Thin wrapper
over run_app_test_group with scope='app'. Use this right after a
green walk to confirm no scripted scenario regressed either.
run_all_app_scenarios(app_name='demo-app', vm_name=null)
→ {
"batch_run_id": "batch-20250626-123",
"app_name": "demo-app",
"total": 12,
"status": "queued",
"dashboard_url": "https://cto-gui.example.com/app/demo-app/batch/batch-20250626-123",
"session_token": "user-123"
}
Unified Ticket model and per-tracker payloads
Internally every finding becomes the same Ticket shape:
Ticket = {
finding_id, walk_id, severity, category,
title: "[<SEV>] <category> at <route>: <truncated message>",
body_markdown: "<severity> + <category> + <walk_id> + <app_name>
### Message\n```\n<full message>\n```
### Evidence\n- URL: <state url>\n- Screenshot: <api path>",
labels: ["walker", "severity:<sev>", "category:<cat>", "app:<name>"],
options: <format-specific>
}
The serializer then projects that Ticket into the wire shape the target tracker expects:
- Linear (GraphQL issueCreate): payload is the IssueCreateInput, {teamId, title, description, priority, labelIds, projectId, assigneeId}. Post with mutation IssueCreate($input: IssueCreateInput!) { issueCreate(input: $input) { issue { id url } } }.
- Jira (REST /rest/api/3/issue): payload is the request body, {fields: {project: {key}, summary, description, issuetype: {name}, priority: {name}, labels}}. Post to your Jira base.
- GitHub (REST /repos/{owner}/{repo}/issues): payload is {title, body, labels, assignees}. Post with a PAT or app token.
The next field in the response tells you exactly which endpoint and
method to use for the format you asked for.
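The projection step can be sketched as three small functions, one per tracker. This is a rough Python illustration of the payload shapes listed above, not the serializer itself: the function names, the `project_key` and `issue_type` parameters, and both priority mappings are assumptions (only blocker mapping to Linear priority 1 is visible in the example response).

```python
# Assumption: only blocker -> 1 appears in the example response; the other
# entries are guesses based on Linear's 1 (urgent) to 4 (low) priority scale.
LINEAR_PRIORITY = {"blocker": 1, "major": 2, "minor": 3}

# Assumption: Jira's default priority names; a given Jira site may differ.
JIRA_PRIORITY = {"blocker": "Highest", "major": "High", "minor": "Medium"}


def to_linear(ticket: dict, team_id: str) -> dict:
    """Project a Ticket into Linear's IssueCreateInput shape."""
    return {
        "teamId": team_id,
        "title": ticket["title"],
        "description": ticket["body_markdown"],
        "priority": LINEAR_PRIORITY.get(ticket["severity"], 0),  # 0 = no priority
        "labelIds": [],
        "projectId": None,
        "assigneeId": None,
    }


def to_jira(ticket: dict, project_key: str, issue_type: str = "Bug") -> dict:
    """Project a Ticket into a Jira create-issue request body."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": ticket["title"],
            # Simplification: Jira Cloud's v3 API may require this field in
            # Atlassian Document Format rather than a plain string.
            "description": ticket["body_markdown"],
            "issuetype": {"name": issue_type},
            "priority": {"name": JIRA_PRIORITY.get(ticket["severity"], "Medium")},
            "labels": list(ticket["labels"]),
        }
    }


def to_github(ticket: dict) -> dict:
    """Project a Ticket into a GitHub create-issue request body."""
    return {
        "title": ticket["title"],
        "body": ticket["body_markdown"],
        "labels": list(ticket["labels"]),
        "assignees": [],
    }
```

Keeping one internal Ticket shape and projecting at the edge means a new tracker only needs one more function of this kind.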
REST surface
The MCP tools are thin wrappers over an authenticated REST API. All
routes require Authorization: Bearer <token>. Callers see only their
own baselines and walks unless they are admin.
REST responses are the raw service shapes (see the MCP tool examples
above for the canonical fields). The MCP layer adds session_token and
a next hint to every response; the REST endpoints return neither.
| Method | Path | Purpose |
|---|---|---|
| POST | /api/tests/app-walks/{walk_id}/set-baseline | Pin a walk as the baseline for an app; optional notes |
| GET | /api/tests/app-walks/baselines | List every baseline pin for the current user |
| GET | /api/tests/app-walks/baseline?app_name=X | Fetch the baseline for one app (returns {baseline: null} if unset) |
| DELETE | /api/tests/app-walks/baseline?app_name=X | Clear the baseline for an app (idempotent) |
| GET | /api/tests/app-walks/diff?base=X&new=Y&use_cache=true | Compute or retrieve cached diff between two walks |
| GET | /api/tests/app-walks/{walk_id}/recursive-report?baseline_walk_id=Z | Render walk as BFS tree, optionally annotated with diff status |
| GET | /api/tests/app-walks/{walk_id}/tickets?format=linear&severity_filter=major&baseline_walk_id=Z&only_new_since_baseline=true&team_id=UUID | Serialize findings to Linear, Jira, or GitHub ticket payloads |
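For callers that skip MCP and hit the REST surface directly, a request for the diff route might be assembled as below. This is a sketch using only the Python standard library; the helper name `build_diff_request` and the `api_base` host are hypothetical, while the path, query parameters, and Bearer header follow the table and text above.

```python
import urllib.parse
import urllib.request


def build_diff_request(api_base: str, token: str, base_walk_id: str,
                       new_walk_id: str, use_cache: bool = True) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET for the diff endpoint."""
    query = urllib.parse.urlencode({
        "base": base_walk_id,
        "new": new_walk_id,
        "use_cache": "true" if use_cache else "false",
    })
    url = f"{api_base.rstrip('/')}/api/tests/app-walks/diff?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

Passing the returned object to `urllib.request.urlopen` and decoding the body with `json.load` would then yield the raw service shape, without the session_token or next fields the MCP layer adds.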
Putting it together
- Baseline pin. After a successful deploy and clean walk (zero blocker/major findings), operator calls set_walk_baseline(walk_id='walk-abc123', app_name='demo-app', notes='post-v1.2 release') to mark it as the golden reference.
- Edit and re-walk. Developer modifies an analytics component and re-runs walk_app, generating a new walk walk-def456.
- Regression detection. Operator calls diff_walks(base_walk_id='walk-abc123', new_walk_id='walk-def456'), which joins states by content-addressed state_id and findings by dedup_key, returning {summary: {states_added: 0, states_changed: 1, findings_new: 2, findings_persistent: 1}}. The two new findings are the signal that the edit introduced issues.
- Tree-view triage. Operator calls walk_recursive_report(walk_id='walk-def456', baseline_walk_id='walk-abc123') to see a hierarchical view with the /analytics subtree marked CHANGED and the two new findings pinpointed to the state where DOM content diverged.
- One-click export. Operator calls walk_findings_to_tickets(walk_id='walk-def456', format='linear', severity_filter='major', baseline_walk_id='walk-abc123', only_new_since_baseline=true, team_id='uuid-abc'), receiving JSON payloads for the 2 major-or-higher findings. They copy the payload array and POST to Linear’s GraphQL endpoint via the issueCreate mutation.
- Batch regression. Operator calls run_all_app_scenarios(app_name='demo-app') to exercise every saved scenario against the newly deployed build, queuing a batch_run_id pollable via get_app_test_group(batch_run_id).
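The tree-view triage step above can also be done programmatically. The sketch below walks the `tree` object from a walk_recursive_report response breadth-first and lists every node whose status is not UNCHANGED. The helper name `changed_nodes` is hypothetical, and depth is recomputed during traversal rather than read from the node, since the child entries in the example response omit the depth field.

```python
from collections import deque


def changed_nodes(tree: dict) -> list:
    """Breadth-first list of (depth, route_template, status) for nodes
    whose diff status is present and not UNCHANGED."""
    hits = []
    queue = deque([(tree, 0)])
    while queue:
        node, depth = queue.popleft()
        status = node.get("status")
        if status and status != "UNCHANGED":
            hits.append((depth, node.get("route_template"), status))
        # Children may be absent on leaf nodes, so default to an empty list.
        for child in node.get("children", []):
            queue.append((child, depth + 1))
    return hits
```

Because the traversal is breadth-first, shallow regressions surface first, which tends to match the order an operator would want to read them in.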
Example prompts
Pin walk 550e8400-... as the baseline for app "demo-app" with note
"green after the analytics refactor". Then list every baseline I have
pinned.

Diff my last walk against the demo-app baseline and show me only the
new findings (not the persistent ones). For each new blocker, give me
the route and the parent state so I know where in the BFS tree it
showed up.

Export every major-or-blocker finding from walk-def456 that's new
since the demo-app baseline as Linear tickets for team
linear-team-uuid. Give me the payload array so I can POST them to
the issueCreate mutation.

Related
- Autonomous Walker: upstream feature that emits the walks, findings, and scenarios this page diffs and exports.
- App UI Testing: replay the curated scenarios run_all_app_scenarios batch-runs.
- App Deployment: deploy a Git repo to a public preview URL the walker can crawl; M3 will auto-pin a baseline on green deploys.
- MCP Integration: set up the OpenFactory MCP server.