Checkpoints & Rollback — safe iteration for agents

Snapshot and restore an app’s code SHA, VM disk, and database state together, then revert (or re-apply) a change in a single bi-directional call. Agents can take a risky swing at a refactor, a migration, or a dependency bump and know they can land back at a known-good state without losing the failed attempt for debugging.

Today (M1): the orchestration layer is live — CheckpointService, REST, MCP, state transitions, retention pruning, and the six-stage rollback pipeline all work end to end. The snapshot adapter is a stub that fabricates snapshot IDs without touching real libvirt disk snapshots or DB dumps. M1.5 swaps in the real app_snapshots.py implementation behind the same interface, so any scenarios you write against the MCP tools today keep working unchanged.

Why checkpoints

Agent-driven iteration produces a lot of failed attempts. Without checkpoints you have to either (a) trust the agent’s diff-and-revert logic, or (b) carry the cost of rebuilding VM and DB state by hand every time something breaks. Checkpoints make rollback atomic across all three layers:

  • Code — the git SHA the app was deployed from.
  • VM disk — the qcow2 state of the app’s tester VM at snapshot time.
  • Database — a logical dump of any managed database attached to the app.

And rollback is bi-directional: before reverting, the service captures a pre-rollback safety checkpoint of the current (failed) state, so the agent can roll forward into the failure again to keep debugging.

How it works

  1. Snapshot before the risky change: create_checkpoint quiesces the app, records the deployed git SHA, snapshots the VM disk, dumps each attached DB, and returns a checkpoint_id.
  2. Let the agent iterate: modify code, run a migration, deploy, walk the app. If it works, keep going. If it doesn’t, you have a known-good state to revert to.
  3. Roll back: rollback_app captures a safety checkpoint of the broken state first, then runs a six-stage pipeline: safety-checkpoint, stop-unit, vm-revert, db-restore, start-unit, health-check. The stages list comes back on the response so a GUI can render progress without polling.
  4. Pin what you want to keep: retention prunes to the last 10 checkpoints plus one daily for 7 days. pin_checkpoint exempts a checkpoint from pruning entirely.

The app’s slug, public URL, and gateway route are preserved across rollback: the same https://<slug>.apps.openfactory.tech URL keeps serving, just from the restored state.

MCP tools

| Tool | Use |
| --- | --- |
| create_checkpoint | Capture a manual checkpoint (code, VM, DB) before a risky iteration |
| list_checkpoints | List checkpoints (newest first) and per-app storage / retention usage |
| rollback_app | Bi-directional rollback with auto safety checkpoint and staged progress |
| pin_checkpoint | Exempt a checkpoint from retention pruning |
| delete_checkpoint | Remove a checkpoint (fails if pinned; unpin first) |

create_checkpoint

Capture a manual checkpoint before a risky agent iteration. Optionally pin it so retention never prunes it.

    create_checkpoint(
      app_id="abc-123-def",
      notes="before risky refactor",
      pinned=True
    )
    → {
      "checkpoint_id": "cp-a1b2c3d4",
      "ref": "abc1234567890def",
      "trigger": "manual",
      "vm_snapshot_id": "stub-snap-a1b2c3d4",
      "db_snapshot_id": "stub-db-snap-e5f6g7h8",
      "quiesced": true,
      "pinned": true,
      "size_bytes": 0,
      "notes": "before risky refactor",
      "deploy_id": null,
      "created_at": "2026-06-26T14:32:15Z",
      "session_token": "user-abc123",
      "next": "rollback_app(app_id='abc-123-def', checkpoint_id='cp-a1b2c3d4') to restore this state. Bi-directional: rollback captures a pre-rollback safety cp so you can roll forward again."
    }

trigger is one of deploy, manual, agent, or pre-rollback. This is useful when filtering history to answer “what did the agent do here”.
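As a minimal sketch of that kind of filtering, assume you already hold the checkpoints list from a list_checkpoints response as plain dictionaries. The helper name filter_by_trigger and the VALID_TRIGGERS constant are hypothetical and not part of the service.

```python
from typing import Any

# The four trigger values documented above.
VALID_TRIGGERS = {"deploy", "manual", "agent", "pre-rollback"}


def filter_by_trigger(checkpoints: list[dict[str, Any]], trigger: str) -> list[dict[str, Any]]:
    """Return only the checkpoints whose trigger matches, preserving input order."""
    if trigger not in VALID_TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger!r}")
    return [cp for cp in checkpoints if cp.get("trigger") == trigger]


# Sample records shaped like entries in the "checkpoints" list (fields trimmed for brevity).
history = [
    {"checkpoint_id": "cp-safety-xyz", "trigger": "pre-rollback"},
    {"checkpoint_id": "cp-a1b2c3d4", "trigger": "manual"},
    {"checkpoint_id": "cp-x9y8z7w6", "trigger": "manual"},
]

manual_ids = [cp["checkpoint_id"] for cp in filter_by_trigger(history, "manual")]
print(manual_ids)
```

Filtering on "agent" or "pre-rollback" the same way gives a quick view of what an agent attempted and which states were captured just before a revert.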

list_checkpoints

List every checkpoint for the app plus the current storage footprint and the retention policy in effect.

    list_checkpoints(app_id="abc-123-def")
    → {
      "checkpoints": [
        {
          "checkpoint_id": "cp-a1b2c3d4",
          "ref": "abc1234567890def",
          "trigger": "manual",
          "vm_snapshot_id": "stub-snap-a1b2c3d4",
          "db_snapshot_id": "stub-db-snap-e5f6g7h8",
          "quiesced": true,
          "pinned": true,
          "size_bytes": 0,
          "notes": null,
          "deploy_id": null,
          "created_at": "2026-06-26T14:32:15Z"
        },
        {
          "checkpoint_id": "cp-x9y8z7w6",
          "ref": "def456",
          "trigger": "manual",
          "vm_snapshot_id": "stub-snap-x9y8z7w6",
          "db_snapshot_id": null,
          "quiesced": true,
          "pinned": false,
          "size_bytes": 0,
          "notes": null,
          "deploy_id": null,
          "created_at": "2026-06-25T10:00:00Z"
        }
      ],
      "storage_usage": {
        "checkpoint_count": 2,
        "pinned_count": 1,
        "total_bytes": 0,
        "keep_last": 10,
        "daily_days": 7
      },
      "session_token": "user-abc123"
    }

total_bytes is 0 under the stub adapter; real disk usage shows up in M1.5.

rollback_app

Bi-directional rollback. The service captures a pre-rollback safety checkpoint first (so you can roll forward into the failed state for debugging), then walks the six-stage pipeline.

    rollback_app(app_id="abc-123-def", checkpoint_id="cp-a1b2c3d4")
    → {
      "rolled_back_to": {
        "checkpoint_id": "cp-a1b2c3d4",
        "ref": "abc1234567890def",
        "trigger": "manual",
        "vm_snapshot_id": "stub-snap-a1b2c3d4",
        "db_snapshot_id": "stub-db-snap-e5f6g7h8",
        "quiesced": true,
        "pinned": true,
        "size_bytes": 0,
        "notes": "before risky refactor",
        "deploy_id": null,
        "created_at": "2026-06-26T14:32:15Z"
      },
      "safety_checkpoint": {
        "checkpoint_id": "cp-safety-xyz",
        "ref": "abc999",
        "trigger": "pre-rollback",
        "vm_snapshot_id": "stub-snap-xyz",
        "db_snapshot_id": null,
        "quiesced": true,
        "pinned": false,
        "size_bytes": 0,
        "notes": "safety checkpoint before rollback to cp-a1b2c3d4",
        "deploy_id": null,
        "created_at": "2026-06-26T14:35:00Z"
      },
      "stages": [
        { "stage": "safety-checkpoint", "status": "ok", "ts": "2026-06-26T14:35:00Z", "notes": "id=cp-safety-xyz" },
        { "stage": "stop-unit", "status": "ok", "ts": "2026-06-26T14:35:01Z", "notes": "stub mode: would systemctl stop of-app-<slug>" },
        { "stage": "vm-revert", "status": "ok", "ts": "2026-06-26T14:35:02Z", "notes": "snap=stub-snap-a1b2c3d4" },
        { "stage": "db-restore", "status": "ok", "ts": "2026-06-26T14:35:03Z", "notes": "snap=stub-db-snap-e5f6g7h8" },
        { "stage": "start-unit", "status": "ok", "ts": "2026-06-26T14:35:04Z", "notes": "stub mode: would systemctl start of-app-<slug>" },
        { "stage": "health-check", "status": "ok", "ts": "2026-06-26T14:35:05Z", "notes": "stub mode: would curl localhost:<port>/" }
      ],
      "session_token": "user-abc123",
      "next": "To roll FORWARD (undo the rollback): rollback_app(app_id='abc-123-def', checkpoint_id='cp-safety-xyz')"
    }

The stages list is ordered and timestamped. A GUI can render it as an SSE-style progress strip without a second round-trip.
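One way a client might turn that list into a progress strip is sketched below. The helper names are hypothetical. The responses shown here only ever carry status "ok", so treating any other value as a failure is an assumption rather than documented behaviour.

```python
from typing import Any, Optional


def render_stages(stages: list[dict[str, Any]]) -> str:
    """Render the ordered stages list as a one-line progress strip."""
    parts = []
    for entry in stages:
        # Assumption: any status other than "ok" means the stage did not succeed.
        mark = "ok" if entry.get("status") == "ok" else "FAILED"
        parts.append(f"{entry['stage']}[{mark}]")
    return " > ".join(parts)


def first_failure(stages: list[dict[str, Any]]) -> Optional[str]:
    """Return the name of the first stage whose status is not "ok", or None."""
    for entry in stages:
        if entry.get("status") != "ok":
            return entry["stage"]
    return None


sample_stages = [
    {"stage": "safety-checkpoint", "status": "ok"},
    {"stage": "stop-unit", "status": "ok"},
    # "error" is a made-up status value used only to exercise the failure path.
    {"stage": "vm-revert", "status": "error"},
]

strip = render_stages(sample_stages)
print(strip)
```

Because the list arrives complete on the response, both helpers work on a single payload and need no follow-up request.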

pin_checkpoint

Pin a checkpoint so the retention policy never prunes it. Use this for known-good baselines you want to keep around indefinitely.

    pin_checkpoint(app_id="abc-123-def", checkpoint_id="cp-a1b2c3d4")
    → {
      "checkpoint_id": "cp-a1b2c3d4",
      "ref": "abc1234567890def",
      "trigger": "manual",
      "vm_snapshot_id": "stub-snap-a1b2c3d4",
      "db_snapshot_id": "stub-db-snap-e5f6g7h8",
      "quiesced": true,
      "pinned": true,
      "size_bytes": 0,
      "notes": null,
      "deploy_id": null,
      "created_at": "2026-06-26T14:32:15Z",
      "session_token": "user-abc123"
    }

delete_checkpoint

Remove a checkpoint. The call fails with an error if the checkpoint is pinned; unpin it first.

    delete_checkpoint(app_id="abc-123-def", checkpoint_id="cp-a1b2c3d4")
    → {
      "deleted": true,
      "checkpoint_id": "cp-a1b2c3d4",
      "session_token": "user-abc123"
    }

REST endpoints

All endpoints are owner-scoped (get_optional_user plus X-Guest-Id header) and mirror the MCP surface for direct UI consumption.

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/apps/{app_id}/checkpoints | Create a manual checkpoint with optional notes and pin flag |
| GET | /api/apps/{app_id}/checkpoints | List checkpoints (newest first) plus storage usage and retention policy |
| GET | /api/apps/{app_id}/checkpoints/{cp_id} | Fetch a single checkpoint by ID |
| DELETE | /api/apps/{app_id}/checkpoints/{cp_id} | Remove a checkpoint (returns 204 No Content; returns 409 if pinned) |
| POST | /api/apps/{app_id}/checkpoints/{cp_id}/pin | Pin a checkpoint to exempt it from pruning |
| POST | /api/apps/{app_id}/checkpoints/{cp_id}/unpin | Unpin a checkpoint (eligible for pruning again) |
| POST | /api/apps/{app_id}/checkpoints/{cp_id}/rollback | Bi-directional rollback, returns staged progress |
| POST | /api/apps/{app_id}/checkpoints/prune | Manually trigger retention pruning (returns {deleted: [...], kept: count}) |
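The sketch below shows how a client could assemble requests for two of these endpoints with Python's standard library. It only builds the request objects and does not send them. The base URL is a placeholder, the helper names are hypothetical, and the JSON body field names (notes, pinned) are assumed to mirror the MCP arguments; the exact REST body schema is not spelled out here.

```python
import json
import urllib.request
from typing import Any, Optional

# Placeholder host: substitute the base URL your deployment actually uses.
BASE_URL = "https://example.invalid"


def build_request(method: str, path: str, guest_id: str,
                  body: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) an owner-scoped request carrying the X-Guest-Id header."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"X-Guest-Id": guest_id}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def create_checkpoint_request(app_id: str, guest_id: str,
                              notes: Optional[str] = None,
                              pinned: bool = False) -> urllib.request.Request:
    # Assumption: the body uses the same field names as the MCP tool arguments.
    return build_request("POST", f"/api/apps/{app_id}/checkpoints", guest_id,
                         {"notes": notes, "pinned": pinned})


def rollback_request(app_id: str, cp_id: str, guest_id: str) -> urllib.request.Request:
    return build_request("POST", f"/api/apps/{app_id}/checkpoints/{cp_id}/rollback", guest_id)


req = rollback_request("abc-123-def", "cp-a1b2c3d4", "guest-42")
print(req.get_method(), req.full_url)
```

Passing a request to urllib.request.urlopen would send it. A pinned checkpoint's DELETE would then surface as an HTTPError with code 409, per the table above.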

Checkpoint model fields

Each checkpoint record includes these fields:

| Field | Type | Notes |
| --- | --- | --- |
| checkpoint_id | string | Unique ID (cp-…) |
| ref | string or null | Git SHA from the deployed code, or null if no deploy history |
| trigger | string | One of deploy, manual, agent, pre-rollback |
| vm_snapshot_id | string | Snapshot ID (stub-snap-… in M1) |
| db_snapshot_id | string or null | DB snapshot ID if a managed database is attached |
| quiesced | boolean | Whether fsfreeze succeeded (True in M1 stub) |
| pinned | boolean | Exempt from retention pruning when True |
| size_bytes | integer | Disk footprint (0 in M1, populated in M1.5) |
| notes | string or null | Optional freeform note |
| deploy_id | string or null | Links to deploy history entry when trigger='deploy' |
| created_at | string | ISO 8601 timestamp |
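For client code that wants typed access to these fields, a small dataclass is one option. This is a sketch and not a type shipped by the service: the class name Checkpoint and its from_dict helper are hypothetical, and parsing created_at into a datetime is a convenience choice.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Any, Optional


@dataclass(frozen=True)
class Checkpoint:
    checkpoint_id: str
    ref: Optional[str]
    trigger: str
    vm_snapshot_id: str
    db_snapshot_id: Optional[str]
    quiesced: bool
    pinned: bool
    size_bytes: int
    notes: Optional[str]
    deploy_id: Optional[str]
    created_at: datetime

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "Checkpoint":
        # Copy only the model fields, ignoring extras such as session_token or next.
        values = {f.name: raw[f.name] for f in fields(cls) if f.name != "created_at"}
        # Rewrite the trailing "Z" as an explicit offset so fromisoformat accepts it
        # on Python versions that do not parse "Z" directly.
        created = datetime.fromisoformat(raw["created_at"].replace("Z", "+00:00"))
        return cls(created_at=created, **values)


sample = {
    "checkpoint_id": "cp-x9y8z7w6", "ref": "def456", "trigger": "manual",
    "vm_snapshot_id": "stub-snap-x9y8z7w6", "db_snapshot_id": None,
    "quiesced": True, "pinned": False, "size_bytes": 0, "notes": None,
    "deploy_id": None, "created_at": "2026-06-25T10:00:00Z",
    "session_token": "user-abc123",
}

cp = Checkpoint.from_dict(sample)
print(cp.checkpoint_id, cp.created_at.isoformat())
```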

Retention policy

The default policy keeps:

  • The last 10 checkpoints regardless of age.
  • One checkpoint per day for the last 7 days (the oldest of each day wins).
  • Every pinned checkpoint, indefinitely.

Anything outside those buckets is eligible for pruning. Pruning runs implicitly after a successful create_checkpoint. Trigger it manually with the POST /api/apps/{app_id}/checkpoints/prune endpoint if you need to reclaim space immediately.
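The three buckets can be sketched as a pure function over checkpoint records. This is an illustration of the policy as described, not the service's pruning code. It assumes "the last 7 days" means a rolling window measured back from now, and that days are bucketed by calendar date in the timestamps' timezone. The function name select_kept is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def select_kept(checkpoints, now, keep_last=10, daily_days=7):
    """Return the set of checkpoint IDs the default policy would keep."""
    newest_first = sorted(checkpoints, key=lambda cp: cp["created_at"], reverse=True)

    # Bucket 1: the most recent keep_last checkpoints, regardless of age.
    kept = {cp["checkpoint_id"] for cp in newest_first[:keep_last]}

    # Bucket 2: every pinned checkpoint.
    kept |= {cp["checkpoint_id"] for cp in checkpoints if cp["pinned"]}

    # Bucket 3: one checkpoint per day inside the window; the oldest of each day wins.
    cutoff = now - timedelta(days=daily_days)
    oldest_per_day = {}
    for cp in reversed(newest_first):  # oldest first, so the first hit per day is the oldest
        if cp["created_at"] >= cutoff:
            oldest_per_day.setdefault(cp["created_at"].date(), cp["checkpoint_id"])
    kept |= set(oldest_per_day.values())
    return kept


now = datetime(2026, 6, 26, 12, 0, tzinfo=timezone.utc)
# Twelve checkpoints on the same day, one per hour from 00:00 to 11:00.
sample = [
    {"checkpoint_id": f"cp-{h}", "pinned": False,
     "created_at": datetime(2026, 6, 26, h, tzinfo=timezone.utc)}
    for h in range(12)
]
# Two old checkpoints outside the daily window: one pinned, one not.
sample.append({"checkpoint_id": "cp-old-pinned", "pinned": True,
               "created_at": datetime(2026, 5, 1, tzinfo=timezone.utc)})
sample.append({"checkpoint_id": "cp-old", "pinned": False,
               "created_at": datetime(2026, 5, 2, tzinfo=timezone.utc)})

kept = select_kept(sample, now)
pruned = sorted(cp["checkpoint_id"] for cp in sample if cp["checkpoint_id"] not in kept)
print(pruned)
```

In this sample, cp-0 survives as the day's oldest checkpoint even though it falls outside the last ten, while cp-1 and the unpinned cp-old are the only ones eligible for pruning.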

Putting it together

A typical agent loop looks like this:

  1. Agent calls create_checkpoint(app_id, notes="before risky refactor") to snapshot code SHA, VM disk, and DB state.
  2. Agent modifies the app: it refactors code, runs a migration, and redeploys.
  3. Agent calls walk_app to test the changes. The walk fails with a regression.
  4. Agent calls rollback_app(app_id, checkpoint_id). The service captures a pre-rollback safety checkpoint (preserving the failed state), then stages through stop-unit, vm-revert (restoring the VM disk), db-restore (restoring the DB), start-unit, and health-check.
  5. The GUI renders the six-stage progress directly from the returned stages list, so no polling is needed.
  6. The app is now at its pre-refactor state, serving from the same URL and gateway route.
  7. If the agent wants to keep debugging the failure, it calls rollback_app again with the safety checkpoint ID to roll forward into the failed state. This is what makes recovery bi-directional.
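The loop above can be exercised end to end against an in-memory stand-in. Everything in this sketch is hypothetical: FakeCheckpointClient models only a single state string instead of code, VM, and DB, and iterate_safely, risky_change, and walk are illustrative names. The walk callable stands in for a walk_app run.

```python
import itertools


class FakeCheckpointClient:
    """In-memory stand-in for the MCP tools; it tracks one 'state' string per app."""

    def __init__(self, initial_state):
        self.state = initial_state
        self._saved = {}
        self._ids = itertools.count(1)

    def create_checkpoint(self, app_id, notes=None, trigger="manual"):
        cp_id = f"cp-{next(self._ids)}"
        self._saved[cp_id] = self.state
        return {"checkpoint_id": cp_id, "trigger": trigger, "notes": notes}

    def rollback_app(self, app_id, checkpoint_id):
        # Capture the current (possibly broken) state first, mirroring the safety checkpoint.
        safety = self.create_checkpoint(
            app_id,
            notes=f"safety checkpoint before rollback to {checkpoint_id}",
            trigger="pre-rollback",
        )
        self.state = self._saved[checkpoint_id]
        return {"rolled_back_to": {"checkpoint_id": checkpoint_id},
                "safety_checkpoint": safety}


def iterate_safely(client, app_id, change, walk):
    """Checkpoint, apply change, and roll back if walk reports a regression.

    Returns the safety checkpoint ID after a rollback, or None if the change held.
    """
    baseline = client.create_checkpoint(app_id, notes="before risky refactor")
    change(client)
    if walk(client):
        return None
    result = client.rollback_app(app_id, baseline["checkpoint_id"])
    return result["safety_checkpoint"]["checkpoint_id"]


def risky_change(client):
    client.state = "v2-broken"


def walk(client):
    # Stand-in for walk_app: healthy unless the state is marked broken.
    return not client.state.endswith("broken")


client = FakeCheckpointClient("v1")
safety_id = iterate_safely(client, "abc-123-def", risky_change, walk)
state_after_rollback = client.state

# Roll forward into the failed state to keep debugging.
client.rollback_app("abc-123-def", safety_id)
state_after_roll_forward = client.state
print(state_after_rollback, state_after_roll_forward)
```

Returning the safety checkpoint ID from the loop keeps the failed state one call away, which is the property the real rollback_app response provides through its safety_checkpoint field.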

Notes

  • Stub today, real tomorrow. M1’s snapshot adapter fabricates IDs; the state transitions, retention, and rollback pipeline are real. M1.5 swaps in libvirt disk snapshots and DB dumps behind the same interface.
  • Quiesced snapshots. create_checkpoint stops the app unit briefly to capture a consistent disk and DB pair, then restarts it. The quiesced: true field on the response confirms this happened.
  • Pinned checkpoints cannot be deleted. Unpin first, or leave them pinned. That is the point of pinning.
  • pre-rollback checkpoints are normal checkpoints. They count against retention and can themselves be rolled back to, pinned, or deleted.
  • REST DELETE returns 204 No Content. There is no response body, only the status code.
  • App Deployment: register a Git repo as an app and get the public preview URL that rollback preserves.
  • Autonomous App Walker: walk_app and scenario runs are what typically catch the regression that triggers a rollback.
  • Managed Databases: provision databases that get snapshotted as part of each checkpoint.