App Observability — Requests, uptime, events, and resource usage
Watch a deployed app from a single dashboard: request rollups (count, status classes, latency), HTTP uptime probes, a chronological event timeline (deploys, probe transitions, domain verifications, checkpoints), and per-VM resource allocation and usage. No VNC. No manual log digging. No credential management.
Today (M1): The data plane and REST surface are real. Analytics rollups persist (empty until Caddy ingest is wired in M1.5). Probe configs persist and survive restarts, but the actual HTTP probe loop is deferred to M1.5. Events are a real append-only log. Metrics return live libvirt allocation (vcpus, memory, disks); the 30-second sample ring buffer is deferred to M1.5. The logs SSE relay is deferred entirely to M1.5.
How it works
- Deploy the app: feature 02 registers a slug, assigns a VM, and brings the route up under the app gateway. The slug is the observability key.
- Turn on a probe (optional): PUT /api/app-gateway/probes/{slug} stores a probe config (path, interval, expected status). Once the probe loop ships in M1.5, transitions append probe_up / probe_down events to the timeline.
- Open the Observability panel: the GUI calls the analytics, probes, events, and metrics endpoints below and renders four tabs: Analytics, Uptime, Timeline, Resources.
REST endpoints
All endpoints sit under /api/app-gateway/ and require a bearer token (get_current_user). The slug is the same one used by deploy_app and the public preview URL.
Analytics — daily request rollups
GET /api/app-gateway/analytics/{slug}?days=7
Returns one row per day with request count, status-class buckets (2xx, 3xx, 4xx, 5xx), p50 / p95 latency in milliseconds, and bytes out.
{
  "slug": "my-shop",
  "days": 7,
  "rollups": []
}
In M1 the list is empty until Caddy access-log ingest lands in M1.5. The schema, storage, and API contract are stable now so the GUI can build against them.
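A minimal Python sketch of calling the analytics endpoint with the standard library. The base URL (http://localhost:8000), the token value, and the helper names are placeholders chosen for illustration; only the path, the days query parameter, the bearer-token requirement, and the rollups key come from this page.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder base URL; replace with your gateway's address.
BASE_URL = "http://localhost:8000"


def analytics_request(slug: str, days: int = 7, token: str = "TOKEN",
                      base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the GET request for daily rollups."""
    path = "/api/app-gateway/analytics/" + urllib.parse.quote(slug, safe="")
    query = urllib.parse.urlencode({"days": days})
    return urllib.request.Request(
        f"{base_url}{path}?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_rollups(slug: str, days: int = 7, token: str = "TOKEN") -> list:
    """Send the request and return the `rollups` list (empty in M1)."""
    with urllib.request.urlopen(analytics_request(slug, days, token)) as resp:
        body = json.load(resp)
    return body["rollups"]
```

Keeping request construction separate from sending makes it possible to check the URL and headers without a running gateway.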
Probes — uptime checks
PUT /api/app-gateway/probes/{slug}
{
  "path": "/healthz",
  "interval_s": 60,
  "expected_status": 200,
  "timeout_s": 5,
  "enabled": true
}
GET /api/app-gateway/probes/{slug}
{
  "config": {
    "path": "/healthz",
    "interval_s": 60,
    "expected_status": 200,
    "timeout_s": 5,
    "enabled": true
  },
  "last_run_at": null,
  "last_status": null,
  "last_http_status": null,
  "last_error": null,
  "consecutive_failures": 0
}
DELETE /api/app-gateway/probes/{slug}
Config persists across restarts in M1. last_status stays null until the probe loop ships in M1.5. At that point each transition also appends a probe_up / probe_down event to the timeline.
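The probe config body can be modelled as a small Python dataclass, sketched below. The field names mirror the PUT example above; the default values are taken from that example, and the validation rules are client-side assumptions, not documented server behavior.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ProbeConfig:
    # Defaults copied from the example body; the server's own defaults
    # are not documented here.
    path: str = "/healthz"
    interval_s: int = 60
    expected_status: int = 200
    timeout_s: int = 5
    enabled: bool = True

    def validate(self) -> None:
        # Assumed sanity checks, performed before the request is sent.
        if not self.path.startswith("/"):
            raise ValueError("path must start with '/'")
        if self.interval_s <= 0 or self.timeout_s <= 0:
            raise ValueError("interval_s and timeout_s must be positive")
        if not 100 <= self.expected_status <= 599:
            raise ValueError("expected_status must be an HTTP status code")

    def to_json(self) -> str:
        """Return the JSON body for PUT /api/app-gateway/probes/{slug}."""
        self.validate()
        return json.dumps(asdict(self))
```

For example, `ProbeConfig(path="/ready", interval_s=30).to_json()` yields a body that could be sent with the PUT request.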
Events — append-only timeline
GET /api/app-gateway/events/{slug}?limit=100
Returns events newest-first. Kinds include deploy_started, deploy_succeeded, deploy_failed, probe_up, probe_down, domain_verified, checkpoint_created, and walk_completed. Each event carries a severity (info, warning, error), a human-readable message, and a free-form metadata blob.
{
  "slug": "my-shop",
  "events": [
    {
      "event_id": "evt-a1b2c3d4e5f6",
      "slug": "my-shop",
      "kind": "deploy_succeeded",
      "at": "2026-06-26T17:42:11+00:00",
      "severity": "info",
      "message": "Deployed commit a58e5a0 to app-vm",
      "metadata": {
        "commit": "a58e5a0",
        "vm_name": "app-vm"
      }
    }
  ]
}
POST /api/app-gateway/events/{slug}
{
  "kind": "probe_down",
  "severity": "warning",
  "message": "Manual test event",
  "metadata": {
    "source": "operator"
  }
}
Manual POST is useful for testing the timeline UI before the probe loop or deploy hooks fill it organically. Note: the field name is at (ISO 8601 timestamp), not ts. The severity enum accepts only info, warning (not warn), or error.
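The two pitfalls noted above (the severity enum and the at field) can be guarded against on the client side. The following Python sketch assumes hypothetical helper names; the severity values and the at field come from this page, and the rejection of other values mirrors the documented enum.

```python
import json
from datetime import datetime
from typing import Optional

# Documented severity values; "warn" is not accepted.
SEVERITIES = {"info", "warning", "error"}


def build_event(kind: str, severity: str, message: str,
                metadata: Optional[dict] = None) -> str:
    """Return the JSON body for POST /api/app-gateway/events/{slug}."""
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(SEVERITIES)}")
    body = {"kind": kind, "severity": severity, "message": message}
    if metadata is not None:
        body["metadata"] = metadata
    return json.dumps(body)


def event_time(event: dict) -> datetime:
    """Parse the ISO 8601 `at` field of an event returned by GET."""
    return datetime.fromisoformat(event["at"])
```

Calling `build_event("probe_down", "warn", "test")` raises ValueError before any request is made, which surfaces the most common mistake early.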
Metrics — allocation and samples
GET /api/app-gateway/metrics/{slug}?vm_name=app-vm
{
  "allocation": {
    "vcpus": 4,
    "memory_mb": 8192,
    "disks": [
      {
        "target": "vda",
        "capacity_gb": 40
      }
    ]
  },
  "samples": []
}
allocation is live from libvirt and accurate today. samples is a 30-second ring buffer (cpu_pct, mem_pct, disk_pct) that the M1.5 sampler will backfill from libvirt domain stats.
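As an illustration of consuming the metrics response, this Python sketch reduces the allocation block to a few display values. The function name and the output keys are hypothetical; the input keys are those shown in the response above.

```python
def summarize_allocation(metrics: dict) -> dict:
    """Condense the `allocation` block of a metrics response."""
    alloc = metrics["allocation"]
    return {
        "vcpus": alloc["vcpus"],
        # memory_mb is reported in MiB-style units; convert for display.
        "memory_gb": alloc["memory_mb"] / 1024,
        # Sum capacity across all attached disks.
        "disk_gb": sum(disk["capacity_gb"] for disk in alloc["disks"]),
        "disk_targets": [disk["target"] for disk in alloc["disks"]],
    }
```

With the example response, this gives 4 vCPUs, 8.0 GB of memory, and 40 GB of disk on vda.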
Putting it together
- Deploy an app via App Deployment. The route comes up under the app gateway and the slug is registered.
- Enable a probe:
  PUT /api/app-gateway/probes/my-shop
  { "path": "/healthz", "interval_s": 60, "expected_status": 200 }
  M1 stores the config. Actual probing runs in M1.5.
- Open the app’s Observability panel in the GUI. The Analytics tab shows an empty rollup list until Caddy ingest lands.
- The Uptime tab shows the probe config with last_status: null until the probe loop runs.
- Seed the Timeline tab by appending a test event:
  POST /api/app-gateway/events/my-shop
  { "kind": "probe_down", "severity": "warning", "message": "test" }
  The event appears newest-first.
- The Resources tab calls GET /api/app-gateway/metrics/my-shop?vm_name=app-vm and shows the VM’s real vcpus, memory_mb, and disk allocation. The samples chart stays empty until the M1.5 sampler runs.
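What each tab shows after this walkthrough can be derived from the four responses. The Python sketch below is one possible way to express that mapping; it is not the GUI's actual logic, and the function name and state labels are invented for illustration.

```python
def tab_states(analytics: dict, probe: dict, events: dict, metrics: dict) -> dict:
    """Map the four endpoint responses to a one-line state per tab."""
    return {
        # Empty until Caddy ingest lands in M1.5.
        "Analytics": "data" if analytics["rollups"] else "empty",
        # last_status stays null until the probe loop runs.
        "Uptime": "pending" if probe["last_status"] is None else probe["last_status"],
        "Timeline": f"{len(events['events'])} event(s)",
        # Allocation is always live; samples arrive with the M1.5 sampler.
        "Resources": "allocation + samples" if metrics["samples"] else "allocation only",
    }
```

In M1, after seeding one test event, this yields an empty Analytics tab, a pending Uptime tab, one Timeline event, and allocation-only Resources.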
Notes
- Slug is the key. Every endpoint keys on the same slug used by deploy_app and the public URL.
- Stable contract. Analytics, probe-state, events, and metrics shapes are frozen for M1. M1.5 fills in the data without changing the schema.
- Events are append-only. There is no edit or delete. Treat them as an audit log.
- Probes do not run yet. PUT stores config, GET reflects it, and DELETE removes it. No HTTP traffic leaves the gateway in M1.
- Logs are deferred. The SSE log relay (live tail -f over the browser) ships in M1.5.
Related
- App Deployment — get a slug and a public URL to observe.
- App UI Testing — drive scenarios whose runs append events to the timeline.
- MCP Integration — set up the OpenFactory MCP server.