Iterate (PDCA)
Route: /iterate
Scenario: Your app’s UX scores have been declining in session recordings. You suspect the onboarding flow has friction but don’t know where. Instead of spending a day manually testing every screen, you queue a PDCA run — point it at your live URL, describe the goal, and let the autonomous agent crawl, critique, improve, and score the result. You come back in 20 minutes with a scored report and a list of what changed.
The Iterate page runs Plan → Do → Check → Act scoring loops autonomously. The agent crawls a URL, generates a quality critique across multiple dimensions (UX, performance, accessibility, etc.), attempts improvements, re-checks the score, and loops until either the target score is reached or the iteration limit is hit.
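The loop described above can be sketched in Python. This is a minimal illustration, not the agent's actual implementation. `judge` and `improve` are hypothetical callables that stand in for the judge model (crawl plus scoring) and the producer model (applying changes), and the record types are invented for the example. `iterations_target` and `target_score` mirror the request fields of the same name in the API section.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins for the runner's internals, which are not documented here.
Judge = Callable[[str], tuple[float, str]]  # goal -> (score 0..1, critique text)
Improve = Callable[[str], None]             # critique text -> apply changes


@dataclass
class Iteration:
    index: int
    score: float
    critique: str


@dataclass
class RunResult:
    iterations: list[Iteration] = field(default_factory=list)
    reached_target: bool = False

    @property
    def final_score(self) -> float:
        return self.iterations[-1].score if self.iterations else 0.0


def run_pdca(judge: Judge, improve: Improve, goal: str,
             iterations_target: int, target_score: float) -> RunResult:
    result = RunResult()
    # Plan: crawl and critique the current state to get a baseline critique.
    _, critique = judge(goal)
    for i in range(1, iterations_target + 1):
        improve(critique)              # Do: attempt improvements
        score, critique = judge(goal)  # Check: re-score against the goal
        result.iterations.append(Iteration(i, score, critique))
        if score >= target_score:      # Act: stop once the target is reached
            result.reached_target = True
            break
    return result
```

With `iterations_target=5` and `target_score=0.85`, the loop stops at the first iteration that scores 0.85 or higher; otherwise it records five iterations and `reached_target` stays `False`.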
Understanding the score
Each run produces a final score from 0 to 1. This is the judge model’s assessment of the target URL against the goal you described. A score of 0.85 means the judge considers the URL 85% of the way to fully satisfying the goal.
The score timeline chart in the run drawer shows one bar per iteration. Increasing bars mean the agent is making real improvements. Flat or decreasing bars usually mean that either the goal is too vague or the BYOK (bring-your-own-key) model is struggling; try a more specific goal or a stronger model.
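To make that reading of the timeline concrete, the helper below labels a list of per-iteration scores as improving, flat, or declining by comparing the last score with the first. The function name and the 0.01 tolerance are illustrative assumptions, not values used by the Iterate page.

```python
def classify_trend(scores: list[float], tolerance: float = 0.01) -> str:
    """Label a score timeline as 'improving', 'flat' or 'declining'.

    Compares the last iteration's score with the first; a change within
    `tolerance` (an illustrative threshold) counts as flat.
    """
    if len(scores) < 2:
        return "flat"
    delta = scores[-1] - scores[0]
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "declining"
    return "flat"
```

For example, `classify_trend([0.62, 0.70, 0.81])` returns `"improving"`. Comparing only the endpoints ignores a dip mid-run; a stricter variant could require every step to be non-decreasing.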
Runs table
The left panel lists all runs, newest first. At a glance:
- Status badge — queued, running, succeeded, or failed
- Progress — current / target iterations (e.g. “3/5”)
- Final score — the bar fills as the score improves across iterations
- Trigger / Abort buttons appear per row while the run is queued or running
An auto-refresh banner appears at the top when any run is active, polling every 4 seconds so you don’t need to manually refresh.
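If you want the same behaviour from a script, a polling loop might look like the sketch below. It calls the list endpoint from the API section (GET /v1/admin/pdca) every 4 seconds until no run is queued or running. The bearer-token header, the base URL, and the assumption that the endpoint returns a JSON array of run objects with a `status` field are hypothetical; check your deployment's actual auth scheme and response shape. The fetch function is injected so the loop can be exercised without network access.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

ACTIVE_STATUSES = {"queued", "running"}
POLL_INTERVAL_S = 4  # same interval as the page's auto-refresh banner


def fetch_runs(base_url: str, project_id: str, token: str) -> list[dict]:
    """GET /v1/admin/pdca. ASSUMPTIONS: bearer auth, JSON array response."""
    query = urllib.parse.urlencode({"project_id": project_id, "limit": 50})
    req = urllib.request.Request(
        f"{base_url}/v1/admin/pdca?{query}",
        headers={"Authorization": f"Bearer {token}"},  # hypothetical auth scheme
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_until_idle(fetch: Callable[[], list[dict]],
                    interval: float = POLL_INTERVAL_S,
                    sleep: Callable[[float], None] = time.sleep) -> list[dict]:
    """Poll until no run is queued or running, then return the last listing."""
    while True:
        runs = fetch()
        if not any(run.get("status") in ACTIVE_STATUSES for run in runs):
            return runs
        sleep(interval)
```

A typical call would be `wait_until_idle(lambda: fetch_runs(base_url, project_id, token))`, with all three values supplied by you.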
Run detail drawer
Click any run row to open the detail drawer.
Score timeline
A bar chart with one bar per iteration. Click any bar to jump to that iteration’s detail — its critique text, per-dimension scores, and cost.
Iterations list
Expand any iteration to see:
- Overall score bar for that iteration
- Per-dimension breakdown — separate scores for dimensions like UX clarity, navigation depth, content quality, error handling
- Critique text — the judge model’s written assessment of what it found
- Time elapsed + cost — how long and how expensive that iteration was
Succeeded run
When the run succeeds, a Copy critique to clipboard button appears. Paste this into a GitHub issue, Notion doc, or design brief as a structured improvement report.
Use ↻ Refresh in the drawer header to poll for the latest iteration data while a run is active. The drawer doesn’t auto-poll to avoid hammering the API during a long run.
Creating a run
- Click + New Run (top of page).
- On the New Run tab, fill in:
| Field | Guidance |
|---|---|
| Target URL | The live URL the agent will crawl — must be publicly reachable |
| Goal / instructions | Be specific: “Improve the onboarding flow for first-time mobile users — focus on step clarity and reducing friction” is better than “improve UX” |
| Max iterations | 3–5 is typical. More iterations = more cost, diminishing returns |
| Target score | 0.8–0.9 is a good range. 1.0 is rarely achievable and will exhaust iterations |
| Producer model | The model generating improvements — use a strong model (claude-4) for best results |
| Judge model | The model scoring results — can be a lighter model to save cost |
| Critic persona | The evaluation lens — “UX expert” vs “accessibility auditor” changes what the judge prioritises |
- Click Queue run. The run starts immediately if no other run is active.
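The guidance in the table can be expressed as a small payload builder for the POST /v1/admin/pdca endpoint listed under API. The JSON field names (target_url, goal, iterations_target, target_score, primary_model, judge_model, project_id) and the defaults come from that example; the validation rules, the warning thresholds, and the word-count heuristic for vague goals are an illustrative reading of the table, not checks the service is known to perform. The API example does not show a field for the critic persona, so this sketch leaves it out.

```python
from urllib.parse import urlparse


def build_run_payload(target_url: str, goal: str, project_id: str,
                      iterations_target: int = 5, target_score: float = 0.85,
                      primary_model: str = "claude-4-sonnet",
                      judge_model: str = "claude-3-haiku") -> tuple[dict, list[str]]:
    """Return (payload, warnings) for POST /v1/admin/pdca."""
    parsed = urlparse(target_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("target_url must be an absolute http(s) URL")
    if not goal.strip():
        raise ValueError("goal must not be empty")
    if iterations_target < 1:
        raise ValueError("iterations_target must be at least 1")
    if not 0.0 < target_score <= 1.0:
        raise ValueError("target_score must be in (0, 1]")

    # Soft checks taken from the guidance table; the service may not enforce them.
    warnings = []
    if iterations_target > 5:
        warnings.append("more than 5 iterations: higher cost, diminishing returns")
    if target_score > 0.9:
        warnings.append("target_score above 0.9 is rarely reached and may exhaust iterations")
    if len(goal.split()) < 5:  # hypothetical heuristic for a vague goal
        warnings.append("goal looks vague: name the flow, the audience and the focus")

    payload = {
        "target_url": target_url,
        "goal": goal,
        "iterations_target": iterations_target,
        "target_score": target_score,
        "primary_model": primary_model,
        "judge_model": judge_model,
        "project_id": project_id,
    }
    return payload, warnings
```

For the pre-release audit described below, `build_run_payload(staging_url, "Identify UX friction in the checkout flow", pid, iterations_target=3, target_score=0.8)` produces no warnings; serialize the payload with `json.dumps` to use it as the request body.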
Common tasks
Running a UX audit before a release
- Create a new run: target URL = your staging URL, goal = “Identify UX friction in the checkout flow”, max iterations = 3, target score = 0.8.
- Let it run (~10–15 min).
- Open the drawer → read the final critique → Copy to clipboard.
- File as a GitHub issue or share with your design team.
Comparing before and after a change
- Run a baseline before your change. Note the final score.
- Deploy your change. Run again with identical config.
- Compare the final scores and the per-dimension breakdown charts.
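- A higher score on the “UX clarity” dimension suggests your change worked. Treat small differences with caution, since the judge is a model and its scores can vary between runs.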
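To compare the per-dimension breakdowns numerically, subtract the baseline scores from the second run's scores. This sketch assumes each run's breakdown is available as a mapping from dimension name to score; that shape, the function names, and the numbers in the usage example are hypothetical.

```python
def compare_dimensions(baseline: dict[str, float],
                       after: dict[str, float]) -> dict[str, float]:
    """Score change per dimension (after minus baseline), rounded to 3 places.

    Dimensions missing from either run are skipped.
    """
    return {
        dim: round(after[dim] - baseline[dim], 3)
        for dim in baseline
        if dim in after
    }


def biggest_changes(deltas: dict[str, float]) -> list[tuple[str, float]]:
    """Sort dimensions by absolute change, largest first."""
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

With illustrative inputs, `compare_dimensions({"UX clarity": 0.62, "navigation depth": 0.70}, {"UX clarity": 0.78, "navigation depth": 0.69})` returns `{"UX clarity": 0.16, "navigation depth": -0.01}`.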
Debugging a failed run
- Click the failed run row.
- In the Summary section, read the error_detail field. It contains the specific failure message from the runner.
- Common causes:
- LLM key not configured → go to Settings → BYOK tab
- Target URL unreachable → check the URL returns 200 without authentication
- Iteration limit hit with low score → rewrite the goal to be more specific, or lower the target score to a reachable value (0.8–0.9 is a good range)
API
GET /v1/admin/pdca?project_id=<pid>&limit=50
GET /v1/admin/pdca/<id>
POST /v1/admin/pdca { "target_url": "...", "goal": "...", "iterations_target": 5,
"target_score": 0.85, "primary_model": "claude-4-sonnet",
"judge_model": "claude-3-haiku", "project_id": "<pid>" }
DELETE /v1/admin/pdca/<id> (abort)
POST /v1/admin/pdca/<id>/trigger (manually trigger queued run)
Related pages
- Drift scanner — check for regressions before starting an iteration loop
- Integration health — verify LLM keys are working before queuing a run
- Fix orchestrator — dispatch individual fixes outside the PDCA loop