Iterate (PDCA)

Route: /iterate

Scenario: Your app’s UX scores have been declining in session recordings. You suspect the onboarding flow has friction but don’t know where. Instead of spending a day manually testing every screen, you queue a PDCA run — point it at your live URL, describe the goal, and let the autonomous agent crawl, critique, improve, and score the result. You come back in 20 minutes with a scored report and a list of what changed.

The Iterate page runs Plan → Do → Check → Act scoring loops autonomously. The agent crawls a URL, generates a quality critique across multiple dimensions (UX, performance, accessibility, etc.), attempts improvements, re-checks the score, and loops until either the target score is reached or the iteration limit is hit.
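The stopping rule described above can be sketched in a few lines. This is an illustration only, not the runner's actual code: `run_pdca`, `IterationResult`, and the `improve_and_score` callable are hypothetical names, and the Plan, Do, and Check steps are collapsed into a single stand-in function.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IterationResult:
    index: int
    score: float  # judge score in the range 0 to 1


def run_pdca(
    improve_and_score: Callable[[int], float],  # hypothetical stand-in for Plan + Do + Check
    target_score: float,
    max_iterations: int,
) -> List[IterationResult]:
    """Loop until the target score is reached or the iteration limit is hit."""
    history: List[IterationResult] = []
    for i in range(1, max_iterations + 1):
        score = improve_and_score(i)
        history.append(IterationResult(index=i, score=score))
        if score >= target_score:  # Act: stop early once the goal is met
            break
    return history


# Usage with canned scores instead of a real crawl and judge call.
canned_scores = {1: 0.6, 2: 0.7, 3: 0.85, 4: 0.9}
history = run_pdca(lambda i: canned_scores[i], target_score=0.8, max_iterations=5)
print([(r.index, r.score) for r in history])  # [(1, 0.6), (2, 0.7), (3, 0.85)]
```

The run stops at iteration 3 because 0.85 meets the 0.8 target, even though the limit of 5 has not been reached.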

Understanding the score

Each run produces a final score from 0 to 1. This is the judge model’s assessment of the target URL against the goal you described. A score of 0.85 means the judge considers the URL 85% of the way to fully satisfying the goal.

The score timeline chart in the run drawer shows one bar per iteration. Increasing bars mean the agent is making real improvements. Flat or decreasing bars usually mean that the goal is too vague or that the model configured under BYOK (bring your own key) is struggling. Try a more specific goal or a stronger model.
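If you export the per-iteration scores, the same reading of the timeline can be done programmatically. The sketch below is an assumption-laden illustration: `classify_trend` and its `tolerance` noise band are hypothetical and not part of the product.

```python
from typing import List


def classify_trend(scores: List[float], tolerance: float = 0.02) -> str:
    """Label a score timeline as 'improving', 'flat', or 'declining'.

    Compares the last score to the first. `tolerance` is a hypothetical
    noise band chosen for illustration, not a product setting.
    """
    if len(scores) < 2:
        return "insufficient data"
    delta = scores[-1] - scores[0]
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "declining"
    return "flat"


print(classify_trend([0.55, 0.68, 0.81]))  # improving
print(classify_trend([0.70, 0.71, 0.70]))  # flat
print(classify_trend([0.70, 0.66, 0.61]))  # declining
```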

Runs table

The left panel lists all runs, newest first. At a glance:

Status badge — queued, running, succeeded, failed
Progress — current / target iterations (e.g. “3/5”)
Final score — the bar fills as the score improves across iterations
Trigger / Abort buttons appear per row while the run is queued or running

An auto-refresh banner appears at the top when any run is active, polling every 4 seconds so you don’t need to manually refresh.
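A script that waits on a run can follow the same 4-second cadence. The sketch below assumes the run object exposes a `status` field with the badge values listed above; that field name, the `wait_for_run` helper, and the `max_polls` cap are assumptions. The fetch function is injected so the example runs without network access.

```python
import time
from typing import Callable, Dict

# Terminal values taken from the status badge list; other values may exist.
TERMINAL_STATUSES = {"succeeded", "failed"}


def wait_for_run(
    fetch_run: Callable[[], Dict],  # e.g. a wrapper around GET /v1/admin/pdca/<id>
    interval_s: float = 4.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the run reaches a terminal status or max_polls is exhausted."""
    for attempt in range(max_polls):
        run = fetch_run()
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"run still active after {max_polls} polls")


# Usage with canned responses and a no-op sleep.
responses = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "succeeded", "final_score": 0.85},  # 'final_score' is an assumed field name
])
final = wait_for_run(lambda: next(responses), sleep=lambda s: None)
print(final["status"])  # succeeded
```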

Run detail drawer

Click any run row to open the detail drawer.

Score timeline

A bar chart with one bar per iteration. Click any bar to jump to that iteration’s detail — its critique text, per-dimension scores, and cost.

Iterations list

Expand any iteration to see:

Overall score bar for that iteration
Per-dimension breakdown — separate scores for dimensions like UX clarity, navigation depth, content quality, error handling
Critique text — the judge model’s written assessment of what it found
Time elapsed + cost — how long and how expensive that iteration was
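For scripting against iteration data, the fields listed above could be modelled roughly as follows. Every field name here is an assumption made for illustration; the actual response shape is not documented on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Iteration:
    # All field names are hypothetical; they mirror the drawer, not a documented schema.
    index: int
    overall_score: float
    dimension_scores: Dict[str, float] = field(default_factory=dict)
    critique: str = ""
    elapsed_s: float = 0.0
    cost: float = 0.0

    def weakest_dimension(self) -> Optional[str]:
        """Return the lowest-scoring dimension, or None if there is no breakdown."""
        if not self.dimension_scores:
            return None
        return min(self.dimension_scores, key=self.dimension_scores.get)


iteration = Iteration(
    index=2,
    overall_score=0.71,
    dimension_scores={"UX clarity": 0.80, "navigation depth": 0.74, "error handling": 0.52},
)
print(iteration.weakest_dimension())  # error handling
```

Sorting by the weakest dimension is one way to decide what the next goal should focus on.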

Succeeded run

When the run succeeds, a Copy critique to clipboard button appears. Paste this into a GitHub issue, Notion doc, or design brief as a structured improvement report.

Use ↻ Refresh in the drawer header to poll for the latest iteration data while a run is active. The drawer doesn’t auto-poll to avoid hammering the API during a long run.

Creating a run

Click + New Run (top of page).
On the New Run tab, fill in:

Field	Guidance
Target URL	The live URL the agent will crawl — must be publicly reachable
Goal / instructions	Be specific: “Improve the onboarding flow for first-time mobile users — focus on step clarity and reducing friction” is better than “improve UX”
Max iterations	3–5 is typical. More iterations = more cost, diminishing returns
Target score	0.8–0.9 is a good range. 1.0 is rarely achievable and will exhaust iterations
Producer model	The model generating improvements — use a strong model (claude-4) for best results
Judge model	The model scoring results — can be a lighter model to save cost
Critic persona	The evaluation lens — “UX expert” vs “accessibility auditor” changes what the judge prioritises

Click Queue run. The run starts immediately if no other run is active.
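The field guidance above can be turned into a quick pre-flight check before queuing a run from a script. This is a sketch under stated assumptions: `check_run_config` is a hypothetical helper, the five-word threshold for a vague goal is arbitrary, and the keys follow the request body shown in the API section.

```python
from typing import Dict, List
from urllib.parse import urlparse


def check_run_config(config: Dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems: List[str] = []

    parsed = urlparse(str(config.get("target_url", "")))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("target_url should be an absolute http(s) URL")

    # Arbitrary heuristic: very short goals tend to be vague.
    if len(str(config.get("goal", "")).split()) < 5:
        problems.append("goal looks vague; name the flow, the audience, and the focus")

    iterations = config.get("iterations_target", 0)
    if not isinstance(iterations, int) or iterations < 1:
        problems.append("iterations_target should be a positive integer")

    score = config.get("target_score", 0.0)
    if not 0.0 < score <= 1.0:
        problems.append("target_score should be greater than 0 and at most 1")
    elif score == 1.0:
        problems.append("target_score of 1.0 is rarely reached and will likely exhaust iterations")

    return problems


good = {
    "target_url": "https://staging.example.com/onboarding",  # placeholder URL
    "goal": "Improve the onboarding flow for first-time mobile users",
    "iterations_target": 5,
    "target_score": 0.85,
}
print(check_run_config(good))  # []
print(check_run_config({**good, "goal": "improve UX", "target_score": 1.0}))
```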

Common tasks

Running a UX audit before a release

Create a new run: target URL = your staging URL, goal = “Identify UX friction in the checkout flow”, max iterations = 3, target score = 0.8.
Let it run (~10–15 min).
Open the drawer → read the final critique → Copy to clipboard.
File as a GitHub issue or share with your design team.

Comparing before and after a change

Run a baseline before your change. Note the final score.
Deploy your change. Run again with identical config.
Compare the final scores and the per-dimension breakdown charts.
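A higher score on the dimension you targeted (for example “UX clarity”) suggests your change worked. Judge scores can vary between runs, so treat small differences with caution.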
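The comparison in the steps above amounts to a per-dimension difference between two runs. A minimal sketch, assuming you have copied the dimension scores of each run into plain dictionaries (the dimension names and numbers below are made up for illustration):

```python
from typing import Dict


def dimension_deltas(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Return after-minus-before for every dimension present in both runs."""
    return {
        name: round(after[name] - before[name], 3)
        for name in before
        if name in after
    }


baseline = {"UX clarity": 0.62, "navigation depth": 0.70, "error handling": 0.55}
candidate = {"UX clarity": 0.78, "navigation depth": 0.69, "error handling": 0.55}

deltas = dimension_deltas(baseline, candidate)
# Print the largest improvement first.
for name, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {delta:+.2f}")
```

Dimensions that appear in only one run are skipped, since there is nothing to compare them against.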

Debugging a failed run

Click the failed run row.
In the Summary section, read the error_detail field — it contains the specific failure message from the runner.
Common causes:
- LLM key not configured → go to Settings → BYOK tab
- Target URL unreachable → check the URL returns 200 without authentication
- Iteration limit hit with low score → rewrite the goal to be more specific, raise max iterations, or lower the target score
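The reachability cause can be checked from a script before re-queuing. The sketch below uses only the Python standard library and sends no credentials; `is_publicly_reachable` is a hypothetical helper. It only approximates what the crawler sees, since a login page that itself returns 200 would still pass.

```python
import http.client
import urllib.request


def is_publicly_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if an unauthenticated GET ends in HTTP 200 (redirects are followed)."""
    try:
        request = urllib.request.Request(url, method="GET")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError (4xx/5xx), and timeouts;
        # ValueError covers malformed URLs.
        return False


print(is_publicly_reachable("not-a-url"))  # False
```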

API


GET    /v1/admin/pdca?project_id=<pid>&limit=50
GET    /v1/admin/pdca/<id>
POST   /v1/admin/pdca   { "target_url": "...", "goal": "...", "iterations_target": 5,
                          "target_score": 0.85, "primary_model": "claude-4-sonnet",
                          "judge_model": "claude-3-haiku", "project_id": "<pid>" }
DELETE /v1/admin/pdca/<id>         (abort)
POST   /v1/admin/pdca/<id>/trigger (manually trigger queued run)
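As a rough illustration of the create call, the sketch below builds the POST request with the Python standard library. The host name and the bearer-token header are assumptions, since this page does not document the base URL or the authentication scheme. In the body, `iterations_target` and `primary_model` appear to correspond to the Max iterations and Producer model fields of the form.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, replace with your deployment


def build_create_run_request(payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/admin/pdca request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/admin/pdca",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


payload = {
    "target_url": "https://staging.example.com/checkout",  # placeholder URL
    "goal": "Identify UX friction in the checkout flow",
    "iterations_target": 3,
    "target_score": 0.8,
    "primary_model": "claude-4-sonnet",
    "judge_model": "claude-3-haiku",
    "project_id": "<pid>",
}
request = build_create_run_request(payload, token="YOUR_TOKEN")
print(request.get_method(), request.full_url)

# To actually send it:
# with urllib.request.urlopen(request, timeout=30) as response:
#     run = json.load(response)
```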

Related

Drift scanner — check for regressions before starting an iteration loop
Integration health — verify LLM keys are working before queuing a run
Fix orchestrator — dispatch individual fixes outside the PDCA loop