
Iterate (PDCA)

Route: /iterate

Scenario: Your app’s UX scores have been declining in session recordings. You suspect the onboarding flow has friction but don’t know where. Instead of spending a day manually testing every screen, you queue a PDCA run: point it at your live URL, describe the goal, and let the autonomous agent crawl, critique, improve, and score the result. About 20 minutes later you come back to a scored report and a list of what changed.

The Iterate page runs Plan → Do → Check → Act scoring loops autonomously. The agent crawls a URL, generates a quality critique across multiple dimensions (UX, performance, accessibility, etc.), attempts improvements, re-checks the score, and loops until either the target score is reached or the iteration limit is hit.
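
The stopping rule described above can be sketched in a few lines. This is a minimal illustration only, not the product's runner: `run_pdca`, `Iteration`, and the `improve` / `check` callbacks are hypothetical names, and the canned scores stand in for a real judge model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Iteration:
    index: int
    score: float


def run_pdca(
    improve: Callable[[int], None],
    check: Callable[[int], float],
    target_score: float,
    max_iterations: int,
) -> List[Iteration]:
    """Run improve/check rounds until the target score or the iteration limit is reached."""
    history: List[Iteration] = []
    for index in range(1, max_iterations + 1):
        improve(index)             # Plan + Do: critique the page and attempt improvements
        score = check(index)       # Check: the judge re-scores the result
        history.append(Iteration(index, score))
        if score >= target_score:  # Act: stop once the target is met
            break
    return history


# Canned scores stand in for a real judge model (assumption for illustration).
canned = [0.55, 0.70, 0.88, 0.95]
history = run_pdca(
    improve=lambda i: None,
    check=lambda i: canned[i - 1],
    target_score=0.85,
    max_iterations=5,
)
print([(it.index, it.score) for it in history])
```

With a target of 0.85 the loop stops at the third iteration, even though five were allowed.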


Understanding the score

Each run produces a final score from 0 to 1. This is the judge model’s assessment of the target URL against the goal you described. A score of 0.85 means the judge considers the URL 85% of the way to fully satisfying the goal.

The score timeline chart in the run drawer shows one bar per iteration. Increasing bars mean the agent is making real improvements. Flat or decreasing bars mean either the goal is too vague or the BYOK model is struggling — try a more specific goal or a stronger model.
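
If you export the per-iteration scores, the same reading of the chart can be approximated in code. The helper below is a rough sketch: `score_trend` and its `tolerance` default are assumptions for illustration, not values used by the product.

```python
from typing import Sequence


def score_trend(scores: Sequence[float], tolerance: float = 0.02) -> str:
    """Classify a run by comparing its last iteration score with its first."""
    if len(scores) < 2:
        return "insufficient data"
    change = scores[-1] - scores[0]
    if change > tolerance:
        return "improving"
    # No meaningful gain: consider a more specific goal or a stronger model.
    return "flat or declining"


print(score_trend([0.50, 0.60, 0.80]))
print(score_trend([0.60, 0.60, 0.61]))
```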


Runs table

The left panel lists all runs, newest first. At a glance:

  • Status badge: queued, running, succeeded, failed
  • Progress — current / target iterations (e.g. “3/5”)
  • Final score — the bar fills as the score improves across iterations
  • Trigger / Abort buttons appear per row while the run is queued or running

An auto-refresh banner appears at the top when any run is active, polling every 4 seconds so you don’t need to manually refresh.
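
A script that watches runs outside the console could follow the same pattern: poll every 4 seconds while any run is queued or running, then stop. The sketch below assumes each run is a dict with a `status` field; that shape, and the names `poll_until_idle` and `fetch_runs`, are hypothetical.

```python
import time
from typing import Callable, Dict, List

ACTIVE_STATUSES = {"queued", "running"}


def any_active(runs: List[Dict]) -> bool:
    """True if at least one run is still queued or running."""
    return any(run.get("status") in ACTIVE_STATUSES for run in runs)


def poll_until_idle(
    fetch_runs: Callable[[], List[Dict]],
    interval_seconds: float = 4.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Re-fetch the run list until no run is active or max_polls is reached."""
    runs = fetch_runs()
    polls = 1
    while any_active(runs) and polls < max_polls:
        sleep(interval_seconds)
        runs = fetch_runs()
        polls += 1
    return runs
```

Passing `sleep` and `fetch_runs` as parameters keeps the loop testable without a network connection or real waiting.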


Run detail drawer

Click any run row to open the detail drawer.

Score timeline

A bar chart with one bar per iteration. Click any bar to jump to that iteration’s detail — its critique text, per-dimension scores, and cost.

Iterations list

Expand any iteration to see:

  • Overall score bar for that iteration
  • Per-dimension breakdown — separate scores for dimensions like UX clarity, navigation depth, content quality, error handling
  • Critique text — the judge model’s written assessment of what it found
  • Time elapsed + cost — how long and how expensive that iteration was

Succeeded run

When the run succeeds, a Copy critique to clipboard button appears. Paste this into a GitHub issue, Notion doc, or design brief as a structured improvement report.

Use ↻ Refresh in the drawer header to poll for the latest iteration data while a run is active. The drawer doesn’t auto-poll to avoid hammering the API during a long run.


Creating a run

  1. Click + New Run (top of page).
  2. On the New Run tab, fill in:
| Field | Guidance |
| --- | --- |
| Target URL | The live URL the agent will crawl. It must be publicly reachable |
| Goal / instructions | Be specific: “Improve the onboarding flow for first-time mobile users — focus on step clarity and reducing friction” is better than “improve UX” |
| Max iterations | 3–5 is typical. More iterations = more cost, diminishing returns |
| Target score | 0.8–0.9 is a good range. 1.0 is rarely achievable and will exhaust iterations |
| Producer model | The model generating improvements. Use a strong model (claude-4) for best results |
| Judge model | The model scoring results. It can be a lighter model to save cost |
| Critic persona | The evaluation lens. “UX expert” vs “accessibility auditor” changes what the judge prioritises |
  3. Click Queue run. The run starts immediately if no other run is active.
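
The same form can be expressed as a JSON body for the create endpoint listed in the API section. The sketch below uses only the field names shown there; the helper name `build_run_payload`, the default values, and the validation rules are assumptions, and the critic persona is omitted because its API field name is not documented here.

```python
import json
from typing import Dict


def build_run_payload(
    project_id: str,
    target_url: str,
    goal: str,
    iterations_target: int = 5,
    target_score: float = 0.85,
    primary_model: str = "claude-4-sonnet",
    judge_model: str = "claude-3-haiku",
) -> Dict:
    """Validate the form values and return the body for POST /v1/admin/pdca."""
    # These checks are illustrative assumptions, not documented server rules.
    if not target_url.startswith(("http://", "https://")):
        raise ValueError("target_url must be an absolute http(s) URL")
    if not goal.strip():
        raise ValueError("goal must not be empty")
    if iterations_target < 1:
        raise ValueError("iterations_target must be at least 1")
    if not 0.0 < target_score <= 1.0:
        raise ValueError("target_score must be in the range (0, 1]")
    return {
        "target_url": target_url,
        "goal": goal,
        "iterations_target": iterations_target,
        "target_score": target_score,
        "primary_model": primary_model,
        "judge_model": judge_model,
        "project_id": project_id,
    }


payload = build_run_payload(
    project_id="my-project",  # hypothetical project id
    target_url="https://staging.example.com/checkout",
    goal="Identify UX friction in the checkout flow",
    iterations_target=3,
    target_score=0.8,
)
print(json.dumps(payload, indent=2))
```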

Common tasks

Running a UX audit before a release

  1. Create a new run: target URL = your staging URL, goal = “Identify UX friction in the checkout flow”, max iterations = 3, target score = 0.8.
  2. Let it run (~10–15 min).
  3. Open the drawer → read the final critique → Copy to clipboard.
  4. File as a GitHub issue or share with your design team.

Comparing before and after a change

  1. Run a baseline before your change. Note the final score.
  2. Deploy your change. Run again with identical config.
  3. Compare the final scores and the per-dimension breakdown charts.
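  4. A higher score on the dimension you targeted (for example “UX clarity”) suggests your change worked. Judge scores can vary slightly between runs, so treat very small differences with caution.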
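
If you copy the per-dimension scores out of both runs, the comparison can be done mechanically. This is a sketch under the assumption that each run's breakdown is available as a simple name-to-score mapping; `dimension_deltas` is a hypothetical helper, and the numbers are made up.

```python
from typing import Dict


def dimension_deltas(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Return after-minus-before for every dimension present in both runs."""
    shared = before.keys() & after.keys()
    return {name: round(after[name] - before[name], 4) for name in sorted(shared)}


# Illustrative values only.
baseline = {"UX clarity": 0.60, "navigation depth": 0.70}
candidate = {"UX clarity": 0.80, "navigation depth": 0.65, "content quality": 0.90}

for name, delta in dimension_deltas(baseline, candidate).items():
    print(f"{name}: {delta:+.2f}")
```

Dimensions that appear in only one run are skipped, since there is nothing to compare them against.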

Debugging a failed run

  1. Click the failed run row.
  2. In the Summary section, read the error_detail field — it contains the specific failure message from the runner.
  3. Common causes:
    • LLM key not configured → go to Settings → BYOK tab
    • Target URL unreachable → check the URL returns 200 without authentication
    • Iteration limit hit with low score → rewrite the goal to be more specific, raise the max iterations, or lower the target score

API

GET    /v1/admin/pdca?project_id=<pid>&limit=50
GET    /v1/admin/pdca/<id>
POST   /v1/admin/pdca
       {
         "target_url": "...",
         "goal": "...",
         "iterations_target": 5,
         "target_score": 0.85,
         "primary_model": "claude-4-sonnet",
         "judge_model": "claude-3-haiku",
         "project_id": "<pid>"
       }
DELETE /v1/admin/pdca/<id>            (abort)
POST   /v1/admin/pdca/<id>/trigger    (manually trigger queued run)
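
As a rough sketch, the routes above could be called from Python's standard library as follows. The base URL, the run id, the project id, and the Bearer-token `Authorization` header are all assumptions; this page does not state how the admin API authenticates. The helper only builds the requests, so nothing is sent until you pass one to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Dict, Optional
from urllib.parse import urlencode


def pdca_request(
    base_url: str,
    method: str,
    path: str,
    body: Optional[Dict] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request for one of the PDCA admin routes."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        # Assumption: Bearer auth. Replace with whatever your deployment uses.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


BASE = "https://api.example.com"  # hypothetical host

list_runs = pdca_request(
    BASE, "GET", "/v1/admin/pdca?" + urlencode({"project_id": "my-project", "limit": 50})
)
abort_run = pdca_request(BASE, "DELETE", "/v1/admin/pdca/run-123")
trigger_run = pdca_request(BASE, "POST", "/v1/admin/pdca/run-123/trigger")

print(list_runs.get_method(), list_runs.full_url)
```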
