Manual Review: Your Eyes Are the Quality Gate

From Delegation to Verification

In Lesson 1, you learned the Explore → Plan → Implement → Verify workflow. The first three steps are about getting AI to build the right thing. The fourth step, Verify, is where you take ownership of quality.

Figure: The four-step workflow (Explore, Plan, Implement, Verify)

This section is about that fourth step. You've delegated the work. AI delivered something. Now: does it actually satisfy the acceptance criteria you wrote?

Not "does the code look right." Not "does it seem to work." Specifically: does each acceptance criterion pass or fail?

The Spinning Loop

Here's what happens when people skip acceptance criteria or write vague ones:

  1. Ask AI to build something
  2. Get output that's... okay?
  3. Give vague feedback: "That's not quite right"
  4. AI makes changes
  5. Now it's different but still not what you wanted
  6. Re-prompt: "Closer, but..."
  7. Repeat

This is the spinning loop: re-prompting in circles because you never defined what "done" looks like. The problem isn't AI. It's the missing acceptance criteria. Without them, both you and AI are guessing at the target.

The fix is upstream. Write acceptance criteria BEFORE you delegate (the Plan step from Lesson 1). Then review AGAINST those criteria after AI delivers (the Verify step). No guessing. No "does this feel right." Pass or fail.

Criteria-Based Review

Manual review against acceptance criteria is straightforward:

  1. Pull up your acceptance criteria: the Given/When/Then statements you wrote before delegating
  2. Walk through each one: test the actual output against each criterion
  3. Mark each pass or fail: no partial credit. Either it satisfies the criterion or it doesn't
  4. For failures, be specific: "AC 3 fails: when I select a category from the dropdown, the form fields specific to that category don't appear"

That specificity is the difference between productive feedback and the spinning loop. "It's not right" sends AI in circles. "AC 3 fails because [specific missing behavior]" gives AI exactly what to fix.
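One way to keep the walk-through honest is to record each verdict in a small checklist that refuses a failure without a specific reason. The sketch below is illustrative only: `CriterionResult` and `summarize` are hypothetical names invented for this example, not part of any tool used in the lessons.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """One acceptance criterion and the reviewer's verdict (hypothetical structure)."""
    number: int
    passed: bool
    note: str = ""  # for failures: the specific missing behavior

    def __post_init__(self):
        # A failure without a specific note is just "it's not right".
        if not self.passed and not self.note.strip():
            raise ValueError(f"AC {self.number} failed but has no specific note")


def summarize(results):
    """Return one line per criterion: strictly pass or fail, no partial credit."""
    lines = []
    for r in results:
        if r.passed:
            lines.append(f"AC {r.number}: pass")
        else:
            lines.append(f"AC {r.number}: fail: {r.note}")
    return lines


# Sample verdicts, made up for illustration.
results = [
    CriterionResult(1, True),
    CriterionResult(2, True),
    CriterionResult(3, False, "category-specific form fields don't appear after selecting a category"),
]
for line in summarize(results):
    print(line)
```

The `passed` field is a plain boolean on purpose: there is no value that means "mostly works."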

Here's the pattern that makes this work: your acceptance criteria do double duty. In the Plan step, they're your spec; they tell AI what to build. In the Verify step, they're your checklist; they tell you what to check. Same criteria, two purposes. You already wrote them in Lesson 1. Now you use them as your verification tool.

What to Look For

When reviewing AI output against acceptance criteria, focus on three things:

Does it do what the AC says? Not what you imagine, not what would be nice, but what the acceptance criteria specifically state. If the AC says "items display in reverse chronological order" and they display in alphabetical order, that's a fail, even if alphabetical might be useful.

Does it handle the "Given" conditions? The Given clause establishes the starting state. If the AC says "Given I'm on the form" and the feature only works from the feed view, that's a fail.

Does it stop where the AC stops? If AI added features you didn't ask for, that's scope creep. It might be helpful, but it's still outside the contract. Review what you asked for first, then decide whether to keep the extras.
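The reverse-chronological example above shows how a precise criterion leaves no room for "close enough." As a minimal sketch, assuming each displayed item carries a timestamp, the check reduces to one comparison per adjacent pair. The function name and the sample data are made up for illustration.

```python
from datetime import datetime


def is_reverse_chronological(timestamps):
    """True if every timestamp is at or after the one that follows it (newest first)."""
    return all(current >= following
               for current, following in zip(timestamps, timestamps[1:]))


# Hypothetical items in the order they appear on screen (alphabetical by title).
displayed = [
    ("Alpha report", datetime(2024, 1, 5)),
    ("Bravo report", datetime(2024, 3, 1)),
    ("Charlie report", datetime(2024, 2, 10)),
]
timestamps = [ts for _, ts in displayed]
print("pass" if is_reverse_chronological(timestamps) else "fail")  # prints "fail"
```

The alphabetical ordering fails the criterion regardless of how useful it might be, which is exactly the judgment the reviewer is asked to make.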

Review Against Criteria

Round Robin | ~6 minutes total | Everyone looks at the same screen. Open your application from Challenge 1 in the browser.

Your team built features during Challenge 1. Now review one against these acceptance criteria, as if someone had written them before the work started:

Example Story:
As an ONI watch floor analyst, I want to see which vessels
in my area of interest are on the OFAC sanctions list
so that I can prioritize them for investigation.

Acceptance Criteria 1:

Given the traffic display is loaded, when vessels with
MMSIs matching the OFAC list are present, then those
vessels are visually distinct from unsanctioned traffic.

Acceptance Criteria 2:

Given I see a sanctioned vessel, when I select it, then
I see the full OFAC entry including the vessel's listed
name, sanctions program, and all known identifiers.

Acceptance Criteria 3:

Given I select a sanctioned vessel, when its AIS broadcast
name differs from its OFAC-listed name, then both names
are visible so I can see the mismatch.

Acceptance Criteria 4:

Given a vessel's MMSI does not appear in the OFAC list,
when I view that vessel, then no sanctions flag or
indicator is shown.
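AC 1 and AC 4 are two halves of one check: every vessel on the list must be marked, and no other vessel may be. The sketch below shows that logic under stated assumptions. The dictionary keys `mmsi` and `flagged`, the function name, and the placeholder MMSI values are all hypothetical; in a manual review, `flagged` is simply what the reviewer sees on screen.

```python
def review_sanctions_flags(vessels, ofac_mmsis):
    """Compare what the display marks against what the OFAC list says it should mark.

    vessels: list of dicts with hypothetical keys "mmsi" and "flagged"
             (whether the reviewer sees the vessel marked as sanctioned).
    ofac_mmsis: set of MMSI strings taken from the OFAC list.
    """
    missed = [v["mmsi"] for v in vessels
              if v["mmsi"] in ofac_mmsis and not v["flagged"]]
    wrongly_flagged = [v["mmsi"] for v in vessels
                       if v["mmsi"] not in ofac_mmsis and v["flagged"]]
    return {
        "AC 1": "pass" if not missed
                else f"fail: sanctioned but not distinct: {missed}",
        "AC 4": "pass" if not wrongly_flagged
                else f"fail: flagged but not on list: {wrongly_flagged}",
    }


# Placeholder values only; these are not real MMSIs or real list entries.
ofac_mmsis = {"000000001"}
vessels = [
    {"mmsi": "000000001", "flagged": True},
    {"mmsi": "000000002", "flagged": True},   # marked, but not on the list
    {"mmsi": "000000003", "flagged": False},
]
for ac, verdict in review_sanctions_flags(vessels, ofac_mmsis).items():
    print(f"{ac}: {verdict}")
```

With this sample data AC 1 passes and AC 4 fails, because one vessel is marked without a matching list entry.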

Step 1: One person takes the reviewer role. Walk through AC 1 out loud:

  • Read the criterion
  • Test the actual application in the browser
  • Call the result: "Pass" or "Fail, here's why"

Step 2: Rotate the reviewer role for each remaining AC. Each person takes one.

Step 3: For any failure, write a specific fix request using this format:

AC [number] fails.
Expected: [what the AC says].
Actual: [what happened].
Fix: [specific change needed].
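If it helps to keep fix requests consistent across reviewers, the format above can be filled in by a small helper, with the criterion number on the first line. This is only a sketch: `fix_request` is a hypothetical name, and the sample failure is invented to show the shape of a good request.

```python
def fix_request(ac_number, expected, actual, fix):
    """Build a fix request in the four-line format, rejecting empty fields."""
    for name, value in (("expected", expected), ("actual", actual), ("fix", fix)):
        if not value.strip():
            raise ValueError(f"'{name}' must be filled in")
    return (
        f"AC {ac_number} fails.\n"
        f"Expected: {expected}\n"
        f"Actual: {actual}\n"
        f"Fix: {fix}"
    )


# Hypothetical failure against AC 3 of the example story.
print(fix_request(
    3,
    "both the AIS broadcast name and the OFAC-listed name are visible",
    "only the AIS broadcast name is shown",
    "show the OFAC-listed name next to the AIS name in the vessel detail view",
))
```

Rejecting empty fields is the design choice that matters here: a request with no "Actual" or no "Fix" is the vague feedback that starts the spinning loop.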

You'll almost certainly find mismatches. Your application was built through conversation, not against these specific criteria. That's the point: criteria written before building produce different results than criteria applied only afterward.

Discuss: How many of the four passed? How is reviewing against specific criteria different from just looking at the application and saying "looks good"? What happens when you have 10 features to review? Does this approach scale?

The Honest Tradeoff

Manual review works. It's disciplined, it's thorough, and it catches real problems. But it's also slow. Every feature gets reviewed by hand. Every acceptance criterion gets walked through individually. And when you add a new feature, you might break something you already verified, and you won't know unless you re-check everything.

That tension is real. It doesn't mean manual review is wrong. It means it's the starting point, not the finish line. In Lesson 3, you'll learn to turn your acceptance criteria into automated tests that check themselves. But the discipline starts here: manual, criteria-based, pass or fail.

Key Insight

Your acceptance criteria are both the specification (spec) and the test. Write them before delegating (the Plan step), then verify against them after AI delivers (the Verify step). When something fails, point to the specific criterion. That turns vague dissatisfaction into a clear fix target. Your eyes are the quality gate. They won't always be the only gate, but they're the one that matters most right now.