
Challenge 3: Test, Ship, Repeat

Recap:

In Challenge 2, you used skills to bring consistency to your application: type-specific forms that follow the same patterns, a decomposed backlog that kept your team organized, and manual review that caught real failures against specific acceptance criteria. The platform works, and it looks like a coherent system rather than a collection of one-off features.

Then in Lesson 3, you solved the problem that was nagging you: manual review doesn't scale. You learned why the two-week cliff happens (one change silently breaks something you already verified) and you saw the fix. Your acceptance criteria in Given/When/Then format are already test specifications. You handed them to AI, which generated automated tests, and you watched them run. You experienced the closed loop: criteria → test → fail → implement → pass. And you deployed your application to a live URL, with tests gating deployment so that only verified work went live.

You also discussed how to make the TDD cycle the default, using a skill or your project context file so that AI follows red-to-green automatically when you hand it a story. That idea is your starting point for this challenge.

The Challenge

Your Dark Vessel Risk Assessment Tool has AIS traffic, sanctions screening, gap event history, and vessel profiles. You can identify sanctioned vessels and investigate their behavioral patterns. Now add a layer that changes the game: satellite imagery. Until now, every signal in your tool depends on the vessel cooperating: broadcasting AIS, appearing in a database. Satellite detections catch vessels that are actively hiding.

This is where automated testing pays off. The detection logic you are about to build involves thresholds (how close is "close enough" to count as a match?), time windows (a satellite pass and an AIS position have to be near each other in time), and quality filters (not every light at sea is a vessel). These are exactly the kind of rules that break silently when you change something else. Build with the closed loop: acceptance criteria first, failing test, implement, passing test. The safety net grows with every feature.

Write the test first. Implement second. Redeploy when tests are green.

New Data Source

Challenge 3 introduces the satellite layer:

Nighttime light detections (nighttime-light-detections.csv): VIIRS (Visible Infrared Imaging Radiometer Suite) is a sensor on weather satellites that can detect lights on the ocean surface at night. The real data is published by the Earth Observation Group at the Colorado School of Mines. Your repository includes manufactured detections in the exact VIIRS format, temporally aligned with your AIS data and designed to include dark vessel scenarios worth finding alongside normal vessel lights, gas flares, and sensor noise. Each detection has a quality flag (QF_Detect): 1 is a strong boat signal, 2 a weak boat signal, 3 a blurry detection, 4 a gas flare (offshore oil platform), and 5 sensor noise. Filter to QF 1 through 3 to get vessel candidates.
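As a starting point for exploring the data, here is a minimal Python sketch of loading the detections and keeping only vessel candidates. It assumes the file is a comma-separated CSV with a header row containing Lat_DNB, Lon_DNB, Rad_DNB, QF_Detect, and Date_Mscan, and that Date_Mscan is an ISO-style timestamp such as `2024-03-01 01:20:00`; check your actual file and adjust the parsing if it differs. The `Detection` class, the function names, and the sample rows are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, TextIO

# Quality flags treated as vessel candidates: 1 = strong boat, 2 = weak boat, 3 = blurry.
VESSEL_QF = {1, 2, 3}


@dataclass(frozen=True)
class Detection:
    lat: float
    lon: float
    radiance: float
    qf: int
    observed_at: datetime


def parse_detections(handle: TextIO) -> List[Detection]:
    """Read the five columns needed for vessel detection; any other columns are ignored."""
    detections = []
    for row in csv.DictReader(handle):
        detections.append(
            Detection(
                lat=float(row["Lat_DNB"]),
                lon=float(row["Lon_DNB"]),
                radiance=float(row["Rad_DNB"]),
                qf=int(row["QF_Detect"]),
                # Assumption: Date_Mscan looks like "YYYY-MM-DD HH:MM:SS".
                observed_at=datetime.fromisoformat(row["Date_Mscan"]),
            )
        )
    return detections


def vessel_candidates(detections: Iterable[Detection]) -> List[Detection]:
    """Drop gas flares (QF 4) and sensor noise (QF 5); keep QF 1-3."""
    return [d for d in detections if d.qf in VESSEL_QF]


# Made-up sample rows: two boat lights, one gas flare, one noise detection.
SAMPLE = """Date_Mscan,Lat_DNB,Lon_DNB,Rad_DNB,QF_Detect
2024-03-01 01:20:00,12.50,43.10,35.2,1
2024-03-01 01:20:00,12.70,43.40,4.1,3
2024-03-01 01:20:05,26.10,52.30,900.0,4
2024-03-01 01:20:07,11.90,44.00,0.3,5
"""

candidates = vessel_candidates(parse_detections(io.StringIO(SAMPLE)))
print([(d.lat, d.qf) for d in candidates])  # [(12.5, 1), (12.7, 3)]
```

Parsing and filtering are kept as separate functions so each can get its own test: one for reading the columns correctly, one for the quality-flag rule.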

The analytical question: when the satellite sees a light where a vessel is broadcasting AIS, that is a confirmed match. When it sees a light where NO vessel is broadcasting, something is there that does not want to be seen. That is a dark vessel detection.
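The matched-versus-unmatched split can be sketched as follows. This is one possible approach, not a prescribed one: it uses the haversine great-circle formula for distance in nautical miles and checks every AIS position for each detection. The 10 nautical mile and 2 hour thresholds are the example values from the Tips, not recommendations, and `Point`, `is_match`, and `split_matched` are hypothetical names.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles

# Example thresholds only; your acceptance criteria decide the real values.
MAX_DISTANCE_NM = 10.0
MAX_TIME_GAP = timedelta(hours=2)


@dataclass(frozen=True)
class Point:
    lat: float
    lon: float
    time: datetime


def haversine_nm(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in nautical miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def is_match(detection: Point, ais: Point) -> bool:
    """True when the AIS position is within both the time and distance thresholds."""
    close_in_time = abs(detection.time - ais.time) <= MAX_TIME_GAP
    return close_in_time and haversine_nm(detection, ais) <= MAX_DISTANCE_NM


def split_matched(detections: List[Point],
                  ais_positions: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Partition detections into (matched, unmatched); unmatched ones are dark vessel candidates."""
    matched, unmatched = [], []
    for det in detections:
        if any(is_match(det, pos) for pos in ais_positions):
            matched.append(det)
        else:
            unmatched.append(det)
    return matched, unmatched


t = datetime(2024, 3, 1, 1, 20)
detections = [Point(12.50, 43.10, t), Point(13.50, 43.10, t)]  # second one is ~57 nm north
ais = [Point(12.55, 43.12, t - timedelta(minutes=30))]
matched, dark = split_matched(detections, ais)
print(len(matched), len(dark))  # 1 1
```

The pairwise check is simple and easy to test, but it compares every detection with every AIS position. If that becomes slow on the full dataset, narrowing AIS positions to each satellite pass's time window before computing distances is one option to explore.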

Before building, take the Explore step. Ask your AI coding assistant: "What is VIIRS and how does satellite-based vessel detection work? What do the quality flags mean, and why would we filter to QF 1 through 3?" Then explore the data: "Parse the nighttime light detection CSV and show me what a single detection looks like. What fields would I use to match a detection against an AIS position?"

What to Build

Baseline Capabilities

Items are listed in priority order. If time is tight, focus on the items near the top first.

The closed loop is your default workflow: build at least two features using the full cycle of writing acceptance criteria, generating a failing test, implementing, and confirming the tests pass. If you discussed making TDD the default in Lesson 3 (via a skill or project context update), put that into practice now.
VIIRS-to-AIS correlation: the analyst can see which satellite detections match known vessel positions and which are unmatched. This requires defining a correlation threshold (how close in distance and time counts as a "match"?) and applying it across both datasets. The threshold decisions are acceptance criteria: testable, specific, and verifiable.
Core features have automated test coverage: sanctions screening, gap event history, and at least one Challenge 2 feature have automated tests that run and pass.
New feature added with regression confidence: add a substantial feature and verify that all existing tests still pass after the change. The safety net catches what manual review would miss.
Application redeployed with latest work: the live URL reflects everything you've built through Challenge 3, shipped because tests passed.
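One way to turn the correlation threshold into acceptance criteria is to pin down its boundaries in tests. The sketch below assumes pytest as the test runner and isolates the rule in a hypothetical `within_threshold(distance_nm, time_gap)` function so the boundaries can be tested without any geometry. The threshold values are illustrative. The function and tests appear together only so the block runs on its own; in the closed loop you would write the tests first and watch them fail.

```python
from datetime import timedelta

# Example acceptance criterion (illustrative values):
#   Given a satellite detection and an AIS position
#   When they are at most 10 nautical miles and 2 hours apart
#   Then the detection counts as a match
MAX_DISTANCE_NM = 10.0
MAX_TIME_GAP = timedelta(hours=2)


def within_threshold(distance_nm: float, time_gap: timedelta) -> bool:
    """The threshold rule on its own, separate from distance math, so it is easy to test."""
    return distance_nm <= MAX_DISTANCE_NM and abs(time_gap) <= MAX_TIME_GAP


def test_match_exactly_at_distance_limit():
    # Given a position exactly 10 nm away, When 30 minutes apart, Then it matches
    assert within_threshold(10.0, timedelta(minutes=30))


def test_no_match_just_beyond_distance_limit():
    # Given a position 10.01 nm away, When 30 minutes apart, Then it does not match
    assert not within_threshold(10.01, timedelta(minutes=30))


def test_no_match_outside_time_window():
    # Given a co-located position, When 2 h 01 min apart, Then it does not match
    assert not within_threshold(0.0, timedelta(hours=2, minutes=1))


def test_time_gap_direction_does_not_matter():
    # An AIS fix 2 hours before the satellite pass counts the same as one 2 hours after
    assert within_threshold(5.0, timedelta(hours=-2))
```

Run the file with `pytest`. If you later move the distance or time threshold, the boundary tests fail and show exactly which behavior changed.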

Stretch Goals

These are options for teams that finish the baseline capabilities. Your team can also define your own stretch goals based on what interests you. Use the Explore step to brainstorm: ask your AI coding assistant about satellite-based vessel detection, what patterns analysts look for, and how confidence scoring works in intelligence analysis. If you finished earlier challenges without completing all their stretch goals, consider going back to pick up features from those lists as well.

Dark vessel alert layer: surface unmatched satellite detections (lights at sea where no vessel is broadcasting AIS) as potential dark vessel findings. The analyst can see where something was detected with no corresponding AIS signal, a strong indicator that a vessel has gone dark deliberately.
Confidence scoring for satellite matches: not every correlation is equally certain. A strong satellite detection (QF 1) close to an AIS-dark vessel's last known position is high confidence. A blurry detection (QF 3) far from any known position is ambiguous. Consider scoring each match by how confident the evidence is, and let the analyst see both: high-confidence dark vessel findings for immediate attention, and lower-confidence detections flagged for the analyst to review and decide whether they warrant investigation. This creates a triage workflow where the tool does the heavy lifting but the analyst makes the final call.
Satellite detection filtering with tests: the raw VIIRS data includes gas flares from offshore oil platforms (QF 4), sensor noise (QF 5), and blurry detections alongside real vessel lights. The analyst needs clean results, not thousands of raw detections. Build filtering that removes non-vessel detections and only surfaces candidates worth investigating, with automated tests verifying that gas flares and noise are excluded, that vessel candidates within your distance threshold are classified correctly, and that the analyst sees only what matters.
Temporal correlation view: show satellite passes and AIS positions aligned by time so the analyst can see what was broadcasting and what was dark during each satellite overpass. The data covers 5 nights of satellite passes across 52 orbits.
Multi-source vessel dossier: consolidate everything the tool knows about a single vessel into one view: AIS status, OFAC match, gap history, vessel profile findings, and satellite correlation results. The analyst selects a vessel and sees the complete picture.
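For the confidence scoring goal, a score could be sketched like this. Everything here is hypothetical: the per-flag weights, the 50 nautical mile fade-out distance, and the 0.7 cut-off between immediate alerts and analyst review are placeholders your team would replace with its own acceptance criteria. A detection with no dark vessel nearby is not discarded; it keeps half its quality weight, so it lands in the review queue rather than disappearing.

```python
from typing import Optional

# Hypothetical weights: stronger detections start with more confidence.
QF_WEIGHT = {1: 1.0, 2: 0.7, 3: 0.4}
FADE_OUT_NM = 50.0     # hypothetical: beyond this distance, proximity adds nothing
HIGH_CONFIDENCE = 0.7  # hypothetical cut-off for immediate analyst attention


def confidence(qf: int, nm_to_last_known: Optional[float]) -> float:
    """Score in [0, 1] from detection quality and distance to a dark vessel's last known position."""
    if qf not in QF_WEIGHT:
        return 0.0  # gas flares (QF 4) and noise (QF 5) never score
    if nm_to_last_known is None:
        proximity = 0.0  # no dark vessel to compare against
    else:
        proximity = max(0.0, 1.0 - nm_to_last_known / FADE_OUT_NM)
    # Quality sets the ceiling; proximity scales it from 50% (far or unknown) to 100% (on top).
    return QF_WEIGHT[qf] * (0.5 + 0.5 * proximity)


def triage(qf: int, nm_to_last_known: Optional[float]) -> str:
    """Route a detection to 'alert', 'review', or 'excluded'."""
    score = confidence(qf, nm_to_last_known)
    if score >= HIGH_CONFIDENCE:
        return "alert"
    return "review" if score > 0 else "excluded"


print(triage(1, 0.5))   # alert: strong light close to a dark vessel's last known position
print(triage(3, None))  # review: blurry light with nothing nearby
print(triage(4, 0.0))   # excluded: gas flare
```

The analyst sees both tiers: "alert" items for immediate attention and "review" items to decide on, which keeps the final call with the analyst.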

Tips

Make the closed loop automatic first. If your team discussed a TDD skill or context file update in Lesson 3 but hasn't created it yet, do so now, before you start building features. A simple addition to your project context file works: "When implementing a user story, always write a failing test from the acceptance criteria first, then implement until the test passes, then run the full test suite to check for regressions." Every feature you build after that benefits from the pattern.
Write the test first, even when it feels backward. Seeing the test fail before implementation confirms you're testing the right thing. If the test passes before you've built anything, it's not testing what you think.
Your acceptance criteria are already test blueprints. The Given/When/Then format maps directly. "Given" becomes the test setup, "When" becomes the action, "Then" becomes the assertion. When AI generates a test from your criteria, check that each part is represented.
Correlation thresholds are your most testable decisions. "A satellite detection within 10 nautical miles and 2 hours of an AIS position counts as a match" is a precise, testable rule. Write it as an acceptance criterion, generate a test, and let the test verify your detection logic. If you change the threshold later, the test tells you what that change affects.
The CSV has 50 columns, but you need about 5. Don't try to parse everything. Start with Lat_DNB, Lon_DNB (position), Rad_DNB (brightness), QF_Detect (quality flag), and Date_Mscan (observation time). Ask your AI coding assistant which columns matter for vessel detection and which you can ignore.
When a test fails, check the error message. A good failure message tells AI exactly what went wrong. "Expected 3 unmatched detections but found 7" is actionable. A vague message like "assertion failed" leaves AI guessing. If your test failures aren't clear, ask AI to improve them.
Redeploy after each batch of green tests. Keep the live URL current. Tests pass, run Save & Sync, redeploy. Build the rhythm: verify it, then ship it.
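Applied to the satellite filter, the Given/When/Then mapping and a descriptive failure message might look like the sketch below. It assumes rows are dicts keyed by CSV column name (as `csv.DictReader` produces); `vessel_candidates` is a hypothetical name for your filter, and pytest would discover the `test_` function automatically.

```python
VESSEL_QF = {1, 2, 3}  # strong boat, weak boat, blurry


def vessel_candidates(rows):
    """Keep rows whose QF_Detect marks a possible vessel; drop flares and noise."""
    return [row for row in rows if int(row["QF_Detect"]) in VESSEL_QF]


def test_gas_flares_and_noise_are_excluded():
    # Given: one strong boat light, one blurry light, one gas flare, one noise detection
    rows = [{"QF_Detect": "1"}, {"QF_Detect": "3"}, {"QF_Detect": "4"}, {"QF_Detect": "5"}]

    # When: the filter runs
    kept = vessel_candidates(rows)

    # Then: only QF 1 and QF 3 remain; the message states expected vs. actual
    kept_flags = sorted(int(row["QF_Detect"]) for row in kept)
    assert kept_flags == [1, 3], (
        f"Expected only QF 1 and 3 after filtering but found QF {kept_flags}"
    )


if __name__ == "__main__":
    test_gas_flares_and_noise_are_excluded()  # raises AssertionError with the message on failure
```

If the filter regresses, the failure reads something like "Expected only QF 1 and 3 after filtering but found QF [1, 3, 4]", which gives AI a concrete starting point.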