Case Study: Building LGTM's AI-Powered PR Validation Engine
How we built looksgoodtome.ai — a developer tool that reads PR diffs, identifies affected user journeys, spins up Playwright sessions, and posts merge confidence reports. A look at the architecture behind zero-config, ephemeral test validation.
The pitch for LGTM was compelling: what if every pull request came with a confidence report, automatically? No test suites to write. No flaky tests to maintain. Just push a PR and get an AI-generated validation that tells you whether your changes broke any user-facing flows. The founding team had the vision. We had to build the engine.
The architecture breaks down into three phases: triage, validation, and reporting. When a PR is opened, the triage phase reads the diff and uses static analysis combined with AI to determine which user-facing journeys could be affected by the changes. This is the hardest part — mapping code changes to user behavior requires understanding the application's structure, not just the syntax.
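The triage phase starts from the raw diff. As a minimal sketch of that first step, the following extracts changed file paths from a unified diff; `changedPaths` is a hypothetical helper for illustration, not LGTM's actual API.

```typescript
// Extract the set of changed file paths from a unified diff.
// Illustrative sketch only; LGTM's real triage input may differ.
function changedPaths(diff: string): string[] {
  const paths = new Set<string>();
  for (const line of diff.split("\n")) {
    // Unified diffs name the post-change file on lines like "+++ b/src/App.tsx".
    const match = line.match(/^\+\+\+ b\/(.+)$/);
    if (match) paths.add(match[1]);
  }
  return [...paths];
}

const diff = [
  "--- a/src/components/Checkout.tsx",
  "+++ b/src/components/Checkout.tsx",
  "@@ -10,6 +10,7 @@",
  "+  const total = subtotal + tax;",
].join("\n");

console.log(changedPaths(diff)); // → ["src/components/Checkout.tsx"]
```

From here, the interesting work is mapping those paths to user-visible behavior.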
For triage, we use a lightweight model to quickly categorize changes (UI component, API route, business logic, configuration) and map them to a graph of user journeys. The AI reads file paths, import trees, and component hierarchies to figure out what a user would see or do differently. This diff-aware approach means LGTM only validates what changed — not the entire app.
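The categorization and journey-graph lookup described above might look roughly like this. The path patterns, category names, and graph contents are illustrative assumptions, not LGTM's real rules; in the actual system an AI model refines this kind of first-pass heuristic.

```typescript
type Category = "ui-component" | "api-route" | "business-logic" | "config";

// First-pass heuristic: categorize a changed file by its path.
// Patterns are assumptions for illustration only.
function categorize(path: string): Category {
  if (/\.(tsx|jsx|vue|svelte)$/.test(path)) return "ui-component";
  if (/(^|\/)(api|routes)\//.test(path)) return "api-route";
  if (/\.(json|ya?ml|toml)$/.test(path)) return "config";
  return "business-logic";
}

// Hypothetical journey graph: which user journeys touch which files.
const journeyGraph: Record<string, string[]> = {
  checkout: ["src/components/Cart.tsx", "src/api/payment.ts"],
  signup: ["src/components/SignupForm.tsx"],
};

// A journey is affected if any of its files appear in the diff.
function affectedJourneys(changed: string[]): string[] {
  return Object.entries(journeyGraph)
    .filter(([, files]) => files.some((f) => changed.includes(f)))
    .map(([journey]) => journey);
}

console.log(categorize("src/components/Cart.tsx")); // → "ui-component"
console.log(affectedJourneys(["src/components/Cart.tsx"])); // → ["checkout"]
```

Only the journeys this lookup returns proceed to the validation phase, which is what keeps runs proportional to the size of the change.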
The validation phase spins up ephemeral Playwright sessions against the PR's preview deployment, one browser session per affected journey. The AI generates navigation steps dynamically rather than relying on pre-written selectors or scripts, and it navigates like a real user: clicking buttons, filling forms, waiting for responses. Each step captures a screenshot, and the full session is recorded on video.
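One way to picture the split between AI and runner: the model emits a plan of steps, and a small executor drives the browser. In this sketch, `Page` mirrors a thin slice of Playwright's page interface so the runner can be exercised with a fake; the step shapes and names are assumptions, not LGTM's internal format.

```typescript
// Minimal slice of a Playwright-like page interface.
interface Page {
  click(selector: string): Promise<void>;
  fill(selector: string, value: string): Promise<void>;
  screenshot(opts: { path: string }): Promise<void>;
}

// AI-generated navigation steps (illustrative shape).
type Step =
  | { action: "click"; selector: string }
  | { action: "fill"; selector: string; value: string };

// Execute a journey's steps, capturing a screenshot after each one.
async function runJourney(page: Page, journey: string, steps: Step[]): Promise<void> {
  for (const [i, step] of steps.entries()) {
    if (step.action === "click") await page.click(step.selector);
    else await page.fill(step.selector, step.value);
    await page.screenshot({ path: `${journey}-step-${i}.png` });
  }
}

// Fake page that records calls, standing in for a real browser here.
const log: string[] = [];
const fakePage: Page = {
  click: async (s) => { log.push(`click ${s}`); },
  fill: async (s, v) => { log.push(`fill ${s}=${v}`); },
  screenshot: async (o) => { log.push(`shot ${o.path}`); },
};

await runJourney(fakePage, "checkout", [
  { action: "fill", selector: "#email", value: "user@example.com" },
  { action: "click", selector: "button[type=submit]" },
]);
console.log(log);
```

Keeping the executor dumb and the plan data-driven is what lets every run be generated fresh: there is no script file to fall out of sync with the app.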
The reporting phase synthesizes validation results into a merge confidence report posted directly as a PR comment. Each journey gets a pass/fail status with screenshots and reasoning. If something looks wrong, the report explains what the AI expected to see versus what actually happened. Developers get the context they need to decide whether to merge without leaving GitHub.
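As a sketch of that synthesis step, the following folds per-journey results into a Markdown comment body. The result shape, emoji, and wording are illustrative assumptions, not LGTM's actual report format.

```typescript
// Per-journey validation result (illustrative shape).
interface JourneyResult {
  journey: string;
  passed: boolean;
  reasoning: string; // expected vs. observed, written by the AI
  screenshot: string;
}

// Render results as a Markdown PR comment.
function confidenceReport(results: JourneyResult[]): string {
  const passed = results.filter((r) => r.passed).length;
  return [
    `## Merge confidence: ${passed}/${results.length} journeys passed`,
    "",
    ...results.map((r) =>
      `- ${r.passed ? "✅" : "❌"} **${r.journey}**: ${r.reasoning} ([screenshot](${r.screenshot}))`
    ),
  ].join("\n");
}

const report = confidenceReport([
  { journey: "checkout", passed: true, reasoning: "order total rendered as expected", screenshot: "checkout.png" },
  { journey: "signup", passed: false, reasoning: "expected confirmation banner, saw a 500 page", screenshot: "signup.png" },
]);
console.log(report);
```

Posting this body through GitHub's comment API is all that remains; the developer never has to leave the PR page.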
The ephemeral-by-design philosophy was a deliberate architectural choice. Nothing persists between runs — no test files, no selectors to update, no suite to maintain. Every validation is generated fresh from the diff. This eliminates the entire category of flaky-test problems that plague traditional E2E suites. LGTM is now in early access, validating PRs for teams shipping web applications across every major framework.
Ready to build something similar?
We'd love to hear about your project. Let's discuss how we can deliver the same kind of results for your team.
Start a Project