---
title: "Show, Don't Describe: Analyzing UI Recordings with Claude Code"
date: "2026-04-28T00:00:00.000Z"
summary: "Text descriptions of UI bugs are lossy by nature. A screen recording is not. I built a Claude Code skill that extracts key frames from a recording, reads them directly into context, and produces a structured breakdown of what is happening and when."
tags:
  - "ai"
  - "claude-code"
  - "skills"
  - "workflow"
  - "tools"
authors:
  - "default"
---
Text descriptions of UI behavior are lossy by nature. While developing and building with AI, I often pass screenshots of an interface to the agent to communicate what is on screen in a high-fidelity way, but when the scenario I want to communicate is dynamic or interactive, a quick screenshot is insufficient.

"The modal flickers when you click the button" doesn't tell the agent where in the sequence the flicker happens, what state precedes it, what it looks like at the moment it occurs, or whether it's a flash of the control state before a test variation loads.

A 10-second screen recording is not lossy in the same way. Everything is there: the sequence, the timing, the visual state at each moment.

So I built a skill that lets me hand a recording to Claude Code and get a frame-by-frame analysis back.

---

## How it works

The skill uses FFmpeg to extract key frames from the recording using scene detection tuned for UI work. Standard film-editing thresholds miss subtle transitions: a modal opening, a spinner appearing, a hover state changing. The threshold I landed on (0.1) catches those. It falls back to uniform 1 fps sampling if the recording is visually static enough that scene detection doesn't fire.
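In rough terms, the extraction step looks something like this. The file names, output directory, and the three-frame cutoff are illustrative, not the skill's exact commands:

```bash
# Extract frames where the scene-change score exceeds 0.1, low enough to
# catch modals, spinners, and hover-state changes.
mkdir -p screen-recordings
ffmpeg -i recording.mov \
  -vf "select='gt(scene,0.1)'" -vsync vfr \
  screen-recordings/frame-%03d.png

# Fallback: if scene detection barely fired, sample uniformly at 1 fps instead.
if [ "$(ls screen-recordings/frame-*.png 2>/dev/null | wc -l)" -lt 3 ]; then
  ffmpeg -y -i recording.mov -vf "fps=1" screen-recordings/frame-%03d.png
fi
```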

It then reads every extracted frame directly into Claude's context, along with a contact sheet overview of the full sequence. From there it produces a structured breakdown: the starting UI state, what changes and when, the end state.
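The contact sheet can be produced with FFmpeg's tile filter; the grid size and scaling below are illustrative choices rather than what the skill necessarily uses:

```bash
# Tile one frame per second into a single overview image of the whole clip.
ffmpeg -i recording.mov \
  -vf "fps=1,scale=480:-1,tile=4x4" \
  -frames:v 1 screen-recordings/contact-sheet.png
```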

The first run surfaced something worth documenting: FFmpeg installed via Homebrew on macOS doesn't include libfreetype, so the `drawtext` filter for burning timestamps into frames fails silently unless you know to look for it. The skill now omits that filter entirely and notes why, so the next person using it doesn't spend time debugging a filter dependency.
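If you want to check whether your own FFmpeg build has the filter before relying on it, the filter list makes it obvious:

```bash
# A missing drawtext entry means the build was compiled without libfreetype.
ffmpeg -hide_banner -filters | grep drawtext || echo "drawtext not available in this build"
```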

---

## What it caught in the first real test

The first recording I fed it was a 10-second clip of a website showing a behavior that was hard to describe precisely. Within four frames, the skill identified it as a classic A/B test flicker: the control state renders first, then the testing platform's script fires, evaluates its targeting rules, and swaps in the variant. You can see the original layout briefly before the form appears.

That identification was immediate and specific enough to be actionable. The fix isn't in the code; it's in the test platform configuration. That's the kind of answer that takes 30 minutes to reach through text description and back-and-forth clarification.

---

## The iteration happened in the same session

One thing I find genuinely useful about building skills this way: they improve in real time. After the first run, I noticed two things that were wrong.

The temp files landed in `/tmp/`, which means they scatter outside the project and get cleaned up by the OS at unpredictable times. That's fine for a quick session but annoying in practice. The skill now writes frames to `screen-recordings/` inside the project root, so they stay in context and you know where to find them.

The focus hint (the second argument, which tells the skill what to look for) was optional in the invocation but easy to forget. The skill now prompts for it interactively if you don't provide one. A short question before the FFmpeg work starts is worth it: "general overview" is a valid answer, but "why does the header shift at the moment the modal opens" produces a much more targeted analysis.

Both changes happened in the same conversation where the skill ran for the first time. That feedback loop is one of the things that makes skills worth building.

---

## When to use this versus a screenshot

I already have a skill for screenshot-grounded development (aieyes), which captures the live site at a specific viewport and reads it into context. That's the right tool when you have a static UI state you want to inspect.

Screen recording is for behavior. Anything that involves a sequence of states, a transition, a timing dependency, or something you'd otherwise describe with "when I click X, then Y happens" is a better fit for a recording than a screenshot.

Short clips work best. Under three minutes. If you have a longer recording, QuickTime's trim tool takes about 10 seconds to isolate the relevant segment, and the analysis is sharper for it.
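FFmpeg can do the trim from the command line too, if QuickTime isn't handy; the timestamps below are illustrative:

```bash
# Copy roughly 10 seconds starting at 0:42 without re-encoding
# (the cut snaps to the nearest keyframe, which is close enough here).
ffmpeg -ss 00:00:42 -i long-recording.mov -t 10 -c copy clip.mov
```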

---

## The skill is public

The skill is available at [rocksoup/videofeedback](https://github.com/rocksoup/videofeedback). It requires FFmpeg (`brew install ffmpeg`) and Claude Code. Copy `SKILL.md` into `.claude/skills/screen-recording/` and it registers automatically at next launch.
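In practice the setup is a couple of commands. This sketch assumes `SKILL.md` sits at the repository root, so check the repo if the layout differs:

```bash
brew install ffmpeg
git clone https://github.com/rocksoup/videofeedback
mkdir -p .claude/skills/screen-recording
cp videofeedback/SKILL.md .claude/skills/screen-recording/
```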

---

## More on AI-assisted development

- [Inspect Before You Ask: Wiring Chrome's Debug Protocol into Your AI Workflow](/blog/chrome-devtools-mcp-live-site-inspection). Using the Chrome DevTools MCP to query live site markup from inside Claude Code.
- [Project Briefing: A Small Skill for Getting Back Up to Speed](/blog/project-briefing-skill). A skill that turns tracker state and git history into a fast project handoff.
- [How I Control Claude Code from Slack Using the Channels MCP](/blog/slack-bridge-for-claude-code). Push messages into a running Claude Code session from Slack on your phone.

