Updated April 2026 · Cookbook · 19 min read

Claude webapp-testing skill: 10 Playwright cookbook

Ten real Playwright tests (smoke, auth, visual regression, network mocks, mobile viewport, OAuth popups, perf budget, a11y audit, GitHub Actions, shared auth fixture), each as a single Claude prompt with the exact TypeScript it produces.

Already know what skills are? Skip to the cookbook. First time? Read the explainer then come back. Need the install? It’s on the /skills/webapp-testing page.

On this page · 21 sections
  1. What this skill does
  2. The cookbook
  3. Install + README
  4. Watch it built
  5. 01 · Smoke test: homepage loads + primary CTA works
  6. 02 · Auth flow: login form + session storage + logout
  7. 03 · Visual regression with toHaveScreenshot()
  8. 04 · Network mocks via page.route()
  9. 05 · Mobile viewport with devices['iPhone 13']
  10. 06 · OAuth popup handling
  11. 07 · Performance assertion via the Paint Timing API
  12. 08 · Accessibility audit with @axe-core/playwright
  13. 09 · CI matrix: chromium / firefox / webkit on GitHub Actions
  14. 10 · Shared user fixture with test.extend
  15. Community signal
  16. The contrarian take
  17. Real suites shipped
  18. Gotchas
  19. Pairs well with
  20. FAQ
  21. Sources

What this skill actually does

Sixty seconds of context before the cookbook — what the webapp-testing skill is, what Claude returns when you invoke it, and the one thing it does NOT do for you.

Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.

anthropics/skills · skills/webapp-testing/SKILL.md · /skills/webapp-testing

What Claude returns

You ask in natural language; Claude returns a runnable Playwright spec — almost always a TypeScript *.spec.ts that imports `test, expect` from `@playwright/test`, drives `page.goto`, `page.locator`, and `page.getByRole`, asserts with `expect(locator).toHaveText(...)` and `expect(page).toHaveScreenshot()` for visual diffs, stubs upstreams via `page.route(...)`, and configures projects (chromium, firefox, webkit, `devices['iPhone 13']`) plus retries in `playwright.config.ts`. The whole file runs unmodified under `npx playwright test`.

What it does NOT do

It does not install Playwright for you — run `npm i -D @playwright/test && npx playwright install` first. It also does not boot your dev server (the helper `scripts/with_server.py` does that, and the contrarian section flags a security gotcha there).

How you trigger it

write a Playwright test for the login flow
smoke-test the homepage in chromium and webkit
add a visual regression test for the pricing page

Cost when idle

~100 tokens

The cookbook

Each entry below is a Playwright test you could ship this week. They run in the order I’d teach them — the early ones (smoke, auth, visual) are reusable on every project, the later ones lean on Playwright features you only need when the shape gets specific (CDP for perf, axe-core for a11y, page.route for flaky-API isolation, popup events for OAuth). Every entry pairs with one or two skills or MCP servers you already have on mcp.directory.

One trade-off worth naming up front: this skill is a competitor to the Playwright MCP server. The skill costs roughly 100 idle tokens (just the SKILL.md description). The MCP ships a tool schema for every browser action and that schema lives in the context window every turn. Pick the skill when each test is a fresh script and idle cost matters; pick the MCP when one agent needs to drive a long-lived browser session across many turns. The contrarian section covers the security trade-off that comes with the skill route.

Install + README

If the skill isn’t on your machine yet, here’s the one-liner. The full install panel (Codex, Copilot, Antigravity variants) lives on the skill page. The README below is the raw SKILL.md from anthropics/skills/webapp-testing — same source the install pulls from.

One-line install · by anthropics

Install

mkdir -p .claude/skills/webapp-testing && curl -L -o skill.zip "https://mcp.directory/api/skills/download/47" && unzip -o skill.zip -d .claude/skills/webapp-testing && rm skill.zip

Installs to .claude/skills/webapp-testing

Watch it built

A practical walkthrough of Claude Code driving Playwright end-to-end — useful before the cookbook because it anchors what the feedback loop feels like (fresh script per task, screenshots and console output back to the agent).

01

Smoke test: homepage loads + primary CTA works

One test that fails fast in CI when the homepage 500s, the hero CTA is missing, or the route it points at is broken.

For: Every web team. Run on every PR.

The prompt

Write a Playwright smoke test in tests/smoke/homepage.spec.ts. It should: load http://localhost:3000, assert the page has a visible h1, click the primary CTA labelled 'Get started', and assert the URL ends with /signup. Tag the test '@smoke' so we can filter it in CI.

What the spec looks like

import { test, expect } from '@playwright/test';

test('homepage loads and CTA navigates to signup @smoke', async ({ page }) => {
  await page.goto('http://localhost:3000');
  await expect(page.getByRole('heading', { level: 1 })).toBeVisible();
  await page.getByRole('link', { name: 'Get started' }).click();
  await expect(page).toHaveURL(/\/signup$/);
});

One-line tweak

Run only this in CI's fast lane with `npx playwright test --grep @smoke`. Add a second assertion on `expect(page).toHaveTitle(/YourBrand/)` if you have brand-name regression worries.

02

Auth flow: login form + session storage + logout

Verify the full sign-in loop: form validation, redirect on success, session cookie set, logout clears it.

For: Anyone with a credentialed app. Pairs naturally with `storageState` reuse for downstream tests.

The prompt

Write tests/auth/login.spec.ts. Fill the email and password inputs (use getByLabel), click 'Sign in', expect URL to become /dashboard, expect a cookie named 'session' to exist. Then click 'Log out' and expect the cookie to be gone. Save the authenticated state to playwright/.auth/user.json with page.context().storageState.

What the spec looks like

import { test, expect } from '@playwright/test';

test('user can sign in, persist session, and sign out', async ({ page, context }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill('user@example.com'); // placeholder test account
  await page.getByLabel('Password').fill(process.env.TEST_PASSWORD!);
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page).toHaveURL('/dashboard');
  expect((await context.cookies()).some(c => c.name === 'session')).toBe(true);
  await context.storageState({ path: 'playwright/.auth/user.json' });
  await page.getByRole('button', { name: 'Log out' }).click();
  // Poll: the cookie may be cleared a moment after the click returns.
  await expect.poll(async () => (await context.cookies()).some(c => c.name === 'session')).toBe(false);
});

One-line tweak

Reuse `storageState: 'playwright/.auth/user.json'` in `playwright.config.ts` so downstream specs skip the login UI entirely; on login-heavy suites this can cut run time substantially.
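As a sketch of that reuse, the config below assumes the login spec from this entry runs first as a hypothetical `setup` project and that the app is served at `http://localhost:3000`. The project names and the `testMatch` pattern are illustrative, not something the skill generates for you.

```typescript
// playwright.config.ts (illustrative sketch, names are assumptions)
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  use: { baseURL: 'http://localhost:3000' }, // assumed dev-server address
  projects: [
    // Hypothetical setup project: runs the login spec that writes user.json.
    { name: 'setup', testMatch: /login\.spec\.ts$/ },
    {
      name: 'chromium',
      testIgnore: /login\.spec\.ts$/, // do not re-run the login spec here
      dependencies: ['setup'],        // wait for setup before these specs
      use: {
        ...devices['Desktop Chrome'],
        storageState: 'playwright/.auth/user.json', // start every test signed in
      },
    },
  ],
});
```

One caveat: the spec above logs out after saving the state. If your backend invalidates the session on logout, save the state from a separate setup test that stays signed in.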

03

Visual regression with toHaveScreenshot()

Pixel-diff the rendered page against a committed baseline; fail when a CSS change quietly moves your hero by 4px.

For: Design-system teams, marketing pages, anything where unintended visual drift is a bug.

The prompt

Write tests/visual/pricing.spec.ts. Navigate to /pricing, wait for networkidle, then call expect(page).toHaveScreenshot('pricing.png'). Mask the dynamic '€/$' price block with a CSS selector. Set maxDiffPixels: 100.

What the spec looks like

import { test, expect } from '@playwright/test';

test('pricing page matches the committed baseline', async ({ page }) => {
  await page.goto('/pricing');
  await page.waitForLoadState('networkidle');
  await expect(page).toHaveScreenshot('pricing.png', {
    mask: [page.locator('[data-test="price-amount"]')],
    maxDiffPixels: 100,
    animations: 'disabled',
  });
});

One-line tweak

Generate baselines on the same OS that runs CI — never locally. `npx playwright test --update-snapshots --project=chromium` from a Linux runner avoids the macOS-vs-Ubuntu anti-aliasing trap.
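If you would rather commit one baseline per platform, the layout can be made explicit in the config. A minimal sketch, assuming a `__screenshots__` folder next to the tests (the folder name is an arbitrary choice); it also moves the tolerance from the spec into shared defaults:

```typescript
// playwright.config.ts (illustrative sketch)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Assumed layout: one baseline file per screenshot name, project, and OS.
  snapshotPathTemplate:
    '{testDir}/__screenshots__/{testFilePath}/{arg}-{projectName}-{platform}{ext}',
  expect: {
    toHaveScreenshot: {
      maxDiffPixels: 100,      // same tolerance as the spec above
      animations: 'disabled',  // freeze CSS animations before capturing
    },
  },
});
```

With defaults set here, individual specs only need to pass options that differ, such as `mask`.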

04

Network mocks via page.route()

Stub flaky upstream APIs so the test verifies your UI, not someone else's uptime.

For: Any frontend that talks to a third-party (Stripe, Algolia, GitHub API, internal microservices).

The prompt

Write tests/mocks/search.spec.ts. Intercept GET /api/search?q=* and respond with a fixture of three results. Type 'react' into the search input and assert the three result titles appear.

What the spec looks like

import { test, expect } from '@playwright/test';

test('search renders results from a stubbed API', async ({ page }) => {
  await page.route('**/api/search?q=*', async route => {
    await route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ hits: [
        { id: '1', title: 'React docs' },
        { id: '2', title: 'React Router' },
        { id: '3', title: 'React Query' },
      ] }),
    });
  });
  await page.goto('/search');
  await page.getByPlaceholder('Search').fill('react');
  await expect(page.getByRole('listitem')).toHaveCount(3);
  await expect(page.getByText('React Router')).toBeVisible();
});

One-line tweak

Move the fixture to `tests/fixtures/search.json` and `import searchFixture from '../fixtures/search.json'` so the response stays diffable in PRs.

05

Mobile viewport with devices['iPhone 13']

Catch the regressions that only happen at 390×844 — overflow, touch targets too close, off-canvas nav broken.

For: Anyone whose mobile traffic is more than 30% of conversions.

The prompt

Add a 'mobile' project to playwright.config.ts using devices['iPhone 13']. Write tests/mobile/nav.spec.ts that opens the hamburger, taps 'Pricing', and asserts the URL is /pricing.

What the config and spec look like

// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';
export default defineConfig({
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'mobile',   use: { ...devices['iPhone 13'] } },
  ],
});

// tests/mobile/nav.spec.ts
import { test, expect } from '@playwright/test';
test('mobile nav opens and routes to /pricing', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('button', { name: 'Open menu' }).tap();
  await page.getByRole('link', { name: 'Pricing' }).tap();
  await expect(page).toHaveURL('/pricing');
});

One-line tweak

Run mobile-only with `npx playwright test --project=mobile`. Add `devices['Pixel 7']` as a third project to catch Chrome-on-Android-specific issues.

06

OAuth popup handling

Test the 'Sign in with Google/GitHub' flow without flaking — the popup must be awaited BEFORE the click that opens it.

For: Any product with social sign-in.

The prompt

Write tests/oauth/github.spec.ts. Click 'Continue with GitHub'. Use page.waitForEvent('popup') CREATED BEFORE the click. In the popup, fill the GitHub login fixture and click Authorize. Expect the original page to land at /dashboard.

What the spec looks like

import { test, expect } from '@playwright/test';

test('github OAuth popup completes and returns to /dashboard', async ({ page }) => {
  await page.goto('/login');
  const popupPromise = page.waitForEvent('popup');           // create BEFORE click
  await page.getByRole('button', { name: 'Continue with GitHub' }).click();
  const popup = await popupPromise;
  await popup.waitForLoadState();
  await popup.getByLabel('Username').fill(process.env.GH_USER!);
  await popup.getByLabel('Password').fill(process.env.GH_PW!);
  await popup.getByRole('button', { name: 'Sign in' }).click();
  await popup.getByRole('button', { name: 'Authorize' }).click();
  await expect(page).toHaveURL('/dashboard');
});

One-line tweak

For deterministic CI runs, stub the `/oauth/callback` endpoint with `page.route` and never hit the real GitHub IdP — see use case 4 for the pattern.

07

Performance assertion via the Paint Timing API

Fail the build when First Contentful Paint regresses past a budget — without bolting on a Lighthouse CI process.

For: Teams with a perf budget already (LCP < 2.5s, CLS < 0.1).

The prompt

Write tests/perf/landing.spec.ts. Navigate to /, wait until the page is settled, then read the browser-side `performance.getEntriesByType('paint')` API to assert First Contentful Paint < 1500 ms.

What the spec looks like

import { test, expect } from '@playwright/test';

test('landing page FCP stays under 1.5s', async ({ page }) => {
  await page.goto('/', { waitUntil: 'networkidle' });
  const fcp = await page.evaluate(() => {
    const entry = performance
      .getEntriesByType('paint')
      .find((e) => e.name === 'first-contentful-paint');
    return entry ? entry.startTime : null;
  });
  console.log('FCP (ms):', fcp);
  expect(fcp).not.toBeNull();
  expect(fcp!).toBeLessThan(1500);
});

One-line tweak

Pair with the `perf-lighthouse` skill if you want full Core Web Vitals (LCP, INP, CLS) — the Paint Timing API is the cheap floor; Lighthouse is the audit.
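The lookup inside `page.evaluate` can be factored into a small pure function, so the budget logic is unit-testable without a browser. A minimal sketch; `PaintEntry` and `checkFcpBudget` are hypothetical names, not part of Playwright:

```typescript
// Minimal shape of a paint entry: only the two fields the check reads.
interface PaintEntry {
  name: string;
  startTime: number; // milliseconds since navigation start
}

// Returns the FCP value (or null if the browser reported none) and
// whether it is strictly under the budget.
function checkFcpBudget(
  entries: PaintEntry[],
  budgetMs: number,
): { fcp: number | null; ok: boolean } {
  const entry = entries.find((e) => e.name === 'first-contentful-paint');
  const fcp = entry ? entry.startTime : null;
  return { fcp, ok: fcp !== null && fcp < budgetMs };
}
```

In the spec, you would have `page.evaluate` return `performance.getEntriesByType('paint').map((e) => ({ name: e.name, startTime: e.startTime }))` and pass that array to the helper. A missing entry counts as a failure here, matching the `not.toBeNull()` assertion above.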

08

Accessibility audit with @axe-core/playwright

Run axe against every key page; fail the build on serious or critical violations.

For: Anyone who needs WCAG 2.1 AA. The skill prompt also nudges Claude to fix obvious violations in the same PR.

The prompt

Install @axe-core/playwright. Write tests/a11y/dashboard.spec.ts that navigates to /dashboard, runs AxeBuilder({ page }).analyze(), and fails if any 'serious' or 'critical' violation is reported.

What the spec looks like

import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('dashboard has no serious or critical a11y violations', async ({ page }) => {
  await page.goto('/dashboard');
  const results = await new AxeBuilder({ page })
    .withTags(['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa']) // WCAG 2.1 rules carry their own tags
    .analyze();
  const blocking = results.violations.filter(
    v => v.impact === 'serious' || v.impact === 'critical'
  );
  expect.soft(blocking, JSON.stringify(blocking, null, 2)).toEqual([]);
});

One-line tweak

Use `expect.soft` so one bad rule doesn't hide the others — the test still fails, but every violation is reported in the same run.
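The raw violations JSON gets long in CI logs. One option is to pull the filter into a helper and print a one-line summary per rule. A sketch under assumptions: the `Violation` interface below is a hypothetical minimal shape covering only the fields used here, and both function names are made up for illustration.

```typescript
// Hypothetical minimal shape: only the fields this helper reads.
interface Violation {
  id: string;
  impact?: string | null;
  nodes: unknown[];
}

const BLOCKING_IMPACTS = new Set(['serious', 'critical']);

// Keep only the violations that should fail the build.
function blockingViolations(violations: Violation[]): Violation[] {
  return violations.filter(
    (v) => v.impact != null && BLOCKING_IMPACTS.has(v.impact),
  );
}

// One short line per rule, e.g. "color-contrast (serious): 2 node(s)".
function summarize(violations: Violation[]): string[] {
  return violations.map(
    (v) => `${v.id} (${v.impact}): ${v.nodes.length} node(s)`,
  );
}
```

In the spec, the assertion message could then be `summarize(blocking).join('\n')` instead of the full `JSON.stringify` dump.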

09

CI matrix: chromium / firefox / webkit on GitHub Actions

Run every spec across the three engines on every push, with traces uploaded on failure.

For: Any team that ships to users on Safari (i.e., everyone).

The prompt

Generate .github/workflows/playwright.yml. Matrix over the three projects (chromium, firefox, webkit). Cache Playwright browsers. Upload playwright-report/ as an artifact on failure.

What the workflow looks like

# .github/workflows/playwright.yml
name: Playwright Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        project: [chromium, firefox, webkit]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: 'npm' }
      - run: npm ci
      - run: npx playwright install --with-deps ${{ matrix.project }}
      - run: npx playwright test --project=${{ matrix.project }}
      - if: ${{ failure() }}
        uses: actions/upload-artifact@v4
        with: { name: playwright-report-${{ matrix.project }}, path: playwright-report/, retention-days: 7 }

One-line tweak

Add `--shard=${{ matrix.shard }}/4` and a second matrix axis `shard: [1, 2, 3, 4]` once your suite passes ~5 minutes wall-clock.

10

Shared user fixture with test.extend

Stop re-logging-in at the start of every spec. Define one authenticated fixture; every downstream test inherits it.

For: Any suite over ~20 tests where login boilerplate is now the slowest part of CI.

The prompt

Refactor the suite. Create tests/fixtures/auth.ts that exports a custom test with an `authedPage` fixture. Use storageState from playwright/.auth/user.json (saved by use case 2). Then rewrite tests/dashboard.spec.ts to import from the fixture and skip the login UI entirely.

What the fixture and spec look like

// tests/fixtures/auth.ts
import { test as base, expect, Page } from '@playwright/test';
type Fixtures = { authedPage: Page };
export const test = base.extend<Fixtures>({
  authedPage: async ({ browser }, use) => {
    const context = await browser.newContext({ storageState: 'playwright/.auth/user.json' });
    const page = await context.newPage();
    await use(page);
    await context.close();
  },
});
export { expect };

// tests/dashboard.spec.ts
import { test, expect } from './fixtures/auth';
test('authed user sees their org name', async ({ authedPage }) => {
  await authedPage.goto('/dashboard');
  await expect(authedPage.getByTestId('org-name')).toHaveText('Acme Inc.');
});

One-line tweak

Add a second fixture `adminPage` that loads `playwright/.auth/admin.json` for tests that need elevated permissions — the pattern composes.

Community signal

Three voices from engineering blogs by teams that moved from Cypress to Playwright. The first is the cost argument (parallel runs are free); the second is a productivity estimate; the third is the flakiness story that surfaces after migration.

Playwright runs tests in parallel by default for free, whereas Cypress performs parallelization only for different machines through a paid feature.

BigBinary engineering team · Blog

Why-we-switched post-mortem; the single biggest reason teams move once Claude is authoring tests — parallel runs are free.

I'm likely 50–100% more productive in Playwright than I was in Cypress.

Michael Lynch · Blog

Honest cost/benefit on Cypress→Playwright migration; useful counterweight to the contrarian section below.

Playwright exposed problems that Cypress's automatic retries and auto-waiting masked — if tests had race conditions, Playwright exposed them consistently.

21RISK engineering · Blog

The flakiness story most teams discover the week after they migrate.


The contrarian take

Not everyone is sold on skills for Playwright. The most honest critique on the launch thread came from Michael Lynch (mtlynch.io):

I have a personal appreciation for Cypress as an open-source company, and in particular, Gleb Bahmutov, their VP of Engineering.

Michael Lynch (mtlynch.io) · Blog

From the Show HN thread on Playwright skills.


He’s right, and the official anthropics/webapp-testing skill has the receipts. Issue #1021 documents a working command-injection in scripts/with_server.py: the wrapper called subprocess.Popen(server['cmd'], shell=True, ...) on a string that came straight from a CLI argument. With shell=True, a value like "python server.py; touch /tmp/pwned" executed two commands instead of one. The reporter (nobuhiro-sasaki) summarised the threat model bluntly: “When this script is invoked by an AI agent from a prompt-driven workflow, a malicious or injected --server value can execute arbitrary shell commands on the host.”

The fix (PR #1039) is a one-line swap from shell=True to shlex.split(server['cmd']) plus an explicit --cwd argument. The lesson is bigger than the patch: when an agent assembles --server values from prompts, README snippets, or tool output, shell=True is a footgun. AftHurrahWinch’s core point holds — an MCP server is a deterministic surface; a skill is markdown a model interprets. That’s fine for a Playwright spec, but for the wrapper that boots your dev server, audit the version, pin to a commit that contains the fix, and never let untrusted text reach a shell-mode subprocess.
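The same idea carries over to Node tooling. The sketch below is a deliberately simplified argument splitter written for illustration (it is not a port of Python's `shlex` and handles only whitespace and simple quotes); `splitArgs` is a hypothetical name. It shows why the attack string stops working once the command is tokenized instead of handed to a shell: the `;` is just a character inside one argument.

```typescript
// Simplified splitter: whitespace separates arguments, single or double
// quotes group text, and no shell metacharacter is ever interpreted.
function splitArgs(command: string): string[] {
  const args: string[] = [];
  let current = '';
  let quote: string | null = null; // the quote character we are inside, if any
  let inToken = false;

  for (const ch of command) {
    if (quote !== null) {
      if (ch === quote) {
        quote = null; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch; // opening quote; also starts a token so "" yields an empty arg
      inToken = true;
    } else if (/\s/.test(ch)) {
      if (inToken) {
        args.push(current);
        current = '';
        inToken = false;
      }
    } else {
      current += ch;
      inToken = true;
    }
  }

  if (quote !== null) throw new Error('Unterminated quote in command');
  if (inToken) args.push(current);
  return args;
}

// The injected value becomes arguments to a single process:
// ["python", "server.py;", "touch", "/tmp/pwned"]
const argv = splitArgs('python server.py; touch /tmp/pwned');
```

Passing `argv[0]` and `argv.slice(1)` to `child_process.spawn` without the `shell` option starts one `python` process with odd arguments, rather than two commands.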

If determinism matters more than idle cost, the Playwright MCP server is the honest alternative — the tool schemas are static, the actions are well-typed, the trust boundary is the MCP transport. Pick it when you’re running an agent loop across many turns against the same browser session, and reach for this skill when each test is a one-shot script.

Real suites shipped

Concrete examples of teams running Claude + Playwright in anger. None of these are pure-marketing — they cite spec counts, target apps, and visible diffs.

Gotchas (the four that bite)

Sourced from the anthropics/skills issue tracker and the Show HN thread for Playwright skills.

with_server.py + untrusted --server is a shell-injection vector

Issue #1021 — pre-fix versions used shell=True. Pin to the commit that includes PR #1039 (shlex.split) and never feed --server values built from prompt or tool output.

Skills aren't deterministic; the wrapper still has to be

The model can choose how to author a Playwright spec, but the helper script that boots your dev server should be locked-down code, not LLM-generated. Treat with_server.py as the trust boundary.

page.waitForEvent('popup') must be created BEFORE the click

Awaiting waitForEvent after the click that opens the popup is a classic race: the popup can open before the listener is registered, and the wait then never resolves. Use the popupPromise pattern in use case 6.

toHaveScreenshot baselines drift across OS

A baseline captured on macOS won't match the same page on Ubuntu CI because anti-aliasing differs. Either commit per-platform baselines or generate them in CI and never locally. Set maxDiffPixels: 100 as a starting tolerance.

Pairs well with

Curated to match the cookbook’s actual integrations: the Playwright-adjacent skills (playwright-cli, writing-playwright-tests, playwright-pro) plus the perf and a11y skills the later use cases lean on. The natural cross-link is the Flutter skill cookbook — Flutter web apps and these Playwright tests pair perfectly for a full-stack agentic test suite.

Two posts that compose well with this cookbook: What are Claude Code skills? covers the underlying mechanism, and Claude Code best practices covers the orchestration patterns the longer use cases (8, 10) lean on.

Frequently asked questions

What is the webapp-testing skill, and how is it different from the Playwright MCP server?

The webapp-testing skill is Anthropic's official SKILL.md that teaches Claude how to author and run Playwright scripts on demand against a local dev server. The Playwright MCP server keeps a long-lived browser session and exposes browser actions as MCP tools — its tool schemas live in the context window every turn. Reach for the skill when each test is a fresh script and idle cost matters; reach for the MCP when the agent benefits from persistent state across many turns.

Chrome DevTools MCP vs Playwright MCP — which one pairs with this skill?

They solve adjacent problems. Playwright MCP wraps page-level automation (click, fill, navigate, screenshot). Chrome DevTools MCP exposes lower-level CDP primitives (Performance.getMetrics, Coverage, Tracing). Use case 7 in this cookbook reads the browser's Paint Timing API through page.evaluate, which needs no CDP at all; when you do need CDP, Playwright's context.newCDPSession is the cheapest path. If you already run Playwright tests, stay there; only add Chrome DevTools MCP when you need protocol-level surface the skill cannot reach.

Does the webapp-testing skill author Python or TypeScript Playwright?

The official Anthropic SKILL.md leans on Python with sync_playwright() because the helper script (scripts/with_server.py) is a Python wrapper that owns the dev-server lifecycle. The 10 cookbook entries above are written in TypeScript — that's the more common stack on the frontend side, and the patterns map 1:1. Tell Claude which language you want; the skill will follow.

Is there a known security issue with the webapp-testing skill I should know about?

Yes. Issue #1021 in anthropics/skills documents a shell-injection in scripts/with_server.py: it called subprocess.Popen with shell=True on a CLI string. The fix (PR #1039) replaces shell=True with shlex.split. Audit your version, pin to the patched commit, and never let untrusted text reach a --server value. The contrarian section above covers this in detail.

What is the playwright-cli skill, and do I need it on top of webapp-testing?

playwright-cli is a sibling skill that teaches Claude the npx playwright test command surface — flags, projects, shards, reporters. webapp-testing focuses on authoring a Playwright script and lifecycle-managing the server it tests. They compose: webapp-testing writes the spec, playwright-cli runs it. Most teams want both installed.

How do I avoid flaky tests when Claude authors the suite?

Three rules the cookbook prompts already enforce: prefer page.getByRole / page.getByLabel over CSS selectors (they survive markup and class-name refactors), wait on networkidle or a visible state instead of a fixed timeout, and use page.route to stub external APIs in any test that doesn't explicitly verify the upstream. If a test still flakes, run it with --trace=on and read the trace before adding retries, because retries hide the bug.

Why is 'webapp testing' getting impressions on Google but no clicks?

The bare 'webapp testing' query is too broad — it surfaces every Playwright tutorial on the web. This blog targets the long-tail variants that map to the Anthropic skill specifically: 'webapp testing skill', 'webapp-testing skill', 'webapp testing claude skill', plus the playwright-cli and chrome-devtools-mcp comparison cluster where the Anthropic SKILL.md is the right answer.

Sources

Primary

Community

Critical and contrarian

Internal
