zudo-test-wisdom

Playwright Patterns

E2E testing patterns with Playwright for CI and production verification.

CI-Safe vs @interactive Test Split

Not all E2E tests can run in CI. Tests requiring keyboard shortcuts, clipboard access, or desktop-specific interactions should be tagged and split:

// e2e/basic-navigation.spec.ts -- runs in CI
import { test, expect } from "@playwright/test";

test("loads the home page", async ({ page }) => {
  await page.goto("/");
  await expect(page.locator("h1")).toBeVisible();
});

// e2e/keyboard-shortcuts.spec.ts -- only runs locally
import { test, expect } from "@playwright/test";

test("@interactive Ctrl+S saves document", async ({ page }) => {
  await page.goto("/editor");
  await page.keyboard.press("Control+KeyS");
  await expect(page.locator(".save-indicator")).toHaveText("Saved");
});

// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  projects: [
    {
      name: "ci",
      testMatch: /.*\.spec\.ts/,
      // The @interactive tag lives in the test title, so filter by title with
      // grepInvert; testIgnore only matches file paths.
      grepInvert: /@interactive/,
    },
    {
      name: "interactive",
      testMatch: /.*\.spec\.ts/,
      grep: /@interactive/,
    },
  ],
});

Tip

Run npx playwright test --project=ci in CI and npx playwright test --project=interactive locally when you need full keyboard/clipboard testing.

Guard against specs that match no project

The CI-safe / interactive split above uses a catch-all testMatch: /.*\.spec\.ts/ on the ci project, so every spec is collected by at least one project — the catch-all is exactly what prevents the trap described here. The trap only appears once you move to a partitioned setup where each project maps to a disjoint filename prefix (e.g. one project per fixture or app):

// playwright.config.ts — partitioned by filename prefix (NOT a catch-all)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  projects: [
    { name: "fixtureA", testMatch: /fixtureA[^/]*\.spec\.ts/ },
    { name: "fixtureB", testMatch: /fixtureB[^/]*\.spec\.ts/ },
    { name: "fixtureC", testMatch: /fixtureC[^/]*\.spec\.ts/ },
  ],
});

In this config, a spec whose filename starts with anything other than fixtureA, fixtureB, or fixtureC matches no project, so nothing collects it. Playwright still exits without error, and because the missing spec produces no result, no test failure reveals the gap.

The fix mirrors the single-source-of-truth meta-test pattern: a tiny script that asserts every e2e spec filename starts with a known project prefix, wired into both the local pre-push gate and CI. Enumerate specs recursively with find (not ls e2e/*.spec.ts, which only globs the top level and would silently miss a spec nested in a subdirectory — the very hole this guard exists to close):

# Every e2e spec must start with a known project prefix, or it runs nowhere.
# find recurses into subdirectories; `ls e2e/*.spec.ts` would miss e2e/<dir>/*.spec.ts.
known='fixtureA|fixtureB|fixtureC'
bad=$(find e2e -type f -name '*.spec.ts' | grep -Ev "/($known)[^/]*\.spec\.ts$" || true)
[ -z "$bad" ] || { echo "specs match no Playwright project:"; echo "$bad"; exit 1; }

Warning

The guard script, not the config, is what makes a filename-prefix project split safe. A green Playwright run only proves that the specs which were collected passed — it cannot prove that every spec ran. If a spec falls outside every project's testMatch pattern, Playwright drops it silently. The guard is the only thing that catches a spec the config silently skipped.
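
The same rule can also be written as a pure function, which makes the guard itself unit-testable. This is a minimal sketch rather than the project's actual script: findOrphanSpecs and the sample paths are hypothetical, and the function assumes the spec paths have already been collected recursively by something else.

```typescript
// Hypothetical helper: returns spec paths whose filename starts with no known
// project prefix, i.e. specs a prefix-partitioned config would never collect.
function findOrphanSpecs(specPaths: string[], knownPrefixes: string[]): string[] {
  return specPaths.filter((specPath) => {
    // Only the filename matters; directory depth does not.
    const fileName = specPath.split("/").pop() ?? specPath;
    if (!fileName.endsWith(".spec.ts")) return false; // not a spec, not our concern
    return !knownPrefixes.some((prefix) => fileName.startsWith(prefix));
  });
}

// Assumed layout: one top-level spec, one nested spec, one stray spec.
const orphanSpecs = findOrphanSpecs(
  ["e2e/fixtureA-login.spec.ts", "e2e/nested/fixtureB.spec.ts", "e2e/smoke.spec.ts"],
  ["fixtureA", "fixtureB", "fixtureC"],
);
// orphanSpecs is ["e2e/smoke.spec.ts"]
```

A non-empty result is the signal to fail the gate, mirroring the exit 1 in the shell version.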

Quarantining Flakes: The Retries-Asymmetry Trap

Beyond the CI-safe vs @interactive split, there is a third tag worth knowing: @flaky. It exists because of a subtle trap — CI and your local pre-push gate often run with different retry budgets, so a test can be green in one and red in the other.

The trap starts here:

// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // CI retries twice; local runs get zero retries.
  retries: process.env.CI ? 2 : 0,
});

With retries: 2 in CI, a test that passes on its second or third attempt is reported green. Run the exact same test on a local b4push gate with retries: 0 and it goes red on the first failure. The test did not change — only the retry budget did. This is the insight to internalize: "flaky" is gate-relative. A test is only as flaky as the strictest gate it has to clear.
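
The gate-relative idea can be made concrete with a toy model. This is only a sketch of the reasoning, not Playwright's internals: passesGate and the attempt sequence are hypothetical, and each array entry stands for whether that attempt would pass.

```typescript
// attempts[i] is true when attempt i would pass; index 0 is the first run.
// A gate with `retries` extra attempts reports green if any of the first
// retries + 1 attempts passes.
function passesGate(attempts: boolean[], retries: number): boolean {
  return attempts.slice(0, retries + 1).some((passed) => passed);
}

// Assumed outcome sequence for one test: fails once, then passes.
const flakyAttempts = [false, true, true];

const greenInCi = passesGate(flakyAttempts, 2);    // true: passes on the second attempt
const greenLocally = passesGate(flakyAttempts, 0); // false: the only attempt fails
```

The same attempt sequence yields two different verdicts, which is why the strictest gate defines whether a test counts as flaky.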

When you have a known-flaky test that already lives on main, deleting it loses coverage. Instead, tag it @flaky in the title and quarantine it from the strict local gate without removing it:

# scripts/run-b4push.sh -- exclude @flaky from the strict local gate
CHROMIUM_INVERT="@interactive|@flaky"
WEBKIT_INVERT="@flaky"

# Chromium step: skip both @interactive and @flaky
pnpm test:e2e --project=chromium --grep-invert="$CHROMIUM_INVERT"

# WebKit @interactive step: run @interactive but still drop @flaky
pnpm test:e2e --project=webkit --grep="@interactive" --grep-invert="$WEBKIT_INVERT"

The Chromium step adds @flaky to its --grep-invert (alongside @interactive), and the WebKit @interactive step also excludes @flaky. The tests stay in the suite — CI still runs them and tolerates the occasional retry — but they no longer trip the zero-retry local gate.
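
How the two flags combine can be sketched as a title filter: a test runs when its title matches --grep (if given) and does not match --grep-invert (if given). The helper below is a toy model with hypothetical names and titles, not the runner's own code.

```typescript
// Toy model of title-based filtering: keep a title when it matches `grep`
// (if provided) and does not match `grepInvert` (if provided).
// The patterns must not use the `g` flag, since RegExp.test is stateful with it.
function selectTitles(allTitles: string[], grep?: RegExp, grepInvert?: RegExp): string[] {
  return allTitles.filter(
    (title) => (!grep || grep.test(title)) && !(grepInvert && grepInvert.test(title)),
  );
}

// Hypothetical test titles.
const sampleTitles = [
  "loads the home page",
  "@interactive Ctrl+S saves document",
  "@interactive @flaky paste keeps formatting",
  "@flaky autosave survives reload",
];

// Chromium step: drop both tags.
const chromiumRun = selectTitles(sampleTitles, undefined, /@interactive|@flaky/);
// WebKit step: keep @interactive, still drop @flaky.
const webkitRun = selectTitles(sampleTitles, /@interactive/, /@flaky/);
```

Here chromiumRun keeps only the untagged test, and webkitRun keeps only the @interactive test that is not also tagged @flaky.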

Warning

@flaky is a quarantine, not a permanent skip. Tag only tests that are already known-flaky on main; never tag a brand-new test to make a gate pass. When you fix the underlying race, remove the tag in the same PR — otherwise the list silently grows and you lose real coverage.

Tip

Keep escape hatches for the local gate so a flaky machine never blocks a push: e.g. SKIP_E2E_WEBKIT=1 to skip just the WebKit pass, SKIP_E2E=1 to skip the whole E2E stage, and a RUN_FLAKY=1 opt-in to run the quarantined tests when verifying a fix.

test.skip as a Precondition — the pass-by-skip Trap

test.skip is for genuine environment dependencies: a test that only makes sense on a specific OS, or when a particular service is reachable. Even then, audit that your gold-standard CI hosts actually run the spec — a skip that fires on every environment you own is a permanent pass-by-skip: the suite reports green because the test never executed, not because the behaviour is correct. The test is broken, not flaky.

Preconditions that must always hold belong in hard assertions:

// Anti-pattern: silently skips when user is null, hiding a broken setup
test.skip(!user, "no user");

// Correct: fails loudly if setup is broken
expect(user).toBeTruthy();

See Decision Guide — When to Write a Heavy Test for the Step 0 gate that determines whether the precondition belongs in the test at all.
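
One way to audit for a permanent pass-by-skip is to compare outcomes across every environment you own and flag tests that never executed anywhere. The sketch below assumes you can export results into a simple table; the OutcomeTable shape, the host names, and findPassBySkip are all hypothetical and not a Playwright reporter format.

```typescript
type Outcome = "passed" | "failed" | "skipped";

// Hypothetical shape: outcomes[testTitle][hostName] = status on that host.
type OutcomeTable = { [testTitle: string]: { [hostName: string]: Outcome } };

// Returns the titles that were skipped on every host, i.e. tests that ran nowhere.
function findPassBySkip(outcomes: OutcomeTable): string[] {
  return Object.keys(outcomes).filter((title) => {
    const hosts = Object.keys(outcomes[title]);
    return hosts.length > 0 && hosts.every((host) => outcomes[title][host] === "skipped");
  });
}

const neverRan = findPassBySkip({
  "macOS-only menu shortcut": { "ci-linux": "skipped", "ci-macos": "passed" },
  "needs staging service": { "ci-linux": "skipped", "ci-macos": "skipped" },
});
// neverRan is ["needs staging service"]
```

A skip that fires on only some hosts is a legitimate environment dependency; a title returned here has no host that actually exercises it.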

Editor Input in E2E

Driving a code editor (CodeMirror, Monaco, ProseMirror, or any contenteditable) from Playwright is harder than page.fill(). If the editor has a vim mode, page.keyboard.type("hello") is a disaster: in normal mode the leading h moves the cursor left, e and l move it again, and only the final o enters insert mode (on a new line), so the keystrokes are interpreted as commands rather than text.

The reliable approach is to select all existing content via the DOM Selection API, then push the new content with page.keyboard.insertText(). insertText dispatches a synthetic input event that the editor handles directly, bypassing vim-mode command interpretation entirely:

// e2e/helpers.ts
import type { Page } from "@playwright/test";
import { expect } from "@playwright/test";
import os from "os";

// Platform-aware modifier: Meta on macOS, Control on Linux/Windows
export const mod = os.platform() === "darwin" ? "Meta" : "Control";

export async function setEditorContent(page: Page, content: string) {
  const editor = page.locator(".cm-content");
  await editor.waitFor({ timeout: 5000 });
  await editor.click();

  // Select all content via the DOM Selection API (works regardless of vim mode)
  await page.evaluate(() => {
    const el = document.querySelector(".cm-content");
    if (!el) return;
    const range = document.createRange();
    range.selectNodeContents(el);
    const sel = window.getSelection();
    sel?.removeAllRanges();
    sel?.addRange(range);
  });

  // insertText dispatches an input event the editor handles directly,
  // bypassing vim-mode command interpretation entirely.
  await page.keyboard.insertText(content);

  // Wait for the Lezer parse + decoration updates to land before asserting.
  const firstLine = content.split("\n").find((l) => l.trim()) || content;
  await expect(page.locator(".cm-content")).toContainText(firstLine.slice(0, 20), {
    timeout: 5000,
  });

  // wait-ok: 500ms is the known auto-save debounce constant; split-pane reads
  // content back from the backend, so the test must wait >= the debounce or it races the persist.
  await page.waitForTimeout(500);
}

The platform-aware mod helper lets the same spec drive editor shortcuts on macOS (Meta) and Linux/Windows (Control) without branching in every test.

Warning

That waitForTimeout(500) is the legitimate exception to the usual "never use an arbitrary waitForTimeout" rule. An arbitrary wait is acceptable only when it is keyed to a known application constant — here, the 500ms auto-save debounce — and you document why using the // wait-ok: <why> marker. A bare waitForTimeout(500) with no rationale is still a flake waiting to happen; tie it to a real constant or replace it with a proper expect wait.

A second legitimate class: specs that assert the absence of a failure within a time window. For example, guarding against a React "Maximum update depth exceeded" startup loop — you mount the app and assert that no error fires for N ms. Converting that sleep to a condition wait guts the assertion: there is no positive event to poll for, so a poll resolves instantly and stops observing the window. Keep the sleep, name the constant, annotate why, and never convert:

const POST_MOUNT_LOOP_SETTLE_MS = 2000;

test("no update-depth errors on startup", async ({ page }) => {
  const errors: string[] = [];
  page.on("console", (msg) => {
    if (msg.type() === "error") errors.push(msg.text());
  });
  await page.goto("/");
  // wait-ok: asserting ABSENCE of errors over a time window; there is no positive
  // event to poll for, so converting to a condition wait would gut the assertion.
  await page.waitForTimeout(POST_MOUNT_LOOP_SETTLE_MS);
  expect(errors.filter((e) => e.includes("Maximum update depth"))).toEqual([]);
});

Ratcheting Down Wait Debt

Every waitForTimeout without a // wait-ok: <why> annotation is a debt item: it might be correct, but nobody can tell at a glance. The ratchet baseline turns that into a tracked, decreasing count rather than an invisible accumulation.

The check script

The script greps for unannotated waitForTimeout calls — those not preceded by a // wait-ok: comment within the two lines above — and compares the per-file count against a committed baseline:

#!/usr/bin/env bash
# scripts/check-wait-debt.sh
set -euo pipefail

BASELINE_FILE="e2e/wait-debt-baseline.txt"
SPEC_DIR="e2e"

# Nothing to check until the baseline has been introduced (existence guard).
[ -f "$BASELINE_FILE" ] || exit 0

# Count waitForTimeout calls that lack a // wait-ok: comment in the 2 lines above.
count_unannotated() {
  local path="$1" hits
  [ -f "$path" ] || { echo 0; return; }
  hits=$(grep -n "waitForTimeout" "$path" 2>/dev/null || true)
  [ -n "$hits" ] || { echo 0; return; }
  printf '%s\n' "$hits" | while IFS=":" read -r lineno _rest; do
    start=$(( lineno - 2 )); [ "$start" -lt 1 ] && start=1
    sed -n "${start},$((lineno - 1))p" "$path" | grep -q "wait-ok:" || echo found
  done | wc -l | tr -d ' '
}

# Expected count for a path: its baseline line, or 0 if absent (the implicit-zero rule).
expected_for() {
  awk -v p="$1" '$2 == p { print $1; found=1 } END { if (!found) print 0 }' "$BASELINE_FILE"
}

# Check EVERY spec file (so a file absent from the baseline is held to an implicit 0),
# unioned with the baseline's own paths (to catch a now-deleted file that still has an entry).
failed=0
checked=""
for path in $(find "$SPEC_DIR" -type f -name '*.spec.ts' 2>/dev/null) $(awk '{ print $2 }' "$BASELINE_FILE"); do
  case " $checked " in *" $path "*) continue ;; esac
  checked="$checked $path"
  expected=$(expected_for "$path")
  actual=$(count_unannotated "$path")
  if [ "$actual" -gt "$expected" ]; then
    echo "FAIL $path: $actual unannotated waits (baseline $expected) — annotate new waits with // wait-ok: <why>"
    failed=1
  elif [ "$actual" -lt "$expected" ]; then
    echo "FAIL $path: baseline is stale ($expected -> $actual); shrink the baseline to $actual"
    failed=1
  fi
done

exit "$failed"

Baseline format

The baseline file records the per-file count of unannotated waits — not line numbers, so unrelated edits don't churn it:

2 e2e/editor.spec.ts
1 e2e/startup.spec.ts

Rules:

  • actual > baseline — new bare wait added; CI fails.

  • actual < baseline — baseline is stale; CI fails with "shrink the baseline to N". The baseline may only decrease, never increase without a matching annotation.

  • File absent from baseline — implicit count of 0; any unannotated wait fails immediately.
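
The three rules above reduce to a single comparison per file. The following is a minimal sketch of that decision in TypeScript; judgeFile, RatchetVerdict, and e2e/new.spec.ts are hypothetical names, and the counts are assumed to come from a separate scan.

```typescript
type RatchetVerdict = "ok" | "new-debt" | "stale-baseline";

// The baseline maps a spec path to its allowed count of unannotated waits.
// A path missing from the baseline is held to an implicit 0.
function judgeFile(
  specPath: string,
  actual: number,
  baseline: { [specPath: string]: number },
): RatchetVerdict {
  const expected = Object.prototype.hasOwnProperty.call(baseline, specPath)
    ? baseline[specPath]
    : 0;
  if (actual > expected) return "new-debt";       // a bare wait was added
  if (actual < expected) return "stale-baseline"; // debt was paid; shrink the baseline
  return "ok";
}

const sampleBaseline = { "e2e/editor.spec.ts": 2, "e2e/startup.spec.ts": 1 };

judgeFile("e2e/editor.spec.ts", 3, sampleBaseline);  // "new-debt"
judgeFile("e2e/editor.spec.ts", 1, sampleBaseline);  // "stale-baseline"
judgeFile("e2e/new.spec.ts", 1, sampleBaseline);     // "new-debt" (implicit 0)
judgeFile("e2e/startup.spec.ts", 1, sampleBaseline); // "ok"
```

Both non-ok verdicts fail the gate; they differ only in the message shown to the author.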

Wiring into pre-push and CI

# .github/workflows/e2e.yml (excerpt)
- name: Check wait debt
  run: bash scripts/check-wait-debt.sh

# scripts/run-b4push.sh (excerpt)
bash scripts/check-wait-debt.sh

The existence guard ([ -f "$BASELINE_FILE" ] || exit 0) means you can introduce the script before the baseline file exists — no breakage during rollout.

Known tradeoff

The add-one-remove-one case is invisible: if a single file gains one unannotated wait and loses another, the count stays the same and the ratchet does not catch it. This is acceptable for a debt ratchet — the goal is a monotonically shrinking total, not per-line enforcement. Pair with code review for the edge case.

Generalising to other debt classes

The same pattern applies to any greppable debt: any casts without a // any-ok: <why> comment, TODO comments without an issue reference, disabled lint rules without an expiry. Introduce one baseline file per debt class and wire them all into the same pre-push pass.
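
To illustrate the generalisation, the counting step can be parameterised by the debt pattern and its marker. This is a sketch under stated assumptions: countUnannotated is a hypothetical helper, the any-ok: marker follows the convention named above, and the two-line lookback mirrors the wait-debt script.

```typescript
// Counts lines matching `debtPattern` that have no `okMarker` comment within
// the `lookback` lines directly above them.
// `debtPattern` must not use the `g` flag, since RegExp.test is stateful with it.
function countUnannotated(
  source: string,
  debtPattern: RegExp,
  okMarker: string,
  lookback = 2,
): number {
  const lines = source.split("\n");
  let count = 0;
  lines.forEach((line, index) => {
    if (!debtPattern.test(line)) return;
    const above = lines.slice(Math.max(0, index - lookback), index);
    if (!above.some((prev) => prev.includes(okMarker))) count += 1;
  });
  return count;
}

// Hypothetical source with one justified cast and one bare cast.
const sampleSource = [
  "// any-ok: third-party types are wrong for this callback",
  "const handler = raw as any;",
  "",
  "const config = loaded as any;",
].join("\n");

const bareCasts = countUnannotated(sampleSource, /\bas any\b/, "any-ok:");
// bareCasts is 1
```

The resulting per-file count feeds the same baseline comparison as the wait-debt ratchet, with one baseline file per debt class.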

Console Error Monitoring

Extend Playwright's test fixture to automatically fail on console errors:

// e2e/fixtures.ts
import { test as base, expect } from "@playwright/test";

export const test = base.extend<{ consoleErrors: string[] }>({
  consoleErrors: async ({ page }, use) => {
    const errors: string[] = [];

    page.on("console", (msg) => {
      if (msg.type() === "error") {
        errors.push(msg.text());
      }
    });

    page.on("pageerror", (error) => {
      errors.push(error.message);
    });

    await use(errors);

    // Assert no console errors after each test
    expect(errors).toEqual([]);
  },
});

export { expect };

// e2e/app.spec.ts
import { test, expect } from "./fixtures";

const CONSOLE_SETTLE_MS = 1000;

test("home page has no console errors", async ({ page, consoleErrors }) => {
  await page.goto("/");
  await expect(page.locator("h1")).toBeVisible();

  // This test asserts the ABSENCE of console errors, so it must keep observing
  // past first paint: late console/pageerror events (a failed lazy chunk, a
  // post-hydration warning) fire after the heading is visible.
  // wait-ok: documented absence-window exception; there is no positive event to
  // poll for, so hold a bounded settle window before the fixture teardown asserts.
  await page.waitForTimeout(CONSOLE_SETTLE_MS);
  // consoleErrors assertion happens automatically in fixture teardown
});

Tip

Replacing waitForLoadState("networkidle") with expect(...).toBeVisible() is the right move for asserting that a view is ready: networkidle is the canonical anti-pattern for SPA navigations that fire no network requests. But a console-error monitor asserts the absence of errors over a window, so it also needs the bounded wait-ok: settle above to catch errors that fire after first paint; a positive readiness assertion alone would end the test too early and green-light late errors. See Flake Root-Cause Catalog & Deflaking Recipe for the full catalog, including the absence-window exception.

Filtering benign errors with a curated allowlist

The expect(errors).toEqual([]) assertion above works on a pristine app — but real suites quickly hit a wall. There are almost always benign errors: framework dev warnings, third-party SDK noise, adapters that fail gracefully outside their real runtime. A strict empty-array assertion turns every one of those into a red test, and the usual reaction — loosening the check until it stops complaining — throws away the regression-catching value entirely.

The fix is an assertNoConsoleErrors() that filters a curated allowlist. The discipline that keeps it honest: every allowlist entry carries a why-comment justifying why that specific message is safe to ignore.

// e2e/helpers.ts
import { expect } from "@playwright/test";

export function assertNoConsoleErrors(errors: string[]) {
  const unexpected = errors.filter((msg) => {
    // React DevTools install nag — dev-only, not an app error.
    if (msg.includes("Download the React DevTools")) return false;
    // Favicon 404 — the mock server has no favicon; harmless.
    if (msg.includes("Failed to load resource") && msg.includes("favicon")) return false;
    // Tauri listen() fails in browser/mock mode: @tauri-apps/api's transformCallback
    // is undefined outside the WebView runtime. The error is caught internally and
    // the mock adapter registers its own in-memory listeners instead.
    if (msg.includes("Failed to register Tauri event listener")) return false;
    // React warns on an iframe rendered with src="" — known v1 limitation of the
    // preview pane when no URL is seeded; the iframe renders harmlessly.
    if (msg.includes('An empty string ("") was passed to the %s attribute') && msg.includes("src")) {
      return false;
    }
    return true;
  });
  expect(
    unexpected,
    `Unexpected console errors:\n${unexpected.join("\n")}`,
  ).toHaveLength(0);
}

Warning

The why-comment on each entry is the load-bearing part, not bureaucratic ceremony. Without a rationale, an allowlist silently rots into "ignore everything": months later nobody remembers whether an entry guards a real known-issue or was added to mute a genuine regression, so the safe move becomes never removing anything. A one-line why lets the next reader delete the entry the day its underlying cause is fixed — which is exactly when the allowlist should shrink, not grow.
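
One way to make the rationale harder to omit is to store it as data instead of a comment. The sketch below is an alternative shape, not the helper shown above: AllowlistEntry, consoleAllowlist, and unexpectedErrors are hypothetical names, and the two entries mirror cases from the example.

```typescript
// Each entry pairs a matcher with the reason the message is safe to ignore.
// `why` is required by the type, so an entry cannot be added without a rationale.
interface AllowlistEntry {
  matches: (msg: string) => boolean;
  why: string;
}

// Hypothetical entries mirroring two of the cases above.
const consoleAllowlist: AllowlistEntry[] = [
  {
    matches: (msg) => msg.includes("Download the React DevTools"),
    why: "React DevTools install nag: dev-only, not an app error.",
  },
  {
    matches: (msg) => msg.includes("Failed to load resource") && msg.includes("favicon"),
    why: "Favicon 404: the mock server has no favicon.",
  },
];

// Returns only the messages that no allowlist entry accounts for.
function unexpectedErrors(errors: string[], allowlist: AllowlistEntry[]): string[] {
  return errors.filter((msg) => !allowlist.some((entry) => entry.matches(msg)));
}

const leftover = unexpectedErrors(
  ["Download the React DevTools for a better experience", "TypeError: x is undefined"],
  consoleAllowlist,
);
// leftover is ["TypeError: x is undefined"]
```

A test would then assert that the returned array is empty, and deleting an entry stays a one-line change once its cause is fixed.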

CI Image Interception for Speed

In CI, network requests for large images slow down tests. Intercept and replace them with tiny placeholders:

// e2e/fixtures.ts
import { test as base } from "@playwright/test";

export const test = base.extend({
  page: async ({ page }, use) => {
    // Intercept image requests in CI
    if (process.env.CI) {
      await page.route("**/*.{png,jpg,jpeg,webp,gif}", async (route) => {
        await route.fulfill({
          status: 200,
          contentType: "image/png",
          // 1x1 transparent PNG
          body: Buffer.from(
            "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
            "base64"
          ),
        });
      });
    }
    await use(page);
  },
});

Note

This pattern from zmod cut CI E2E test time by 40% by eliminating network latency for image assets.

Production Build Verification

Test against the production build, not the dev server. This catches build-specific issues:

// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  webServer: {
    command: "npm run build && npm run preview",
    port: 4173,
    reuseExistingServer: !process.env.CI,
  },
  use: {
    baseURL: "http://localhost:4173",
  },
});

// e2e/production.spec.ts
import { test, expect } from "@playwright/test";

test("production build serves all pages", async ({ page }) => {
  const urls = ["/", "/docs", "/about", "/contact"];
  for (const url of urls) {
    const response = await page.goto(url);
    expect(response?.status()).toBe(200);
  }
});

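test("production build has no broken links", async ({ page }) => {
  await page.goto("/");
  // Collect every href up front: locators re-resolve against the current page,
  // so reading them after a navigation would inspect the wrong document.
  const hrefs = await page
    .locator("a[href^='/']")
    .evaluateAll((anchors) => anchors.map((a) => a.getAttribute("href")));
  for (const href of new Set(hrefs)) {
    if (!href) continue;
    const response = await page.goto(href);
    expect(response?.status()).toBe(200);
  }
});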

Note

When webServer is a list of N entries (one per fixture or app), every inner-loop run must build and boot all N servers — turning seconds into minutes. For the multi-fixture case, see the "Making T0 Real for Multi-Fixture E2E" guidance in Execution Tiers.

Sharded CI Runs

For large test suites, shard across multiple CI runners:

# .github/workflows/e2e.yml
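jobs:
  e2e:
    runs-on: ubuntu-latest # assumed runner; a job needs runs-on to be valid
    strategy:
      matrix:
        shard: [1/4, 2/4, 3/4, 4/4]
    steps:
      - uses: actions/checkout@v4
      - run: npm ci # assumed package manager; installs @playwright/test
      - run: npx playwright install --with-deps
      - run: npx playwright test --shard=${{ matrix.shard }}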
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-report-${{ strategy.job-index }}
          path: playwright-report/

Mock Backend Adapter for Frontend-Only E2E

When testing frontend behavior independently from the real backend:

// e2e/mocks/backend-adapter.ts
import type { Page } from "@playwright/test";

export async function mockBackend(page: Page) {
  await page.route("**/api/**", async (route) => {
    const url = new URL(route.request().url());

    const mocks: Record<string, unknown> = {
      "/api/user": { id: 1, name: "Test User", email: "test@example.com" },
      "/api/settings": { theme: "dark", language: "en" },
      "/api/documents": [
        { id: 1, title: "Doc 1" },
        { id: 2, title: "Doc 2" },
      ],
    };

    const mockData = mocks[url.pathname];
    if (mockData) {
      await route.fulfill({
        status: 200,
        contentType: "application/json",
        body: JSON.stringify(mockData),
      });
    } else {
      await route.continue();
    }
  });
}

// e2e/frontend.spec.ts
import { test, expect } from "@playwright/test";
import { mockBackend } from "./mocks/backend-adapter";

test.beforeEach(async ({ page }) => {
  await mockBackend(page);
});

test("displays user name from mock API", async ({ page }) => {
  await page.goto("/dashboard");
  await expect(page.locator(".user-name")).toHaveText("Test User");
});

Warning

Mock backends are great for frontend-focused testing, but they do not replace integration tests against the real API. Use both: mocked for UI behavior, real for data flow.
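
One detail worth considering in an adapter like the one above: a truthiness check on the mock data treats a registered but falsy body (0, false, null, an empty string) as a miss and forwards the request to the real network. The sketch below shows one way to separate the two cases; lookupMock, MockTable, and the /api/unread-count route are hypothetical.

```typescript
type MockTable = { [pathname: string]: unknown };

// Distinguishes "no mock registered" from "mock registered with a falsy body",
// which a plain truthiness check would treat as a miss.
function lookupMock(mocks: MockTable, pathname: string): { hit: boolean; body?: unknown } {
  if (Object.prototype.hasOwnProperty.call(mocks, pathname)) {
    return { hit: true, body: mocks[pathname] };
  }
  return { hit: false };
}

// Hypothetical table: /api/unread-count legitimately returns 0.
const sampleMocks: MockTable = {
  "/api/user": { id: 1, name: "Test User" },
  "/api/unread-count": 0,
};

lookupMock(sampleMocks, "/api/unread-count"); // { hit: true, body: 0 }
lookupMock(sampleMocks, "/api/orders");       // { hit: false }
```

Inside the route handler, a hit would be fulfilled with the JSON-serialised body and a miss would fall through to route.continue().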

See Also

Running these patterns inside a sandboxed container (Claude Code on the web, locked-down WSL) where the Playwright CDN is blocked? See Browser Verification in Limited Environments for the seeing-eye fallback to a pre-installed Chromium, 127.0.0.1 dev-server binding, and the PR-preview-URL verification path.

Revision History

Takeshi Takatsudo · Created: 2026-04-04T07:11:52+09:00 · Updated: 2026-06-17T02:14:42+09:00