How to Capture Visual QA Evidence with AI Agents

Capture before-and-after screenshots as visual QA evidence with AI agents — review deploys, spot regressions, and attach artifacts to bug reports.

Last updated: 2026-06-28

Try ScreenshotAPI free

200 free screenshots/month. No credit card required.


An AI agent can capture visual QA evidence by calling a hosted screenshot API on a deployed or preview URL, saving the rendered image as an artifact, and attaching it to a code review or bug report. This turns "looks fine to me" into a concrete before-and-after that reviewers and other agents can inspect. ScreenshotAPI renders pages with real Chrome in the cloud, so the agent gets the same pixels a user would see — no local browser, no headless setup.

Visual QA workflows agents can run

Visual evidence is most useful at the moments where a change ships and something visible might break:

Before a frontend change — capture the current production or preview URL to establish a baseline.
After deploy — capture the same URL on the new preview deployment and compare against the baseline.
Desktop and mobile — capture both breakpoints so responsive regressions show up in the review.
Attach to the review — save each image as a build artifact or PR comment so reviewers see the rendered result, not just the diff.
Failed-state evidence for bug reports — when a page renders wrong, capture it full-page and attach the image so the report is reproducible.

One hard boundary: the hosted API renders from outside your machine, so it captures deployed and preview URLs only. It cannot reach localhost or a private network. For local-only UI work, point the agent at a local browser tool (Puppeteer or Playwright) instead, and use the hosted API once there is a preview URL.
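
The boundary above can be enforced before any request is sent. The following is a minimal sketch, not part of ScreenshotAPI: is_capturable is a hypothetical helper name, and it only inspects the literal host. It does not resolve DNS, so a public hostname that points at a private address would still pass.

```python
import ipaddress
from urllib.parse import urlparse


def is_capturable(url: str) -> bool:
    """Return True if the URL looks reachable by a hosted renderer.

    Hypothetical pre-flight check: rejects non-HTTP(S) schemes, localhost,
    and literal private, loopback, or link-local IP addresses.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False

    host = parsed.hostname.lower()
    if host == "localhost" or host.endswith(".localhost"):
        return False

    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: treat it as a public hostname (no DNS lookup here).
        return True
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)


print(is_capturable("https://pr-482.preview.example.com/pricing"))  # True
print(is_capturable("http://localhost:3000"))                       # False
print(is_capturable("http://192.168.1.20:8080"))                    # False
```

An agent could call this check first and fall back to a local browser tool whenever it returns False.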

Before-and-after capture

Capture the baseline and the new version as two full-page PNGs. Pass the key as an x-api-key header (or Authorization: Bearer sk_live_...) and write the binary body straight to a file.


bash
# Baseline: current production
# --fail makes curl exit non-zero on an HTTP error instead of saving the JSON error body as before.png
curl -G "https://screenshotapi.to/api/v1/screenshot" \
  -d "url=https://example.com/pricing" \
  -d "type=png" \
  -d "fullPage=true" \
  -H "x-api-key: sk_live_your_key_here" \
  --fail --output before.png

# After: the preview deployment
curl -G "https://screenshotapi.to/api/v1/screenshot" \
  -d "url=https://pr-482.preview.example.com/pricing" \
  -d "type=png" \
  -d "fullPage=true" \
  -H "x-api-key: sk_live_your_key_here" \
  --fail --output after.png
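
Because failed requests return a JSON error body, a file named before.png or after.png is not guaranteed to contain an image. A minimal sketch of a sanity check an agent could run on each artifact before attaching it; looks_like_png and check_artifact are hypothetical helpers, and the only fact they rely on is the fixed 8-byte PNG file signature.

```python
from pathlib import Path

# Every valid PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(data: bytes) -> bool:
    """Return True if the bytes start with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


def check_artifact(path: str) -> bool:
    """Read a saved capture and report whether it is really a PNG."""
    return looks_like_png(Path(path).read_bytes())


# Offline demonstration with in-memory bytes instead of real captures.
print(looks_like_png(PNG_SIGNATURE + b"...image data..."))  # True
print(looks_like_png(b'{"error": "render_failed"}'))         # False
```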

JavaScript helper function

A single helper the agent can call for each side of the comparison. It writes the image to disk and returns the screenshot ID, cache status, and remaining credits so the agent can log them and self-limit.


javascript
import fs from 'node:fs/promises';

async function captureQA(url, outPath, { fullPage = true, width, height } = {}) {
  const params = new URLSearchParams({ url, type: 'png', fullPage: String(fullPage) });
  if (width) params.set('width', String(width));
  if (height) params.set('height', String(height));

  const res = await fetch(
    `https://screenshotapi.to/api/v1/screenshot?${params}`,
    { headers: { 'x-api-key': process.env.SCREENSHOT_API_KEY } }
  );

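  if (!res.ok) {
    // Error bodies are JSON ({ error, message, fix? }); tolerate a non-JSON body from a proxy.
    const { message = res.statusText, fix } = await res.json().catch(() => ({}));
    throw new Error(`Capture failed (${res.status}): ${message}${fix ? ` (fix: ${fix})` : ''}`);
  }

  await fs.writeFile(outPath, Buffer.from(await res.arrayBuffer()));
  const credits = res.headers.get('x-credits-remaining');
  return {
    id: res.headers.get('x-screenshot-id'),
    cache: res.headers.get('x-cache'),
    // Number(null) is 0, so keep a missing header as null instead of reporting zero credits.
    creditsRemaining: credits === null ? null : Number(credits),
  };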
}

// Before and after, saved as artifacts
await captureQA('https://example.com/pricing', 'before.png');
const after = await captureQA('https://pr-482.preview.example.com/pricing', 'after.png');
console.log(`Saved after.png (${after.id}), ${after.creditsRemaining} credits left`);
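
One way to use the remaining-credits value for self-limiting is a budget check before each batch. This is a sketch under assumptions: can_afford, the reserve floor, and the idea that each capture costs one credit are illustrative and not taken from the ScreenshotAPI documentation.

```python
from typing import Optional


def can_afford(credits_remaining: Optional[int], planned_captures: int, reserve: int = 10) -> bool:
    """Return True if the planned batch leaves at least `reserve` credits.

    Assumes one credit per capture (an assumption, not a documented price).
    An unknown balance (None) is treated as "do not proceed".
    """
    if credits_remaining is None:
        return False
    return credits_remaining - planned_captures >= reserve


# A before/after pass at two viewports is four captures.
print(can_afford(200, planned_captures=4))   # True
print(can_afford(12, planned_captures=4))    # False: would dip below the reserve of 10
print(can_afford(None, planned_captures=4))  # False: balance unknown
```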

Desktop and mobile viewports

Responsive regressions often show up at only one breakpoint, so capture both. Desktop uses a 1440x900 viewport; mobile uses 390x844 with devicePixelRatio=2 for crisp, retina-accurate text.


bash
# Desktop
curl -G "https://screenshotapi.to/api/v1/screenshot" \
  -d "url=https://pr-482.preview.example.com" \
  -d "width=1440" \
  -d "height=900" \
  -d "fullPage=true" \
  -H "x-api-key: sk_live_your_key_here" \
  --output desktop.png

# Mobile
curl -G "https://screenshotapi.to/api/v1/screenshot" \
  -d "url=https://pr-482.preview.example.com" \
  -d "width=390" \
  -d "height=844" \
  -d "devicePixelRatio=2" \
  -d "fullPage=true" \
  -H "x-api-key: sk_live_your_key_here" \
  --output mobile.png
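
To confirm that a capture used the intended viewport, an agent can read the pixel dimensions from the PNG header. The sketch below relies only on the PNG format itself: the IHDR chunk stores width and height as big-endian 32-bit integers at byte offsets 16 and 20. The expectation that a 390-wide viewport at devicePixelRatio=2 produces a 780-pixel-wide image is an assumption about the API's output, not something this guide states; png_size is a hypothetical helper name.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Tuple[int, int]:
    """Return (width, height) in pixels from a PNG's IHDR chunk."""
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # Bytes 16-23 hold width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


# Offline demonstration: a hand-built PNG header for a 780x1688 image.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 780, 1688)
print(png_size(header))  # (780, 1688)
```

For a real artifact, pass the file contents instead, for example the bytes read from mobile.png.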

Python example

The same desktop-plus-mobile sweep in Python, writing one artifact per viewport.


python
import os

import requests

API = "https://screenshotapi.to/api/v1/screenshot"
# Read the key from the environment instead of hardcoding it in the script.
HEADERS = {"x-api-key": os.environ["SCREENSHOT_API_KEY"]}
URL = "https://pr-482.preview.example.com"

# Output file name -> viewport parameters
viewports = {
    "desktop.png": {"width": 1440, "height": 900},
    "mobile.png": {"width": 390, "height": 844, "devicePixelRatio": 2},
}

for out, vp in viewports.items():
    params = {"url": URL, "type": "png", "fullPage": "true", **vp}
    # A timeout keeps a slow render from hanging the agent; 60 seconds is an arbitrary choice.
    res = requests.get(API, params=params, headers=HEADERS, timeout=60)
    res.raise_for_status()
    with open(out, "wb") as f:
        f.write(res.content)
    print(out, res.headers.get("x-cache"), res.headers.get("x-credits-remaining"))

Agent checklist

Drop this directly into a task prompt or system rule so the agent runs a consistent QA pass.


text
Goal: capture visual QA evidence for the preview deployment.

1. Capture the preview URL at desktop (width=1440, height=900) and mobile (width=390, height=844, devicePixelRatio=2).
2. Always use fullPage=true so the whole page is in frame.
3. Save each capture as a named artifact (e.g. desktop.png, mobile.png) and attach them to the review.
4. Inspect and report: visible differences vs. the baseline, clipping or content cut off, layout overflow, broken or missing images, and unreadable or overlapping text.
5. If a capture returns a 500, read the "fix" field and retry once (e.g. waitUntil=networkidle0 or stealthMode=true).
6. Do NOT capture unauthorized, private-network, or localhost URLs with the hosted API. For local-only UI, use a local browser tool instead.
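
Step 4 is easier to review when the agent reports findings in a fixed structure. A minimal sketch of one possible shape; the Finding class, the category names, and the comment layout are illustrative choices, not a format ScreenshotAPI defines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    artifact: str   # e.g. "mobile.png"
    category: str   # e.g. "overflow", "clipping", "broken-image", "text-overlap"
    note: str       # what the agent observed


def format_report(preview_url: str, findings: List[Finding]) -> str:
    """Render findings as a short Markdown comment for the review."""
    lines = [f"Visual QA for {preview_url}", ""]
    if not findings:
        lines.append("No visible differences from the baseline.")
    for f in findings:
        lines.append(f"- {f.artifact} [{f.category}]: {f.note}")
    return "\n".join(lines)


report = format_report(
    "https://pr-482.preview.example.com",
    [Finding("mobile.png", "overflow", "Pricing table extends past the right edge.")],
)
print(report)
```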

Recommended tool schema

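Expose capture as a single tool so a framework agent can call it deterministically. The shape below matches an OpenAI function definition (for Chat Completions, wrap it as {"type": "function", "function": {...}}). The Anthropic Messages API uses the same name and description fields but expects the schema under input_schema rather than parameters. Most agent frameworks accept one of these two forms.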


json
{
  "name": "capture_qa_screenshot",
  "description": "Capture a full-page screenshot of a public or preview URL as visual QA evidence. Use only for deployed/preview URLs, never localhost or private networks.",
  "parameters": {
    "type": "object",
    "properties": {
      "url": {
        "type": "string",
        "description": "Public or preview URL to capture (https://...)."
      },
      "viewport": {
        "type": "string",
        "enum": ["desktop", "mobile"],
        "description": "desktop = 1440x900; mobile = 390x844 @2x.",
        "default": "desktop"
      },
      "fullPage": {
        "type": "boolean",
        "description": "Capture the entire scrollable page.",
        "default": true
      }
    },
    "required": ["url"]
  }
}
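
On the receiving side, the tool call has to be translated into query parameters. A sketch of that mapping, using the viewport sizes given in the schema description; tool_args_to_params and VIEWPORTS are hypothetical names.

```python
from typing import Any, Dict
from urllib.parse import urlencode

ENDPOINT = "https://screenshotapi.to/api/v1/screenshot"

# Presets taken from the schema description: desktop = 1440x900, mobile = 390x844 @2x.
VIEWPORTS: Dict[str, Dict[str, int]] = {
    "desktop": {"width": 1440, "height": 900},
    "mobile": {"width": 390, "height": 844, "devicePixelRatio": 2},
}


def tool_args_to_params(args: Dict[str, Any]) -> Dict[str, str]:
    """Translate capture_qa_screenshot arguments into API query parameters."""
    viewport = args.get("viewport", "desktop")
    if viewport not in VIEWPORTS:
        raise ValueError(f"unknown viewport: {viewport!r}")
    params = {
        "url": args["url"],
        "type": "png",
        # The API examples pass booleans as lowercase strings.
        "fullPage": "true" if args.get("fullPage", True) else "false",
    }
    params.update({name: str(value) for name, value in VIEWPORTS[viewport].items()})
    return params


params = tool_args_to_params({"url": "https://pr-482.preview.example.com", "viewport": "mobile"})
print(params["width"], params["height"], params["devicePixelRatio"])  # 390 844 2
# urlencode percent-encodes the target URL so its own query string survives intact.
print(f"{ENDPOINT}?{urlencode(params)}")
```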

For machine-readable contracts, point the agent at /openapi.json for the full schema and /llms-full.txt for LLM-ready documentation of every parameter.
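
Providers wrap the same JSON Schema differently: OpenAI's Chat Completions API nests the definition under a function key and keeps the parameters name, while the Anthropic Messages API expects the schema under input_schema. A sketch of both adapters, with the schema abbreviated; the helper names are hypothetical, and each provider's documentation remains the authority on its current format.

```python
import copy
from typing import Any, Dict

TOOL: Dict[str, Any] = {
    "name": "capture_qa_screenshot",
    "description": "Capture a full-page screenshot of a public or preview URL as visual QA evidence.",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string"},
            "viewport": {"type": "string", "enum": ["desktop", "mobile"]},
            "fullPage": {"type": "boolean"},
        },
        "required": ["url"],
    },
}


def to_openai_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the definition in the Chat Completions function-tool envelope."""
    return {"type": "function", "function": copy.deepcopy(tool)}


def to_anthropic_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Rename `parameters` to `input_schema` for the Messages API."""
    converted = copy.deepcopy(tool)
    converted["input_schema"] = converted.pop("parameters")
    return converted


print(sorted(to_anthropic_tool(TOOL)))  # ['description', 'input_schema', 'name']
print(to_openai_tool(TOOL)["type"])     # function
```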

Wire it into CI

Capture the preview URL after it deploys and upload the image as a build artifact, so every PR carries rendered evidence. This GitHub Actions snippet assumes an earlier step in the same job, with the id deploy, exported the deployed preview URL as an output named preview-url; that step is not shown.


yaml
name: Visual QA evidence
on: pull_request

jobs:
  capture:
    runs-on: ubuntu-latest
    steps:
      - name: Capture preview screenshots
        env:
          SCREENSHOT_API_KEY: ${{ secrets.SCREENSHOT_API_KEY }}
          PREVIEW_URL: ${{ steps.deploy.outputs.preview-url }}
        run: |
          # --data-urlencode keeps a preview URL that has its own query string intact.
          curl -G "https://screenshotapi.to/api/v1/screenshot" \
            --data-urlencode "url=${PREVIEW_URL}" -d "fullPage=true" -d "width=1440" -d "height=900" \
            -H "x-api-key: ${SCREENSHOT_API_KEY}" --fail --output desktop.png
          curl -G "https://screenshotapi.to/api/v1/screenshot" \
            --data-urlencode "url=${PREVIEW_URL}" -d "fullPage=true" -d "width=390" -d "height=844" -d "devicePixelRatio=2" \
            -H "x-api-key: ${SCREENSHOT_API_KEY}" --fail --output mobile.png

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: visual-qa
          path: |
            desktop.png
            mobile.png

To turn this into an automated pass/fail gate that diffs against a stored baseline, see how to build a visual regression testing pipeline.

Reading the response (agent DX)

The response body is the raw image; everything an agent needs to verify and self-correct is in the headers and error envelope:

x-credits-remaining — remaining quota; let the agent self-limit before exhausting it.
x-cache — HIT or MISS. Cached responses don't count against quota, so re-capturing an unchanged baseline is effectively free.
x-screenshot-id — stable ID to log alongside the artifact for traceability.
x-duration-ms — render time, useful for spotting slow pages.
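
Headers arrive as strings and any of them may be absent, so it helps to parse them into typed values once. A sketch using the four header names listed above; the CaptureMeta class and parse_capture_headers function are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse a header value as an integer, returning None if missing or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


@dataclass
class CaptureMeta:
    screenshot_id: Optional[str]
    cache_hit: bool
    credits_remaining: Optional[int]
    duration_ms: Optional[int]


def parse_capture_headers(headers: Mapping[str, str]) -> CaptureMeta:
    """Build CaptureMeta from response headers.

    Works with a case-insensitive mapping such as the headers object from
    the requests library, or with a plain dict that uses lowercase keys.
    """
    return CaptureMeta(
        screenshot_id=headers.get("x-screenshot-id"),
        cache_hit=(headers.get("x-cache") or "").upper() == "HIT",
        credits_remaining=_to_int(headers.get("x-credits-remaining")),
        duration_ms=_to_int(headers.get("x-duration-ms")),
    )


meta = parse_capture_headers({"x-screenshot-id": "abc123", "x-cache": "MISS", "x-credits-remaining": "187"})
print(meta.cache_hit, meta.credits_remaining, meta.duration_ms)  # False 187 None
```

For an after capture, a cache hit may mean the image predates the deploy, so an agent may want to flag that case in its report rather than treat the image as fresh evidence.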

On failure the body is JSON. A 400 (bad request) or 402 (out of credits) returns { error, message }. A 500 adds a fix field with an actionable hint:


json
{
  "error": "render_failed",
  "message": "Navigation timed out before the page settled.",
  "fix": "Retry with waitUntil=networkidle0, or add stealthMode=true if the site blocks bots."
}

That fix hint is the self-healing path: the agent reads it, adjusts one parameter, and retries once instead of giving up. Other handy reliability params are waitForSelector (wait for a specific element), delay (0–30000 ms), blockAds, and removeCookieBanners to keep overlays out of the frame.
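
The adjust-one-parameter step can be sketched as follows. This assumes, based only on the example above, that hints embed literal name=value tokens; apply_fix_hint and the allow-list are hypothetical, and an agent may prefer to let the model read the hint rather than parse it.

```python
import re
from typing import Dict

# Only parameters named in this guide are applied automatically.
ALLOWED = {"waitUntil", "stealthMode", "waitForSelector", "delay", "blockAds", "removeCookieBanners"}


def apply_fix_hint(params: Dict[str, str], fix: str) -> Dict[str, str]:
    """Return a copy of params with the first allowed name=value token from the hint applied."""
    updated = dict(params)
    for name, value in re.findall(r"\b([A-Za-z]+)=([\w.#-]+)", fix):
        if name in ALLOWED:
            updated[name] = value
            break  # change one parameter per retry
    return updated


fix = "Retry with waitUntil=networkidle0, or add stealthMode=true if the site blocks bots."
retry_params = apply_fix_hint({"url": "https://example.com", "fullPage": "true"}, fix)
print(retry_params["waitUntil"])      # networkidle0
print("stealthMode" in retry_params)  # False
```

The caller would send one more request with retry_params and stop if that attempt also fails.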

What screenshots can and cannot prove

Visual evidence is decisive for anything you can see, and silent about anything you have to interact with. Scope the agent's claims accordingly:

Can prove (visual)	Cannot prove (functional)
Rendered layout and spacing	A form actually submits
Typography and web-font loading	A click handler or button works
Image loading (or broken images)	An authenticated flow completes
Content overflow and clipping	Client-side state transitions
Color scheme (light/dark)	Network requests succeed
Visible copy and headings	Redirects and routing logic

Pair visual evidence with automated functional tests and browser-based QA whenever the feature is interactive — screenshots confirm it looks right, tests confirm it works.

Use it from your stack, and related guides

MCP — connect a Model Context Protocol client at /mcp to expose capture as a native tool.
OpenAPI — import /openapi.json to generate a typed client in any language.
Agent frameworks — wire the tool schema above into your framework of choice.
Setup guide — see AI-agent integrations for end-to-end examples.

Frequently asked questions

How can an AI agent capture visual QA evidence?

Give the agent an API key and a public or preview URL, then have it call the ScreenshotAPI GET endpoint with fullPage=true twice: once for the baseline and once for the changed version. The agent attaches both images as artifacts to a review or bug report and describes the visible differences it observes.

What can before-and-after screenshots actually prove?

Screenshots prove rendered layout, typography, web-font loading, image loading, content overflow or clipping, color scheme, and visible copy. They do not prove that a form submits, a click handler fires, or an authenticated flow completes — pair them with functional tests.

Can ScreenshotAPI capture localhost or a private dev server?

No. The hosted API renders from outside your machine, so it can only reach public or authorized URLs such as production and preview deployments. For local-only UI, use a local browser tool like Puppeteer or Playwright instead.

Should visual QA evidence replace browser-based QA or tests?

No. Visual evidence is a fast complement to functional and visual-regression tests, not a replacement. Use screenshots to catch rendering regressions a unit test would miss, and keep functional tests for interactivity, forms, and auth.

How does an agent capture both desktop and mobile views?

Call the endpoint twice with different viewports: width=1440 and height=900 for desktop, and width=390, height=844, devicePixelRatio=2 for mobile. Save each result as a separate artifact so the review shows both breakpoints.

How does an agent know a capture succeeded or self-heal a failure?

A successful response carries the raw image as its body. The response headers add the details an agent should log: x-credits-remaining, x-cache (HIT or MISS), x-screenshot-id, and x-duration-ms. On a 500 the JSON body includes a fix field with an actionable hint, such as retrying with waitUntil=networkidle0 or stealthMode=true.

Start capturing screenshots today

Create a free account and get 200 free screenshots per month to try the API. No credit card required.
