URL to Image API for AI Agents

Use a URL to image API for AI agents: capture public or authorized pages as PNG, JPEG, or WebP with one HTTP call, no headless browser to run.

Last updated: 2026-06-28

An AI agent captures a URL as an image by sending one authenticated GET request to a URL to image API, passing the target page as the url parameter and saving the binary PNG, JPEG, or WebP that comes back. The agent never installs Chromium or runs a browser pool: ScreenshotAPI renders the page in real Chrome in the cloud and returns the image plus metadata headers the agent can act on. This guide shows the exact request, a copy-paste tool schema, and the guardrails that keep an autonomous loop safe.

When to use URL-to-image capture

Reach for a URL to image API from an agent when the task is a direct capture of a public or authorized page:

  • Before-and-after UI screenshots for code review — capture a preview deploy and the production URL so a reviewer (or another agent) can diff them visually.
  • Link previews and directory thumbnails — turn user-submitted URLs into preview cards.
  • Dashboard and status-page evidence — snapshot a monitoring view when the URL is already authorized.
  • Marketing-page snapshots for reports — pull a competitor or landing page into a research summary.
  • Web archives — store a rendered image when a picture is more useful than scraped text.

Do not use URL-to-image capture when the task requires clicking through a flow, filling forms, logging in, or bypassing access controls. Those interactive jobs need full browser automation (see the comparison below). If the agent generates its own markup rather than pointing at a live URL, use the HTML to image API instead.

Agent-ready request

The simplest tool call is a single authenticated HTTP GET. Pass the url in the query string and stream the binary response to a file. GET is the lightest path; POST is only required for HTML rendering, which capturing a live URL does not need.

bash
curl -G "https://screenshotapi.to/api/v1/screenshot" \
  -d "url=https://example.com" \
  -d "type=png" \
  -d "fullPage=true" \
  -d "blockAds=true" \
  -d "removeCookieBanners=true" \
  -H "x-api-key: sk_live_your_key_here" \
  --output screenshot.png

Auth uses the x-api-key header (preferred). Authorization: Bearer sk_live_... also works if your framework only exposes a bearer field. Get a key from the dashboard; the free tier includes 200 screenshots per month and cached responses do not count against it.

JavaScript example

javascript
import fs from 'node:fs';

// Query parameters are sent as strings, matching the curl example.
const params = new URLSearchParams({
  url: 'https://example.com',
  type: 'png',
  fullPage: 'true',
  blockAds: 'true',
  removeCookieBanners: 'true',
  cacheTtl: '3600'
});

const response = await fetch(
  `https://screenshotapi.to/api/v1/screenshot?${params}`,
  { headers: { 'x-api-key': process.env.SCREENSHOTAPI_KEY } }
);

// Errors come back as JSON, not image bytes.
if (!response.ok) {
  const { message, fix } = await response.json();
  throw new Error(`${message}${fix ? ` (fix: ${fix})` : ''}`);
}

const buffer = Buffer.from(await response.arrayBuffer());
await fs.promises.writeFile('screenshot.png', buffer);

Python example

python
import requests

response = requests.get(
    "https://screenshotapi.to/api/v1/screenshot",
    params={
        "url": "https://example.com",
        "type": "png",
        # requests would serialize Python booleans as "True"/"False",
        # so send lowercase strings to match the curl example.
        "fullPage": "true",
        "blockAds": "true",
        "removeCookieBanners": "true",
        "cacheTtl": 3600,
    },
    headers={"x-api-key": "sk_live_your_key_here"},
    timeout=60,
)

# Errors come back as JSON, not image bytes.
if not response.ok:
    body = response.json()
    raise RuntimeError(f"{body['message']} (fix: {body.get('fix', 'n/a')})")

with open("screenshot.png", "wb") as f:
    f.write(response.content)

Recommended tool schema

When you expose ScreenshotAPI to an agent framework, keep the tool narrow so the model picks sensible parameters. This JSON definition works as a starting point for LangChain tools, the Vercel AI SDK, the OpenAI Agents SDK, or any framework that accepts a JSON schema:

json
{
  "name": "capture_url_image",
  "description": "Capture a public or authorized URL as a PNG, JPEG, or WebP image. Returns a file path to the saved image. Do not use for localhost, private networks, or pages requiring login.",
  "parameters": {
    "type": "object",
    "properties": {
      "url": {
        "type": "string",
        "description": "Absolute public or authorized URL to capture"
      },
      "type": {
        "type": "string",
        "enum": ["png", "jpeg", "webp"],
        "description": "Output image format (default png)"
      },
      "fullPage": {
        "type": "boolean",
        "description": "Capture the full scrollable page instead of the viewport"
      },
      "waitUntil": {
        "type": "string",
        "enum": ["load", "domcontentloaded", "networkidle0", "networkidle2"],
        "description": "When to consider the page loaded before capture"
      },
      "waitForSelector": {
        "type": "string",
        "description": "Optional CSS selector that must appear before capture"
      }
    },
    "required": ["url"]
  }
}

You usually do not have to write this by hand. Point GPT Actions and other OpenAPI-compatible tool builders at /openapi.json and they will generate the full action set. Point coding assistants (Cursor, Claude, Copilot) at /llms-full.txt for the complete agent-readable documentation corpus.
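
If you do wire the tool up by hand, the arguments the model produces still have to be checked and turned into query parameters. The following is a minimal Python sketch of that step, not an official client: the function name build_query is hypothetical, and sending booleans as lowercase "true"/"false" is an assumption based on the curl example above.

```python
ALLOWED_TYPES = {"png", "jpeg", "webp"}
ALLOWED_WAIT = {"load", "domcontentloaded", "networkidle0", "networkidle2"}


def build_query(args: dict) -> dict:
    """Validate tool-call arguments against the schema and build query params.

    Hypothetical helper: mirrors the capture_url_image schema above.
    """
    url = args.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")

    image_type = args.get("type", "png")  # schema default is png
    if image_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {image_type!r}")

    query = {"url": url, "type": image_type}

    if "fullPage" in args:
        # Assumption: the API expects lowercase boolean strings, as in the curl example.
        query["fullPage"] = "true" if args["fullPage"] else "false"

    wait_until = args.get("waitUntil")
    if wait_until is not None:
        if wait_until not in ALLOWED_WAIT:
            raise ValueError(f"unsupported waitUntil: {wait_until!r}")
        query["waitUntil"] = wait_until

    selector = args.get("waitForSelector")
    if selector:
        query["waitForSelector"] = selector

    return query
```

The returned dictionary can be passed as params to requests.get, as in the Python example earlier. Rejecting unknown enum values locally avoids spending a request on a call that would likely come back as a 400.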

Reading the response (agent DX)

A successful response is the raw image bytes. The useful signal for an agent lives in the response headers and the JSON error body, so the loop should always inspect them rather than blindly trusting a 200.

Key response headers:

  • x-credits-remaining — credits left this period; use it to self-limit and stop before you hit the quota.
  • x-cache — HIT or MISS; a HIT was served from cache and did not consume quota.
  • x-screenshot-id — stable identifier for the capture; log it for traceability.
  • x-duration-ms — how long the render took; useful for tuning waitUntil.
  • x-usage-source — which key or plan the request was billed against.

Errors are JSON, not images. A 400 ({error, message}) means the request was invalid, a 402 ({error, message}) means you are out of quota, and a 500 ({error, message, fix}) means the capture itself failed — fix carries a concrete remediation hint like try waitUntil=networkidle0 or stealthMode=true. Branch on response.ok and read the headers:

javascript
import fs from 'node:fs';

// retryWithFix and pauseCaptureLoop are placeholders for your own loop logic.
async function captureToFile(url, key) {
  const response = await fetch(url, { headers: { 'x-api-key': key } });

  if (!response.ok) {
    const { message, fix } = await response.json();
    // Self-heal: apply the suggested parameter and retry once.
    if (fix) return retryWithFix(fix);
    throw new Error(message);
  }

  const remaining = Number(response.headers.get('x-credits-remaining'));
  const cached = response.headers.get('x-cache') === 'HIT';
  const id = response.headers.get('x-screenshot-id');

  if (remaining < 10) pauseCaptureLoop(); // back off before hitting quota

  const buffer = Buffer.from(await response.arrayBuffer());
  await fs.promises.writeFile(`captures/${id}.png`, buffer);
  return { id, cached, remaining };
}
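
The retry-once decision described above can also be kept as a small pure function, which makes it easy to unit test. This Python sketch is illustrative only: the names params_from_fix and next_action are hypothetical, and it assumes the fix hint embeds key=value pairs in free text, as in the example hint "try waitUntil=networkidle0 or stealthMode=true". The real hint format may differ.

```python
import re

# Assumption: fix hints contain key=value pairs somewhere in free text.
PAIR = re.compile(r"\b([A-Za-z]\w*)=([\w.-]+)")


def params_from_fix(fix: str) -> dict:
    """Extract key=value suggestions from a fix hint, in order of appearance."""
    return dict(PAIR.findall(fix or ""))


def next_action(status: int, body: dict, retried: bool) -> tuple:
    """Decide what the loop does after a non-2xx response.

    Returns (action, extra_params) where action is "retry", "stop", or "fail".
    """
    if status == 402:
        return ("stop", {})  # out of quota: retrying cannot help
    if status == 400:
        return ("fail", {})  # invalid request: fix the arguments instead
    if status >= 500 and not retried:
        suggested = params_from_fix(body.get("fix", ""))
        if suggested:
            # Apply only the first suggestion so the retry stays minimal.
            key = next(iter(suggested))
            return ("retry", {key: suggested[key]})
    return ("fail", {})
```

Passing retried=True on the second attempt guarantees the loop retries at most once, even if the second response carries another hint.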

Safe defaults for autonomous loops

Autonomous agents need explicit boundaries so a planning mistake cannot drain credits or hit the wrong host:

  • Cap captures per task. Set a hard maximum and stop when it is reached.
  • Watch x-credits-remaining. Pause the loop before the quota runs out instead of waiting for a 402.
  • Set cacheTtl (1–604800 s) for repeated captures of the same URL — cache hits are free and fast.
  • Use waitForSelector when the expected page state is known; it is more reliable than a fixed delay.
  • Save artifacts, not binaries. Store file paths or artifact URLs; never paste large base64 image payloads back into the prompt.
  • Validate the target. Reject localhost, private ranges (10.x, 192.168.x, 172.16–31.x), cloud metadata hosts (169.254.169.254), and any URL the agent is not authorized to capture, before the API call.
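
The target validation in the last item can be sketched with Python's standard library. This is a minimal example under stated assumptions, not a complete SSRF defense: the function name is_safe_target is hypothetical, and it only inspects the URL as written. A public-looking DNS name can still resolve to a private address, so a production check would also need to resolve the host or enforce an allowlist.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_target(url: str) -> bool:
    """Return False for URLs that point at local, private, or metadata hosts."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal

    if parts.scheme not in ("http", "https"):
        return False

    host = (parts.hostname or "").lower().rstrip(".")
    if not host or host == "localhost" or host.endswith(".localhost"):
        return False

    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: a DNS name. It passes this syntactic check only.
        return True

    # Covers 10.x, 172.16-31.x, 192.168.x, loopback, and 169.254.169.254.
    return not (
        address.is_private
        or address.is_loopback
        or address.is_link_local
        or address.is_reserved
        or address.is_multicast
        or address.is_unspecified
    )
```

Run this before the API call and refuse the capture when it returns False, so a planning mistake never reaches the network.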

ScreenshotAPI vs browser automation

Both approaches render a page in Chrome, but they solve different problems for an agent. A hosted URL to image API removes infrastructure; Playwright and Puppeteer give you a scriptable browser when you need interaction.

| Capability | ScreenshotAPI | Playwright / Puppeteer |
| --- | --- | --- |
| Hosted infrastructure | Yes — cloud Chrome, nothing to run | No — you run and scale the browser |
| Setup for an agent | API key + one HTTP call | Install Chromium, manage a browser pool |
| Clicking, login, form fill | No (use a browser tool) | Yes |
| Full-page capture | Yes (fullPage=true) | Yes (with scripting) |
| Ad / cookie-banner removal | Built-in flags | Manual selectors |
| Concurrency & scaling | Handled by the API | Your responsibility |
| Best for | Fast captures of public/authorized URLs | Interactive, multi-step browser flows |

A common pattern is to use both: ScreenshotAPI for quick, stateless captures and a browser tool only when the task genuinely needs to interact with the page.

Use it from your stack

ScreenshotAPI plugs into agent stacks without custom glue:

  • MCP server — add screenshot capture to any MCP-compatible client (Claude, Cursor, and more) via the MCP server.
  • GPT Actions / OpenAPI — point any OpenAPI tool builder at /openapi.json to generate the action.
  • Frameworks — wrap the request in a tool for LangChain, the Vercel AI SDK, or the OpenAI Agents SDK using the schema above.
  • Coding assistants — feed /llms-full.txt so the assistant writes correct calls.

Full setup details and per-framework snippets are in the AI agents guide.

Frequently asked questions

How does an AI agent capture a URL as an image?

The agent sends an authenticated GET request to a URL to image API with the target URL as a query parameter, then saves the binary PNG, JPEG, or WebP that comes back. No headless browser is installed or managed by the agent. ScreenshotAPI renders the page in real Chrome in the cloud and returns the image plus metadata headers the agent can read.

Can an AI agent capture any URL as an image?

No. Agents should capture only public or authorized URLs and must never target localhost, private networks (10.x, 192.168.x, 172.16-31.x), cloud metadata endpoints, paywalled content, or pages where they lack permission. Validate and allowlist the target host before making the call. This protects against SSRF and accidental capture of internal systems.

Should agents use a URL to image API instead of Playwright or Puppeteer?

Use a URL to image API when the agent needs a hosted capture of a public or authorized page without running browser infrastructure. Use Playwright or Puppeteer when the workflow requires clicking, logging in, submitting forms, or multi-step interaction. Many agents combine both: the API for fast static captures and a browser tool for interactive flows.

What image formats can an agent request from the URL to image API?

ScreenshotAPI returns PNG (default), JPEG, or WebP via the type parameter. Use WebP for the smallest artifacts when storing many captures, JPEG with a quality value (1-100) when file size matters more than transparency, and PNG when you need lossless output or an alpha channel. The same endpoint also returns PDF when type=pdf.

How does an agent avoid burning through its quota?

Read the x-credits-remaining response header after each call and stop or back off when it drops below a threshold. Set a cacheTtl so repeat captures of the same URL are served from cache; cached responses do not count against your quota. Also cap the number of captures per task so a runaway loop cannot drain credits.
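
One way to combine the per-task cap and the credit threshold is a small budget object that the loop consults before every call. The sketch below is illustrative: the class name CaptureBudget and its default limits are assumptions, and it expects response headers as a dictionary with lowercase keys.

```python
from dataclasses import dataclass


@dataclass
class CaptureBudget:
    """Tracks a per-task capture cap and a minimum-credits threshold."""

    max_captures: int = 20  # hypothetical default cap per task
    min_credits: int = 10   # hypothetical threshold for pausing
    used: int = 0
    paused: bool = False

    def can_capture(self) -> bool:
        """True while the task is under its cap and credits are not low."""
        return not self.paused and self.used < self.max_captures

    def record(self, headers: dict) -> None:
        """Update the budget from one response's headers."""
        # Count every call, cached or not, so a runaway loop still hits the cap.
        self.used += 1
        remaining = headers.get("x-credits-remaining")
        try:
            if remaining is not None and int(remaining) < self.min_credits:
                self.paused = True
        except ValueError:
            # Unparseable header: pause rather than guess.
            self.paused = True
```

The loop calls can_capture() before each request and record(...) after each response. Counting cache hits toward the cap is a deliberately conservative choice; they are free, but a loop that keeps requesting the same URL is usually a planning error.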

What does an agent do when a capture fails?

Check response.ok first. On a 5xx the body is JSON shaped like {error, message, fix}, where fix is a concrete remediation hint such as try waitUntil=networkidle0 or stealthMode=true. The agent can apply the suggested parameter and retry once, which makes the loop self-healing without human intervention.

Start capturing screenshots today

Create a free account and get 200 free screenshots per month to try the API. No credit card required.