URL to Image API for AI Agents
Use a URL to image API for AI agents: capture public or authorized pages as PNG, JPEG, or WebP with one HTTP call, no headless browser to run.
Last updated: 2026-06-28
Try ScreenshotAPI free
200 free screenshots/month. No credit card required.
An AI agent captures a URL as an image by sending one authenticated GET request to a URL to image API, passing the target page as the url parameter and saving the binary PNG, JPEG, or WebP that comes back. The agent never installs Chromium or runs a browser pool: ScreenshotAPI renders the page in real Chrome in the cloud and returns the image plus metadata headers the agent can act on. This guide shows the exact request, a copy-paste tool schema, and the guardrails that keep an autonomous loop safe.
When to use URL-to-image capture
Reach for a URL to image API from an agent when the task is a direct capture of a public or authorized page:
- Before-and-after UI screenshots for code review — capture a preview deploy and the production URL so a reviewer (or another agent) can diff them visually.
- Link previews and directory thumbnails — turn user-submitted URLs into preview cards.
- Dashboard and status-page evidence — snapshot a monitoring view when the URL is already authorized.
- Marketing-page snapshots for reports — pull a competitor or landing page into a research summary.
- Web archives — store a rendered image when a picture is more useful than scraped text.
Do not use URL-to-image capture when the task requires clicking through a flow, filling forms, logging in, or bypassing access controls. Those interactive jobs need full browser automation (see the comparison below). If the agent generates its own markup rather than pointing at a live URL, use the HTML to image API instead.
Agent-ready request
The simplest tool call is a single authenticated HTTP GET. Pass the url in the query string and stream the binary response to a file. GET is the lightest path; POST is only needed for HTML rendering, which does not apply when capturing a live URL.
```bash
curl -G "https://screenshotapi.to/api/v1/screenshot" \
  -d "url=https://example.com" \
  -d "type=png" \
  -d "fullPage=true" \
  -d "blockAds=true" \
  -d "removeCookieBanners=true" \
  -H "x-api-key: sk_live_your_key_here" \
  --output screenshot.png
```
Auth uses the x-api-key header (preferred). Authorization: Bearer sk_live_... also works if your framework only exposes a bearer field. Get a key from the dashboard; the free tier includes 200 screenshots per month and cached responses do not count against it.
JavaScript example
```javascript
import fs from 'node:fs';

const params = new URLSearchParams({
  url: 'https://example.com',
  type: 'png',
  fullPage: 'true',
  blockAds: 'true',
  removeCookieBanners: 'true',
  cacheTtl: '3600'
});

const response = await fetch(
  `https://screenshotapi.to/api/v1/screenshot?${params}`,
  { headers: { 'x-api-key': process.env.SCREENSHOTAPI_KEY } }
);

// Errors come back as JSON, not image bytes.
if (!response.ok) {
  const { message, fix } = await response.json();
  throw new Error(`${message}${fix ? ` (fix: ${fix})` : ''}`);
}

// Success: the body is the raw image.
const buffer = Buffer.from(await response.arrayBuffer());
await fs.promises.writeFile('screenshot.png', buffer);
```
Python example
```python
import requests

response = requests.get(
    "https://screenshotapi.to/api/v1/screenshot",
    params={
        "url": "https://example.com",
        "type": "png",
        # requests serializes Python's True as "True", so send lowercase
        # strings to match the curl example above.
        "fullPage": "true",
        "blockAds": "true",
        "removeCookieBanners": "true",
        "cacheTtl": 3600,
    },
    headers={"x-api-key": "sk_live_your_key_here"},
    timeout=60,
)

# Errors come back as JSON, not image bytes.
if not response.ok:
    body = response.json()
    raise RuntimeError(f"{body['message']} (fix: {body.get('fix', 'n/a')})")

# Success: the body is the raw image.
with open("screenshot.png", "wb") as f:
    f.write(response.content)
```
Recommended tool schema
When you expose ScreenshotAPI to an agent framework, keep the tool narrow so the model picks sensible parameters. This JSON definition works as a starting point for LangChain tools, the Vercel AI SDK, the OpenAI Agents SDK, or any framework that accepts a JSON schema:
```json
{
  "name": "capture_url_image",
  "description": "Capture a public or authorized URL as a PNG, JPEG, or WebP image. Returns a file path to the saved image. Do not use for localhost, private networks, or pages requiring login.",
  "parameters": {
    "type": "object",
    "properties": {
      "url": {
        "type": "string",
        "description": "Absolute public or authorized URL to capture"
      },
      "type": {
        "type": "string",
        "enum": ["png", "jpeg", "webp"],
        "description": "Output image format (default png)"
      },
      "fullPage": {
        "type": "boolean",
        "description": "Capture the full scrollable page instead of the viewport"
      },
      "waitUntil": {
        "type": "string",
        "enum": ["load", "domcontentloaded", "networkidle0", "networkidle2"],
        "description": "When to consider the page loaded before capture"
      },
      "waitForSelector": {
        "type": "string",
        "description": "Optional CSS selector that must appear before capture"
      }
    },
    "required": ["url"]
  }
}
```
You usually do not have to write this by hand. Point GPT Actions and other OpenAPI-compatible tool builders at /openapi.json and they will generate the full action set. Point coding assistants (Cursor, Claude, Copilot) at /llms-full.txt for the complete agent-readable documentation corpus.
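If you do wire the tool up by hand, the handler has to turn the model's arguments into a request. The following is a minimal sketch of that step, assuming the arguments arrive as a plain dict shaped like the schema above. The function name build_capture_url and the validation rules are illustrative choices, not part of ScreenshotAPI.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://screenshotapi.to/api/v1/screenshot"
ALLOWED_TYPES = {"png", "jpeg", "webp"}
ALLOWED_WAIT_UNTIL = {"load", "domcontentloaded", "networkidle0", "networkidle2"}


def build_capture_url(args: dict) -> str:
    """Turn tool-call arguments into a GET URL (hypothetical helper)."""
    url = args.get("url")
    if not isinstance(url, str) or not url:
        raise ValueError("url is required")

    image_type = args.get("type", "png")
    if image_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {image_type}")

    params = {"url": url, "type": image_type}

    if "fullPage" in args:
        # Lowercase booleans, matching the curl example.
        params["fullPage"] = "true" if args["fullPage"] else "false"

    wait_until = args.get("waitUntil")
    if wait_until is not None:
        if wait_until not in ALLOWED_WAIT_UNTIL:
            raise ValueError(f"unsupported waitUntil: {wait_until}")
        params["waitUntil"] = wait_until

    if args.get("waitForSelector"):
        params["waitForSelector"] = args["waitForSelector"]

    # urlencode percent-encodes the target URL so it survives as one parameter.
    return f"{API_ENDPOINT}?{urlencode(params)}"
```

Rejecting values outside the schema's enums before the HTTP call keeps a model's malformed argument from costing a request.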
Reading the response (agent DX)
A successful response is the raw image bytes. The useful signal for an agent lives in the response headers and the JSON error body, so the loop should always inspect them rather than blindly trusting a 200.
Key response headers:
- x-credits-remaining: credits left this period; use it to self-limit and stop before you hit the quota.
- x-cache: HIT or MISS; a HIT was served from cache and did not consume quota.
- x-screenshot-id: stable identifier for the capture; log it for traceability.
- x-duration-ms: how long the render took; useful for tuning waitUntil.
- x-usage-source: which key or plan the request was billed against.
Errors are JSON, not images. A 400 ({error, message}) means the request was invalid, a 402 ({error, message}) means you are out of quota, and a 500 ({error, message, fix}) means the capture itself failed — fix carries a concrete remediation hint like try waitUntil=networkidle0 or stealthMode=true. Branch on response.ok and read the headers:
```javascript
import fs from 'node:fs';

// retryWithFix and pauseCaptureLoop are helpers your agent loop defines.
async function captureOnce(url, key) {
  const response = await fetch(url, { headers: { 'x-api-key': key } });

  if (!response.ok) {
    const { message, fix } = await response.json();
    // Self-heal: apply the suggested parameter and retry once.
    if (fix) return retryWithFix(fix);
    throw new Error(message);
  }

  const remaining = Number(response.headers.get('x-credits-remaining'));
  const cached = response.headers.get('x-cache') === 'HIT';
  const id = response.headers.get('x-screenshot-id');

  if (remaining < 10) pauseCaptureLoop(); // back off before hitting quota

  const buffer = Buffer.from(await response.arrayBuffer());
  const path = `captures/${id}.png`;
  await fs.promises.writeFile(path, buffer);
  return { path, cached };
}
```
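The "apply the fix and retry once" step can be sketched as follows. This assumes the fix hint is free text containing name=value tokens, as in the example hint quoted above; the real hint format may differ, so the parsing here is an assumption. The send callable is a hypothetical stand-in for your HTTP layer: it takes a parameter dict and returns a (status, body) pair, where body is image bytes on success and the decoded JSON error otherwise.

```python
import re
from typing import Callable, Tuple, Union

Body = Union[bytes, dict]

# Assumption: hints embed suggestions as name=value tokens.
_HINT_TOKEN = re.compile(r"([A-Za-z]\w*)=([\w.-]+)")


def parse_fix_hint(fix: str) -> dict:
    """Return the first name=value suggestion found in a hint, or {}."""
    match = _HINT_TOKEN.search(fix or "")
    return {match.group(1): match.group(2)} if match else {}


def capture_with_one_retry(
    send: Callable[[dict], Tuple[int, Body]], params: dict
) -> Body:
    """Call send(params); on failure with a usable hint, retry exactly once."""
    status, body = send(params)
    if 200 <= status < 300:
        return body

    error = body if isinstance(body, dict) else {}
    suggestion = parse_fix_hint(error.get("fix", ""))
    if not suggestion:
        raise RuntimeError(error.get("message", f"capture failed with HTTP {status}"))

    # Second and final attempt with the suggested parameter merged in.
    status, body = send({**params, **suggestion})
    if 200 <= status < 300:
        return body

    error = body if isinstance(body, dict) else {}
    raise RuntimeError(error.get("message", f"retry failed with HTTP {status}"))
```

Taking only the first suggestion and retrying a single time keeps the loop bounded; a hint that offers alternatives is not tried exhaustively.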
Safe defaults for autonomous loops
Autonomous agents need explicit boundaries so a planning mistake cannot drain credits or hit the wrong host:
- Cap captures per task. Set a hard maximum and stop when it is reached.
- Watch x-credits-remaining. Pause the loop before the quota runs out instead of waiting for a 402.
- Set cacheTtl (1–604800 s) for repeated captures of the same URL; cache hits are free and fast.
- Use waitForSelector when the expected page state is known; it is more reliable than a fixed delay.
- Save artifacts, not binaries. Store file paths or artifact URLs; never paste large base64 image payloads back into the prompt.
- Validate the target. Reject localhost, private ranges (10.x, 192.168.x, 172.16–31.x), cloud metadata hosts (169.254.169.254), and any URL the agent is not authorized to capture, before the API call.
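Two of these guardrails, target validation and the per-task cap, can be enforced in a few lines before any request is sent. The sketch below uses only the Python standard library. The names is_safe_target and CaptureBudget, the optional allowlist, and the threshold of 10 credits are illustrative assumptions. It checks IP literals and hostnames only, so a DNS name that resolves to a private address would still need a resolution-time check.

```python
import ipaddress
from typing import Optional, Set
from urllib.parse import urlsplit


def is_safe_target(url: str, allowed_hosts: Optional[Set[str]] = None) -> bool:
    """Reject non-HTTP schemes, localhost, and private or link-local IP literals."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme not in ("http", "https"):
        return False

    host = (parts.hostname or "").lower()
    if not host or host == "localhost" or host.endswith(".localhost"):
        return False

    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a DNS name, not an IP literal
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        return False  # covers 10.x, 172.16-31.x, 192.168.x, 169.254.169.254

    if allowed_hosts is not None and host not in allowed_hosts:
        return False
    return True


class CaptureBudget:
    """Hard cap on captures per task, plus a pause when credits run low."""

    def __init__(self, max_captures: int, min_credits: int = 10) -> None:
        self.max_captures = max_captures
        self.min_credits = min_credits
        self.used = 0
        self.paused = False

    def allow(self) -> bool:
        return not self.paused and self.used < self.max_captures

    def record(self, credits_remaining: Optional[int]) -> None:
        """Call after each capture with the x-credits-remaining value, if any."""
        self.used += 1
        if credits_remaining is not None and credits_remaining < self.min_credits:
            self.paused = True
```

In a loop, the agent would call is_safe_target and budget.allow() before each capture and budget.record(...) after it, stopping as soon as either check fails.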
ScreenshotAPI vs browser automation
Both approaches render a page in Chrome, but they solve different problems for an agent. A hosted URL to image API removes infrastructure; Playwright and Puppeteer give you a scriptable browser when you need interaction.
| Capability | ScreenshotAPI | Playwright / Puppeteer |
|---|---|---|
| Hosted infrastructure | Yes — cloud Chrome, nothing to run | No — you run and scale the browser |
| Setup for an agent | API key + one HTTP call | Install Chromium, manage a browser pool |
| Clicking, login, form fill | No (use a browser tool) | Yes |
| Full-page capture | Yes (fullPage=true) | Yes (with scripting) |
| Ad / cookie-banner removal | Built-in flags | Manual selectors |
| Concurrency & scaling | Handled by the API | Your responsibility |
| Best for | Fast captures of public/authorized URLs | Interactive, multi-step browser flows |
A common pattern is to use both: ScreenshotAPI for quick, stateless captures and a browser tool only when the task genuinely needs to interact with the page.
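One way to encode that split is a small router in front of the agent's tools. This is a sketch only: the CaptureTask fields and the returned tool names are hypothetical labels for whatever tools your agent actually registers.

```python
from dataclasses import dataclass


@dataclass
class CaptureTask:
    needs_interaction: bool = False  # clicking, login, form fill, multi-step flows
    has_own_markup: bool = False     # agent generated HTML instead of a live URL


def choose_tool(task: CaptureTask) -> str:
    """Pick the lightest tool that can do the job (tool names are hypothetical)."""
    if task.needs_interaction:
        return "browser_automation"
    if task.has_own_markup:
        return "html_to_image"
    return "url_to_image"
```

Interaction is checked first because it is the one requirement the hosted capture cannot meet.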
Use it from your stack
ScreenshotAPI plugs into agent stacks without custom glue:
- MCP server: add screenshot capture to any MCP-compatible client (Claude, Cursor, and more) via the MCP server.
- GPT Actions / OpenAPI: point any OpenAPI tool builder at /openapi.json to generate the action.
- Frameworks: wrap the request in a tool for LangChain, the Vercel AI SDK, or the OpenAI Agents SDK using the schema above.
- Coding assistants: feed /llms-full.txt so the assistant writes correct calls.
Full setup details and per-framework snippets are in the AI agents guide.
Related guides / Next steps
- HTML to Image API for AI Agents — render agent-generated markup to an image.
- URL to PDF API for AI Agents — capture pages as multi-page PDFs instead.
- Capture Visual QA Evidence with AI Agents — wire captures into review and QA loops.
- AI Agent Integrations — connect ScreenshotAPI to ChatGPT, Claude, Cursor, and more.
- Screenshot API Reference — every parameter, header, and error state.
Frequently asked questions
How does an AI agent capture a URL as an image?
The agent sends an authenticated GET request to a URL to image API with the target URL as a query parameter, then saves the binary PNG, JPEG, or WebP that comes back. No headless browser is installed or managed by the agent. ScreenshotAPI renders the page in real Chrome in the cloud and returns the image plus metadata headers the agent can read.
Can an AI agent capture any URL as an image?
No. Agents should capture only public or authorized URLs and must never target localhost, private networks (10.x, 192.168.x, 172.16-31.x), cloud metadata endpoints, paywalled content, or pages where they lack permission. Validate and allowlist the target host before making the call. This protects against SSRF and accidental capture of internal systems.
Should agents use a URL to image API instead of Playwright or Puppeteer?
Use a URL to image API when the agent needs a hosted capture of a public or authorized page without running browser infrastructure. Use Playwright or Puppeteer when the workflow requires clicking, logging in, submitting forms, or multi-step interaction. Many agents combine both: the API for fast static captures and a browser tool for interactive flows.
What image formats can an agent request from the URL to image API?
ScreenshotAPI returns PNG (default), JPEG, or WebP via the type parameter. Use WebP for the smallest artifacts when storing many captures, JPEG with a quality value (1-100) when file size matters more than transparency, and PNG when you need lossless output or an alpha channel. The same endpoint also returns PDF when type=pdf.
How does an agent avoid burning through its quota?
Read the x-credits-remaining response header after each call and stop or back off when it drops below a threshold. Set a cacheTtl so repeat captures of the same URL are served from cache; cached responses do not count against your quota. Also cap the number of captures per task so a runaway loop cannot drain credits.
What does an agent do when a capture fails?
Check response.ok first. On a 5xx the body is JSON shaped like {error, message, fix}, where fix is a concrete remediation hint such as try waitUntil=networkidle0 or stealthMode=true. The agent can apply the suggested parameter and retry once, which makes the loop self-healing without human intervention.