HTML to Image API for AI Agents
Use an HTML to image API for AI agents to render generated HTML into PNG, JPEG, or WebP without running a browser.
Last updated: 2026-06-28
Try ScreenshotAPI free
200 free screenshots/month. No credit card required.
An AI agent can turn its own generated HTML into a shareable image by sending the markup to an HTML-to-image API, which renders it in a real Chrome browser in the cloud and returns a PNG, JPEG, or WebP. The agent never installs Chromium, manages a browser pool, or runs any rendering infrastructure: it builds self-contained HTML, sends one authenticated POST, and saves the binary response as an artifact. This guide shows the exact request, an agent prompt pattern, a tool schema, and the rules that keep rendering deterministic and safe.
When to render HTML from an agent
Render HTML when the agent already holds the content and there is no live page to visit. Capturing a live URL is a different job covered in the URL to image API for AI agents guide — use that when the visual already exists at an address.
Good fits for HTML rendering from an agent:
- Open Graph and social cards generated from a title, summary, or campaign copy.
- Email preview images so a human can approve a draft before it sends.
- Report and summary snapshots posted to Slack, email, or issue comments.
- Invoices, receipts, and certificates built from structured fields.
- Lightweight dashboards and status boards rendered from data the agent already has.
In every case the agent owns the markup, so it can make the HTML deterministic: inline CSS, a fixed width and height, no external scripts, and bounded content length.
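As a rough illustration of those constraints, the sketch below builds a fixed-size card from two strings. The function names `truncate` and `build_card_html`, the character limits, and the styling are all assumptions made for this example, not part of ScreenshotAPI.

```python
from html import escape

# Assumed limits for illustration; tune them to your own layout.
MAX_TITLE = 80
MAX_SUBTITLE = 140


def truncate(text: str, limit: int) -> str:
    """Collapse whitespace and cut the text to at most `limit` characters."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    return text[: limit - 3].rstrip() + "..."


def build_card_html(title: str, subtitle: str, width: int = 1200, height: int = 630) -> str:
    """Return a self-contained HTML document with inline CSS and a fixed size."""
    # Bound the length first, then escape so model output cannot inject markup.
    title = escape(truncate(title, MAX_TITLE))
    subtitle = escape(truncate(subtitle, MAX_SUBTITLE))
    body_style = (
        f"margin:0;width:{width}px;height:{height}px;display:flex;"
        "flex-direction:column;justify-content:center;padding:80px;"
        "box-sizing:border-box;font-family:sans-serif;"
        "background:#0f172a;color:#fff"
    )
    return (
        f'<!doctype html><html><body style="{body_style}">'
        f'<h1 style="font-size:64px;line-height:1.1;margin:0">{title}</h1>'
        f'<p style="font-size:28px;opacity:.85;margin-top:24px">{subtitle}</p>'
        "</body></html>"
    )
```

Because the template is fixed and only two bounded, escaped strings vary, the same inputs always produce the same markup, which makes a bad render straightforward to reproduce.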
Raw HTML request
HTML rendering requires a POST request with a JSON body that contains an html field. A GET request with html is rejected with the message "Use POST /api/v1/screenshot for html rendering." On POST, provide either url or html. The response body is the binary image.
The example below renders a 1200x630 Open Graph card with inline CSS — the canonical agent output shape.
```bash
curl "https://screenshotapi.to/api/v1/screenshot" \
  -X POST \
  -H "content-type: application/json" \
  -H "x-api-key: sk_live_your_key_here" \
  --data '{
    "html": "<!doctype html><html><body style=\"margin:0;width:1200px;height:630px;display:flex;flex-direction:column;justify-content:center;padding:80px;box-sizing:border-box;font-family:-apple-system,Segoe UI,Roboto,sans-serif;background:linear-gradient(135deg,#0f172a,#1e3a8a);color:#fff\"><div style=\"font-size:20px;letter-spacing:2px;opacity:.7\">WEEKLY REPORT</div><h1 style=\"font-size:64px;line-height:1.1;margin:16px 0 0\">All 42 checks passed</h1><p style=\"font-size:28px;opacity:.85;margin-top:24px\">Generated automatically by the release agent</p></body></html>",
    "type": "png",
    "width": 1200,
    "height": 630
  }' \
  --output og-card.png
```
JavaScript example
For agents running on Node or Bun, send the same body with fetch. Authenticate with the x-api-key header (preferred) or Authorization: Bearer sk_live_....
```javascript
import { writeFile } from 'node:fs/promises';

const html = `<!doctype html><html><body style="margin:0;width:1200px;height:630px;display:flex;flex-direction:column;justify-content:center;padding:80px;box-sizing:border-box;font-family:-apple-system,Segoe UI,Roboto,sans-serif;background:linear-gradient(135deg,#0f172a,#1e3a8a);color:#fff"><div style="font-size:20px;letter-spacing:2px;opacity:.7">WEEKLY REPORT</div><h1 style="font-size:64px;line-height:1.1;margin:16px 0 0">All 42 checks passed</h1><p style="font-size:28px;opacity:.85;margin-top:24px">Generated automatically by the release agent</p></body></html>`;

const response = await fetch('https://screenshotapi.to/api/v1/screenshot', {
  method: 'POST',
  headers: {
    'content-type': 'application/json',
    'x-api-key': process.env.SCREENSHOTAPI_KEY
  },
  body: JSON.stringify({ html, type: 'png', width: 1200, height: 630 })
});

// Error bodies are JSON; surface them before touching the image bytes.
if (!response.ok) {
  const { error, message, fix } = await response.json();
  throw new Error(`${error}: ${message}${fix ? ` (fix: ${fix})` : ''}`);
}

// writeFile from node:fs/promises works on both Node and Bun.
await writeFile('og-card.png', Buffer.from(await response.arrayBuffer()));
```
Python example
The requests library serializes the body when you pass json=, which also sets the content-type header.
```python
import requests

html = """<!doctype html><html><body style="margin:0;width:1200px;height:630px;display:flex;flex-direction:column;justify-content:center;padding:80px;box-sizing:border-box;font-family:-apple-system,Segoe UI,Roboto,sans-serif;background:linear-gradient(135deg,#0f172a,#1e3a8a);color:#fff"><div style="font-size:20px;letter-spacing:2px;opacity:.7">WEEKLY REPORT</div><h1 style="font-size:64px;line-height:1.1;margin:16px 0 0">All 42 checks passed</h1><p style="font-size:28px;opacity:.85;margin-top:24px">Generated automatically by the release agent</p></body></html>"""

response = requests.post(
    "https://screenshotapi.to/api/v1/screenshot",
    headers={"x-api-key": "sk_live_your_key_here"},
    json={"html": html, "type": "png", "width": 1200, "height": 630},
)
response.raise_for_status()

# The response body is the binary image.
with open("og-card.png", "wb") as f:
    f.write(response.content)
```
Agent prompt pattern
When you delegate the HTML generation to a model, constrain the output so it renders predictably. Drop this into the system or task instructions:
```text
You are rendering a 1200x630 image from HTML.

Output requirements:
- Return ONE self-contained HTML document. No external stylesheets, fonts, or scripts.
- Use inline CSS only. Set the root element to width:1200px; height:630px; box-sizing:border-box.
- Keep content bounded: one headline, one short supporting line, no long paragraphs.
- Do NOT include API keys, tokens, passwords, customer PII, or any private data.

Then call ScreenshotAPI:
- POST https://screenshotapi.to/api/v1/screenshot
- JSON body: { "html": <your HTML>, "type": "png", "width": 1200, "height": 630 }
- Header: x-api-key: <key>

Save the binary response as the task artifact. Do not paste image bytes back into the conversation.
```
Recommended tool schema
Expose a narrow tool to your agent framework so the model only controls what it should. The endpoint is the same for all callers.
```json
{
  "name": "render_html_image",
  "description": "Render a self-contained HTML document to a PNG, JPEG, or WebP image. Use for agent-generated content (cards, reports, invoices) when there is no live URL to capture.",
  "parameters": {
    "html": "Self-contained HTML document with inline CSS and no external scripts",
    "type": "png, jpeg, or webp",
    "width": "Viewport width in pixels, 1-1920 (e.g. 1200)",
    "height": "Viewport height in pixels, 1-10000 (e.g. 630)",
    "quality": "Compression quality 1-100, jpeg and webp only",
    "preloadFonts": "Set true only when the design depends on web fonts"
  }
}
```
Point GPT Actions and other OpenAPI-compatible tool builders at /openapi.json. Point coding assistants at /llms-full.txt for the full agent-readable documentation corpus.
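Before a tool call reaches the network, the limits in the schema above can be checked locally. The following is a minimal sketch, not an official client: `validate_render_args` is a hypothetical helper, and the 1200x630 fallback for a missing width or height is an assumption chosen to match the examples in this guide.

```python
ALLOWED_TYPES = {"png", "jpeg", "webp"}


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_render_args(args: dict) -> dict:
    """Check tool arguments against the schema limits and return a request body."""
    html = args.get("html")
    if not isinstance(html, str) or not html.strip():
        raise ValueError("html must be a non-empty string")

    image_type = args.get("type", "png")
    if image_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")

    # Assumed defaults for this sketch: the social-card size used in this guide.
    width = args.get("width", 1200)
    height = args.get("height", 630)
    if not _is_int(width) or not 1 <= width <= 1920:
        raise ValueError("width must be an integer from 1 to 1920")
    if not _is_int(height) or not 1 <= height <= 10000:
        raise ValueError("height must be an integer from 1 to 10000")

    body = {"html": html, "type": image_type, "width": width, "height": height}

    quality = args.get("quality")
    if quality is not None:
        if image_type == "png":
            raise ValueError("quality applies to jpeg and webp only")
        if not _is_int(quality) or not 1 <= quality <= 100:
            raise ValueError("quality must be an integer from 1 to 100")
        body["quality"] = quality

    # Only forward preloadFonts when the caller explicitly asked for it.
    if args.get("preloadFonts") is True:
        body["preloadFonts"] = True

    return body
```

Rejecting bad arguments locally gives the model an immediate, readable error to correct instead of spending a request on a call that would fail.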
Reading the response (agent DX)
A successful render returns the raw image bytes plus headers an agent can use to stay within limits and to log results:
- x-credits-remaining: remaining quota, so the agent can self-limit before a 402.
- x-cache: HIT or MISS; cached responses do not count against quota.
- x-screenshot-id: stable id for the render, useful in logs and issue comments.
- x-duration-ms: server render time.
Error bodies are JSON. A 400 or 402 returns { error, message }; a 500 adds a fix field with a remediation hint the agent can act on. Check the status before reading bytes:
```javascript
import { writeFile } from 'node:fs/promises';

// `html` is the self-contained document string built earlier.
const response = await fetch('https://screenshotapi.to/api/v1/screenshot', {
  method: 'POST',
  headers: {
    'content-type': 'application/json',
    'x-api-key': process.env.SCREENSHOTAPI_KEY
  },
  body: JSON.stringify({ html, type: 'png', width: 1200, height: 630 })
});

if (!response.ok) {
  const { error, message, fix } = await response.json();
  // `fix` is present on 500s and tells the agent how to recover
  throw new Error(`${error}: ${message}${fix ? ` (fix: ${fix})` : ''}`);
}

const remaining = Number(response.headers.get('x-credits-remaining'));
if (remaining < 10) {
  // back off or notify before exhausting quota
}

// writeFile from node:fs/promises works on both Node and Bun.
await writeFile('og-card.png', Buffer.from(await response.arrayBuffer()));
```
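For Python agents, the same checks can be folded into one structured result that is easy to log. This is a sketch under assumptions: `RenderResult` and `interpret_response` are hypothetical names, and the function takes the status, headers, and body as plain values so it works with any HTTP client.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RenderResult:
    ok: bool
    image: Optional[bytes] = None
    credits_remaining: Optional[int] = None
    cache_hit: Optional[bool] = None
    screenshot_id: Optional[str] = None
    duration_ms: Optional[int] = None
    error: Optional[str] = None
    message: Optional[str] = None
    fix: Optional[str] = None


def _to_int(value) -> Optional[int]:
    """Parse a header value as an int, returning None if it is missing or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def interpret_response(status: int, headers: Mapping[str, str], body: bytes) -> RenderResult:
    """Turn a raw HTTP response into a result the agent can log or act on."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive

    if 200 <= status < 300:
        cache = h.get("x-cache")
        return RenderResult(
            ok=True,
            image=body,
            credits_remaining=_to_int(h.get("x-credits-remaining")),
            cache_hit=None if cache is None else cache.upper() == "HIT",
            screenshot_id=h.get("x-screenshot-id"),
            duration_ms=_to_int(h.get("x-duration-ms")),
        )

    # Error bodies are JSON; tolerate anything else (e.g. a proxy error page).
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return RenderResult(
        ok=False,
        error=payload.get("error"),
        message=payload.get("message"),
        fix=payload.get("fix"),
    )
```

With `requests`, a call such as `interpret_response(r.status_code, r.headers, r.content)` would feed it directly; the agent can then check `credits_remaining` before scheduling more renders.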
Deterministic rendering and safety
The agent controls the markup, so make it boring and repeatable:
- Use inline CSS. External stylesheets add network dependencies that can fail or change between renders.
- Set explicit width and height (1200x630 for social cards) so layout never reflows.
- Enable preloadFonts only when the design depends on a specific web font; skip it otherwise for faster, more predictable output.
- Never include secrets, access tokens, passwords, PII, or untrusted third-party scripts in the HTML, because the image and its cached copy may be shared.
- Keep HTML generation separate from capture. Log the generated HTML, then send it; that makes a bad render easy to reproduce and debug.
- Bound the content length so a verbose model cannot overflow the fixed viewport.
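Several of these rules can be enforced mechanically before the HTML is sent. The sketch below is one possible pre-flight check using only the Python standard library; `lint_html`, the 20,000-character cap, and the bearer-token pattern are assumptions for illustration, and a pattern scan is no substitute for keeping secrets out of the agent's context in the first place.

```python
import re
from html.parser import HTMLParser

SECRET_PATTERNS = [
    re.compile(r"sk_live_[A-Za-z0-9_]+"),          # key prefix used in this guide
    re.compile(r"(?i)bearer\s+[a-z0-9._-]{16,}"),  # assumed generic bearer-token shape
]


class _ExternalResourceFinder(HTMLParser):
    """Collects script tags and linked stylesheets found in the document."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "script":
            self.problems.append("script tag found")
        elif tag == "link" and "stylesheet" in (attr_map.get("rel") or "").lower():
            self.problems.append("external stylesheet found")


def lint_html(html: str, max_chars: int = 20000) -> list:
    """Return a list of rule violations; an empty list means the HTML passed."""
    problems = []
    if len(html) > max_chars:
        problems.append(f"html longer than {max_chars} characters")

    finder = _ExternalResourceFinder()
    finder.feed(html)
    finder.close()
    problems.extend(finder.problems)

    # Report at most one secret finding; never echo the matched value into logs.
    if any(pattern.search(html) for pattern in SECRET_PATTERNS):
        problems.append("possible secret found")
    return problems
```

Running the check between generation and capture fits the advice above to keep the two steps separate: the generated HTML is logged, linted, and only then sent.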
URL capture vs HTML rendering
Both modes hit the same endpoint; the difference is the input and the HTTP method.
| | URL capture | HTML rendering |
|---|---|---|
| Input | url (a live page) | html (agent-generated markup) |
| Method | GET or POST | POST only |
| Best for | Existing pages, deployed UIs, dashboards | Cards, reports, invoices, status boards |
| Content source | Lives at an address | Produced by the agent |
| Auth, formats, headers | Identical | Identical |
If the visual already exists at a URL, capture it — see the URL to image API for AI agents guide.
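The choice between the two modes can be made explicit in code. The sketch below always uses POST, since the table shows POST is accepted for both inputs; `build_capture_request` is a hypothetical helper, and passing width and height in URL mode is an assumption based on the two modes sharing one endpoint.

```python
from typing import Optional


def build_capture_request(
    *,
    url: Optional[str] = None,
    html: Optional[str] = None,
    image_type: str = "png",
    width: int = 1200,
    height: int = 630,
) -> dict:
    """Build a POST request description for exactly one input mode."""
    # Exactly one of url or html must be set: reject both and neither.
    if (url is None) == (html is None):
        raise ValueError("provide exactly one of url or html")

    body = {"type": image_type, "width": width, "height": height}
    if html is not None:
        body["html"] = html  # agent-generated markup
    else:
        body["url"] = url    # a page that already exists at an address

    return {
        "method": "POST",
        "endpoint": "https://screenshotapi.to/api/v1/screenshot",
        "json": body,
    }
```

Keeping the either-or rule in one place means the agent cannot accidentally send both a url and an html field in the same request.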
Use it from your stack
ScreenshotAPI plugs into the same places agents already live:
- MCP: connect the ScreenshotAPI MCP server so MCP-aware clients (Claude, Cursor, VS Code) can render HTML as a built-in tool.
- OpenAPI: import /openapi.json into GPT Actions or any OpenAPI tool builder to generate the render_html_image action automatically.
- Frameworks: wrap the POST call as a tool in LangChain, the Vercel AI SDK, or the OpenAI Agents SDK using the schema above.
Full setup notes for each client live in the AI-agent setup guide.
Related guides
- URL to PDF API for AI Agents — render HTML or pages to PDF artifacts.
- Capture Visual QA Evidence with AI Agents — screenshots as test and review evidence.
- How to Convert HTML to Image — browser, server, and API approaches compared.
- AI Agent Integrations — every supported client and framework.
- Screenshot API Reference — all parameters, responses, and error states.
Frequently asked questions
Can an AI agent render raw HTML to an image?
Yes. Send POST /api/v1/screenshot with a JSON body containing an html field and an image type such as png, jpeg, or webp. HTML rendering requires POST; a GET request with html is rejected with the message 'Use POST /api/v1/screenshot for html rendering.' The response is the binary image body.
What is the best way to make agent HTML render deterministically?
Constrain the agent to self-contained HTML with inline CSS, no external scripts, and bounded content. Set an explicit width and height (1200x630 is a good default for social cards) so layout never reflows unexpectedly. Only enable preloadFonts when the design depends on a specific web font, since extra network work makes output less predictable.
When should an agent render HTML instead of capturing a live URL?
Render HTML when the agent already produced the content and there is no page to visit, such as a summary card, an invoice, or a status board built from data. Capture a live URL when the visual already exists at an address, for example a deployed page or a dashboard. See the companion guide on the URL to image API for AI agents for that path.
What should agents avoid putting in HTML screenshots?
Never include API keys, access tokens, passwords, personally identifiable information, or other secrets, because the rendered image and any cached copy can be shared as an artifact. Do not inject untrusted third-party scripts. Keep generation separate from capture so the HTML can be logged and reviewed safely.
Which image and document formats does the API return?
Set type to png (the default), jpeg, webp, or pdf. quality (1-100) applies only to jpeg and webp. The same POST endpoint that renders HTML to an image also renders HTML to a PDF when type=pdf, which is useful for invoices and receipts.
How does an agent know if a render succeeded and how much quota is left?
Check response.ok and read the response headers. x-credits-remaining lets an agent self-limit before hitting the quota, x-cache reports HIT or MISS, and x-screenshot-id identifies the render for logs. On a 500 the JSON error body includes a fix field with a remediation hint the agent can act on.
Related resources
AI Agent Integrations
Use ScreenshotAPI from ChatGPT, Claude, Cursor, Codex, VS Code, and agent frameworks.
URL to Image API for AI Agents
Capture live public or authorized URLs as images from an agent.
ScreenshotAPI MCP Server
Expose screenshot and HTML rendering to agents over MCP.
Screenshot API Reference
Request parameters, authentication, responses, and error states.