MASTER CLASS · COMPUTER USE · INTERMEDIATE

OpenAI Computer Use Agent: computer-use-preview driving a browser.

OpenAI's Operator product runs in a hosted browser; the API exposes the same capability through the computer-use-preview model, where you supply the browser yourself. It is the same idea as Anthropic's Computer Use, inside the OpenAI ecosystem.

▸ When you're done

An agent, driven by the computer-use-preview model, that controls a Chromium browser, fills forms, clicks through flows, and posts results back, all in a Docker sandbox you can ship to a server.

20 min walkthrough
2 tools · all free tier
Copy-paste ready · no theory
◢ The build · 4 steps · 20 min

Follow these in order. Don't skip.

Step 01 / 04

Get Computer Use API access

  • platform.openai.com → API keys → Create new secret key
  • Add billing if you haven't (Computer Use is API only, not part of ChatGPT Plus)
  • The model is computer-use-preview. Usage limits apply; request access or a higher limit if your calls are rejected or rate-limited
.env
OPENAI_API_KEY=sk-...
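Before moving on, it can help to confirm the key is actually visible to your Python process. A `.env` file is not picked up by the shell on its own; something has to load it into the environment. The sketch below is a minimal check, assuming the key is exposed as the `OPENAI_API_KEY` environment variable. `require_api_key` is a hypothetical helper name, not part of any SDK.

```python
import os


def require_api_key(env=None) -> str:
    """Return the API key from the environment, or raise with a clear message.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; load .env or export the variable")
    return key
```

Calling `require_api_key()` at startup fails fast with a readable error instead of an authentication failure several steps into a task.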
Step 02 / 04

Spin up the OpenAI Computer Use sample

Terminal
git clone https://github.com/openai/openai-cua-sample-app
cd openai-cua-sample-app

# Python venv
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Set your key (this overwrites any existing .env)
echo "OPENAI_API_KEY=sk-..." > .env

# Run with Playwright as the browser driver
# If Playwright reports a missing browser, run: playwright install chromium
python cli.py --computer local-playwright
Heads up
The CLI shows you a prompt. Type a task and a Chromium window opens. The agent screenshots, the model plans the next click, the browser executes — repeat until the task is done.
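The screenshot, plan, execute cycle described above can be sketched independently of any SDK. This is an illustration of the control flow only, not the sample app's actual code: `take_screenshot`, `plan_next`, and `execute` are hypothetical callables that you would back with the browser driver and the model call.

```python
from typing import Callable, Optional


def agent_loop(
    task: str,
    take_screenshot: Callable[[], bytes],
    plan_next: Callable[[str, bytes], Optional[dict]],
    execute: Callable[[dict], None],
    max_steps: int = 25,
) -> int:
    """Run screenshot -> plan -> execute until the planner returns None.

    Returns the number of actions executed.
    Raises RuntimeError if the task is still running after max_steps actions.
    """
    for step in range(max_steps):
        screenshot = take_screenshot()        # observe the current page
        action = plan_next(task, screenshot)  # ask the model for the next action
        if action is None:                    # planner signals the task is done
            return step
        execute(action)                       # perform the click, keystroke, etc.
    raise RuntimeError(f"task not finished after {max_steps} steps")
```

The hard `max_steps` cap is a design choice: a loop like this has no natural end if the model keeps proposing actions, so the caller should decide the limit up front.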
Step 03 / 04

Call the computer-use model from your Python code

agent/operator.py
from openai import OpenAI
from playwright.sync_api import sync_playwright
import base64

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def shot(page) -> str:
    """Return the current page as a base64-encoded PNG screenshot."""
    return base64.b64encode(page.screenshot()).decode()

def run(task: str):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        # Viewport must match the display size declared in the tool config below.
        page = browser.new_page(viewport={"width": 1280, "height": 800})
        page.goto("about:blank")

        # First request: the task plus a screenshot of the starting page.
        response = client.responses.create(
            model="computer-use-preview",
            tools=[{"type": "computer_use_preview",
                    "display_width": 1280, "display_height": 800, "environment": "browser"}],
            input=[{"role": "user", "content": [
                {"type": "input_text", "text": task},
                {"type": "input_image", "image_url": f"data:image/png;base64,{shot(page)}"},
            ]}],
            truncation="auto",
        )

        # Loop: handle each computer_call → page.mouse.click / page.keyboard.type → fresh screenshot → continue
        # Stop when response.output[-1].type == "message"
        # As written, this returns only the model's first response.
        browser.close()
        return response

if __name__ == "__main__":
    run("Go to news.ycombinator.com and tell me the top 3 story titles.")
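The loop left as comments in the listing needs two pieces: translating each `computer_call` action into Playwright calls, and packaging a fresh screenshot for the next request. The sketch below is one possible shape for both. The action field names (`x`, `y`, `button`, `text`, `keys`, `scroll_x`, `scroll_y`) and the `computer_call_output` item are assumptions based on OpenAI's computer-use documentation and should be checked against the current API reference. `apply_action`, `screenshot_item`, and `KEY_MAP` are hypothetical names, and actions are assumed to arrive as plain dicts.

```python
import base64

# Assumption: the model emits key names such as "ENTER"; Playwright expects "Enter".
# This mapping is deliberately partial; unknown keys are passed through unchanged.
KEY_MAP = {"ENTER": "Enter", "ESC": "Escape", "TAB": "Tab", "BACKSPACE": "Backspace"}


def apply_action(page, action: dict) -> None:
    """Map one computer_call action onto Playwright's sync Page API."""
    kind = action["type"]
    if kind == "click":
        button = action.get("button", "left")
        if button not in ("left", "right", "middle"):
            button = "left"  # fall back for button values Playwright does not accept
        page.mouse.click(action["x"], action["y"], button=button)
    elif kind == "double_click":
        page.mouse.dblclick(action["x"], action["y"])
    elif kind == "move":
        page.mouse.move(action["x"], action["y"])
    elif kind == "scroll":
        page.mouse.move(action["x"], action["y"])
        page.mouse.wheel(action.get("scroll_x", 0), action.get("scroll_y", 0))
    elif kind == "type":
        page.keyboard.type(action["text"])
    elif kind == "keypress":
        for key in action["keys"]:
            page.keyboard.press(KEY_MAP.get(key.upper(), key))
    elif kind == "wait":
        page.wait_for_timeout(1000)  # pause one second, then re-screenshot
    elif kind == "screenshot":
        pass  # nothing to do; the caller screenshots after every action
    else:
        raise ValueError(f"unsupported action type: {kind}")


def screenshot_item(call_id: str, png_bytes: bytes) -> dict:
    """Build the input item that answers a computer_call with a screenshot.

    Assumed shape; verify the expected "output" type in the API reference.
    """
    encoded = base64.b64encode(png_bytes).decode()
    return {
        "type": "computer_call_output",
        "call_id": call_id,
        "output": {"type": "input_image", "image_url": f"data:image/png;base64,{encoded}"},
    }
```

In the loop, you would pick out the items in `response.output` whose type is `computer_call`, pass each action to `apply_action(page, ...)`, then send `screenshot_item(call_id, page.screenshot())` as the `input` of the next `client.responses.create` call with `previous_response_id=response.id`, repeating until no `computer_call` items remain.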
Step 04 / 04

Production hardening checklist

  • ALWAYS run in a sandbox: Docker, headless Playwright inside a container, or a remote browser farm. Never on your real desktop.
  • Block dangerous URLs at the network layer (banking, work email, anything you wouldn't trust a stranger with)
  • Set a per-task budget — cap tokens AND time. A runaway loop is real.
  • Log every screenshot + every action. You'll need them for debugging and customer trust.
  • Treat page content and model outputs as untrusted. Web pages can carry prompt-injection attempts that target Computer Use agents.
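Three of the items above (URL blocking, budgets, and logging) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a complete hardening layer: the domain names are placeholders, and `is_blocked`, `Budget`, and `log_action` are hypothetical helpers. The URL check runs inside your application, so it complements network-layer blocking rather than replacing it.

```python
import json
import time
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"examplebank.com", "mail.example.com"}  # placeholder entries


def is_blocked(url: str, blocked=BLOCKED_DOMAINS) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


class Budget:
    """Caps both total tokens and wall-clock time for a single task."""

    def __init__(self, max_tokens: int, max_seconds: float, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.max_seconds = max_seconds
        self._clock = clock
        self._start = clock()
        self.tokens = 0

    def spend(self, tokens: int) -> None:
        """Record one model call; raise as soon as either cap is exceeded."""
        self.tokens += tokens
        if self.tokens > self.max_tokens:
            raise RuntimeError("token budget exceeded")
        if self._clock() - self._start > self.max_seconds:
            raise RuntimeError("time budget exceeded")


def log_action(fh, step: int, action: dict, screenshot_path: str) -> None:
    """Append one JSON line per action so a run can be audited later."""
    record = {"step": step, "action": action, "screenshot": screenshot_path}
    fh.write(json.dumps(record) + "\n")
```

With Playwright, one way to wire in the URL check is a `page.route("**/*", handler)` callback that calls `route.abort()` when `is_blocked(route.request.url)` is true and `route.continue_()` otherwise. For the budget, call `spend()` after each model response with that response's token count (for example `response.usage.total_tokens`, assuming your SDK version exposes it).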
Ship-it checklist
5 CHECKS
  • OpenAI API key with Computer Use access
  • openai-cua-sample-app cloned and running locally
  • You completed at least one custom task in the sandbox
  • You know the difference between this and Anthropic's Computer Use (this guide drives a browser only; Anthropic's reference setup drives a full desktop OS)
  • You have a logging plan for screenshots + actions before going to production