OpenAI Computer Use Agent: the computer-use-preview model driving a browser.

◢ The stack

OpenAI

GPT models, embeddings, function calling

Pay-as-you-go

Codex CLI

OpenAI's terminal coding agent — reads repos, writes code, opens PRs

ChatGPT Plus · $20/mo

◢ The build · 4 steps · 20 min

Follow these in order. Don't skip.

Step 01 / 04

Get Computer Use API access

▸platform.openai.com → API keys → Create new secret key
▸Add a billing method if you haven't already (Computer Use is API only and is not part of ChatGPT Plus)
▸The model is computer-use-preview. Usage limits apply. If your calls are rejected or rate-limited, request access or a higher limit.

.env

OPENAI_API_KEY=sk-...
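
Before spending any API credit, it can help to confirm the key is actually readable. The OpenAI Python client reads OPENAI_API_KEY from the process environment, not from a .env file, so the file has to be loaded first. A minimal sketch, assuming the file holds simple KEY=value lines; `parse_env` and `load_api_key` are hypothetical helpers, not part of the OpenAI SDK or the sample app:

```python
import os


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; skips blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(path: str = ".env") -> str:
    """Return OPENAI_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key and os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            key = parse_env(fh.read()).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set in the environment or .env")
    return key
```

Calling `load_api_key()` at startup fails fast with a clear message instead of an authentication error on the first request.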

Step 02 / 04

Spin up the OpenAI Computer Use sample

Terminal

git clone https://github.com/openai/openai-cua-sample-app
cd openai-cua-sample-app

# Python venv
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Set your key
echo "OPENAI_API_KEY=sk-..." > .env

# Run with Playwright as the browser driver
python cli.py --computer local-playwright

◆ Heads up

The CLI prompts you for a task. Type one and a Chromium window opens. The agent takes a screenshot, the model proposes the next action (a click, a keystroke, a scroll), and the browser executes it. The cycle repeats until the task is done.
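
The screenshot, plan, execute cycle described above can be sketched independently of any API. This is an illustration of the control flow only, not the sample app's code; `agent_loop` and its three callables are hypothetical names, and the step cap is an added assumption to keep the loop from running forever:

```python
from typing import Callable, Optional


def agent_loop(
    take_screenshot: Callable[[], bytes],
    next_action: Callable[[bytes], Optional[dict]],
    execute: Callable[[dict], None],
    max_steps: int = 25,
) -> int:
    """Run screenshot -> plan -> execute until the planner returns None.

    Returns the number of actions executed.
    """
    for step in range(max_steps):
        image = take_screenshot()      # what the browser looks like right now
        action = next_action(image)    # the model's proposed next action, or None when done
        if action is None:
            return step
        execute(action)                # click, type, scroll, ...
    raise RuntimeError(f"task not finished after {max_steps} steps")
```

In a real agent, `take_screenshot` would wrap the browser driver, `next_action` would wrap the model call, and `execute` would translate the action into mouse and keyboard input.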

Step 03 / 04

Call the Computer Use tool from your Python code

agent/operator.py

import base64

from openai import OpenAI
from playwright.sync_api import sync_playwright

# Reads OPENAI_API_KEY from the environment (export it or load .env first).
client = OpenAI()

WIDTH, HEIGHT = 1280, 800  # viewport size; must match the tool's display size


def shot(page) -> str:
    """Return a base64-encoded PNG screenshot of the page."""
    return base64.b64encode(page.screenshot()).decode()


def run(task: str):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page(viewport={"width": WIDTH, "height": HEIGHT})
        page.goto("about:blank")

        # First turn: send the task plus a screenshot of the starting page.
        response = client.responses.create(
            model="computer-use-preview",
            tools=[{"type": "computer_use_preview",
                    "display_width": WIDTH, "display_height": HEIGHT, "environment": "browser"}],
            input=[{"role": "user", "content": [
                {"type": "input_text", "text": task},
                {"type": "input_image", "image_url": f"data:image/png;base64,{shot(page)}"},
            ]}],
            truncation="auto",
        )

        # Loop (not implemented here): handle each computer_call in response.output
        # → page.mouse.click / page.keyboard.type → fresh screenshot → continue.
        # Stop when response.output contains no computer_call items.
        browser.close()
        return response


if __name__ == "__main__":
    result = run("Go to news.ycombinator.com and tell me the top 3 story titles.")
    print([item.type for item in result.output])  # item types returned by the first turn
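
The loop left as a comment in the listing above needs a way to turn each computer_call action into browser input. A minimal sketch of that translation, assuming the action has already been converted to a plain dict and uses field names such as `x`, `y`, `button`, `text`, `keys`, `scroll_x`, and `scroll_y` (check these against the current Computer Use documentation). `apply_action` is a hypothetical helper, and the handling of key presses and waits is deliberately simplified:

```python
def apply_action(page, action: dict) -> None:
    """Translate one computer_call action (as a dict) into Playwright calls."""
    kind = action["type"]
    if kind == "click":
        button = action.get("button", "left")
        # Playwright accepts "left", "middle", "right"; fall back to left otherwise.
        if button not in ("left", "middle", "right"):
            button = "left"
        page.mouse.click(action["x"], action["y"], button=button)
    elif kind == "double_click":
        page.mouse.dblclick(action["x"], action["y"])
    elif kind == "move":
        page.mouse.move(action["x"], action["y"])
    elif kind == "scroll":
        # Move to the target point first so the wheel event lands there.
        page.mouse.move(action["x"], action["y"])
        page.mouse.wheel(action.get("scroll_x", 0), action.get("scroll_y", 0))
    elif kind == "type":
        page.keyboard.type(action["text"])
    elif kind == "keypress":
        # Simplification: keys are pressed one after another, not as a chord,
        # and key names may need mapping to Playwright's names (e.g. "Enter").
        for key in action["keys"]:
            page.keyboard.press(key)
    elif kind == "wait":
        page.wait_for_timeout(1000)  # assumed pause of one second
    elif kind == "screenshot":
        pass  # nothing to do; the caller takes a fresh screenshot after every action
    else:
        raise ValueError(f"unsupported action type: {kind}")
```

Raising on unknown action types is a deliberate choice here: silently skipping an action would leave the model reasoning about a screen state that never changed.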

Step 04 / 04

Production hardening checklist

▸ALWAYS run in a sandbox: a Docker container, a disposable VM, or a remote browser farm. Headless Playwright on its own is not isolation. Never run on your real desktop.
▸Block dangerous URLs at the network layer (banking, work email, anything you wouldn't trust a stranger with)
▸Set a per-task budget: cap tokens AND wall-clock time. Runaway loops do happen.
▸Log every screenshot + every action. You'll need them for debugging and customer trust.
▸Treat page content and model outputs as untrusted. Web pages can carry prompt-injection text aimed at Computer Use agents.
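
Two items from the checklist above, URL blocking and per-task budgets, can be sketched in a few lines. This is an application-level complement to network-layer blocking, not a replacement for it. The host names are placeholders, and `is_blocked` and `TaskBudget` are hypothetical helpers:

```python
import time
from urllib.parse import urlparse

BLOCKED_HOSTS = {"bank.example", "mail.example"}  # hypothetical placeholders


def is_blocked(url: str, blocked=BLOCKED_HOSTS) -> bool:
    """True if the URL's host equals or is a subdomain of a blocked host."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == b or host.endswith("." + b) for b in blocked)


class TaskBudget:
    """Caps total tokens and wall-clock seconds for one task."""

    def __init__(self, max_tokens: int, max_seconds: float, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.max_seconds = max_seconds
        self._clock = clock
        self._start = clock()
        self.tokens_used = 0

    def charge(self, tokens: int) -> None:
        """Record token usage and raise once either cap is exceeded."""
        self.tokens_used += tokens
        if self.tokens_used > self.max_tokens:
            raise RuntimeError("token budget exceeded")
        if self._clock() - self._start > self.max_seconds:
            raise RuntimeError("time budget exceeded")
```

One way to use this, as an assumption about how the agent loop is wired: call `budget.charge(...)` with the token count reported for each model response, and check `is_blocked(page.url)` after every action, aborting the task when either check fails.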

◆ Ship-it checklist

5 CHECKS

OpenAI API key with Computer Use access
openai-cua-sample-app cloned and running locally
You completed at least one custom task in the sandbox
You know how this setup differs from Anthropic's Computer Use (this guide drives a browser-only environment; Anthropic's reference demo drives a full desktop OS)
You have a logging plan for screenshots + actions before going to production
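
A logging plan can start very small. The sketch below, under the assumption that one directory per task run is acceptable, writes each screenshot to disk and appends the matching action to a JSON Lines file. `log_step`, the file naming scheme, and the record fields are all hypothetical choices:

```python
import json
import time
from pathlib import Path


def log_step(run_dir: Path, step: int, screenshot_png: bytes, action: dict) -> Path:
    """Save one screenshot and append the matching action to actions.jsonl."""
    run_dir.mkdir(parents=True, exist_ok=True)
    image_path = run_dir / f"step_{step:04d}.png"
    image_path.write_bytes(screenshot_png)
    record = {
        "step": step,
        "ts": time.time(),              # Unix timestamp of when the step was logged
        "action": action,
        "screenshot": image_path.name,  # links the action to its screenshot
    }
    with (run_dir / "actions.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return image_path
```

Keeping the action log append-only and line-delimited means a crashed run still leaves a readable trail up to the last completed step.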
