Today, OpenAI is launching a research preview of GPT-5.3 Codex Spark, a smaller version of GPT-5.3 Codex and our first model built for live coding. Codex Spark is the first result of our partnership with Cerebras, announced in January. It’s designed to feel almost instant on ultra-low-latency hardware, delivering over 1,000 tokens per second while staying highly effective for real-life programming tasks.
We are sharing Codex Spark on Cerebras as a research preview for ChatGPT Pro users, so developers can start experimenting early. At the same time, we are working with Cerebras to ramp up data center capacity, harden the end-to-end user experience, and deploy our larger frontier models.
Our latest models are especially good at handling long-running tasks, working on their own for hours, days, or even weeks. Codex Spark is our first model built for instant use with Codex, so you can make targeted edits, adjust logic, or refine interfaces and see results right away. Now Codex supports both big, ongoing projects and quick, in-the-moment work.
We look forward to learning from developers and using your feedback as we expand access.
At launch, Codex Spark has a 128K context window and is text only. During the research preview, Codex Spark will have its own rate limits, and usage will not count towards standard rate limits. However, when demand is high, you may see limited access or temporary queuing as we balance service reliability across users.
Speed and Intelligence
Codex Spark is built for interactive work where speed is just as important as intelligence. You can work with the model in real time, interrupt or redirect it as needed, and quickly try out new ideas with fast responses. Since it’s tuned for speed, Codex Spark keeps things simple by making only minimal targeted edits and running tests only when you ask.
Coding
Codex Spark is a powerful small model designed for fast results on SWE Bench Pro and Terminal Bench 2.0, benchmarks that test software engineering skills. GPT-5.3 Codex Spark performs well on both and completes tasks much faster than GPT-5.3.
Latency Improvements For All Models
As we trained Codex Spark, it became apparent that model speed was just part of the equation for instant collaboration. We also needed to decrease latency across the full request-response pipeline.
We implemented end-to-end latency improvements in our harness, benefiting all models. We streamlined how responses stream between client and server, rewrote key pieces of our inference stack, and reworked how sessions are initialized so that the first visible token appears sooner and Codex stays responsive as you iterate. By introducing a persistent WebSocket connection and targeted optimizations in the responses API, we reduced:
- per-client/server round-trip overhead by 80%
- per-token overhead by 30%
- time to first token by 50%
The WebSocket path is enabled for Codex Spark by default and will become the default for all models soon.
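To illustrate why a persistent connection cuts so much overhead, here is a minimal back-of-the-envelope sketch. The numbers are illustrative assumptions, not Codex measurements: connection setup (TLS handshake, HTTP negotiation) is paid once on a persistent socket, but on every request when each call opens a fresh connection.

```python
# Sketch: amortizing connection setup over a persistent WebSocket.
# All figures below are hypothetical, for illustration only.

def total_overhead_ms(n_requests: int, setup_ms: float, per_msg_ms: float,
                      persistent: bool) -> float:
    """Total transport overhead for a session of n_requests calls.

    Persistent connection: setup is paid once, then only per-message
    framing cost. Fresh connection per call: setup is paid every time.
    """
    if persistent:
        return setup_ms + n_requests * per_msg_ms
    return n_requests * (setup_ms + per_msg_ms)

# 50 iterative edits in one session, assuming 120 ms setup and
# 5 ms per-message framing cost.
fresh = total_overhead_ms(50, 120.0, 5.0, persistent=False)  # 6250 ms
ws = total_overhead_ms(50, 120.0, 5.0, persistent=True)      # 370 ms
print(f"overhead saved: {1 - ws / fresh:.0%}")
```

The exact savings depend on network conditions, but the shape of the math is why per-round-trip overhead drops so sharply once the connection is reused.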
Powered by Cerebras
Codex Spark runs on the Cerebras Wafer Scale Engine 3, a specialized AI accelerator built for high-speed inference, providing Codex with a low-latency serving option. We worked with Cerebras to add this fast path to our main production system. Codex Spark works smoothly with the rest of our models and prepares us to support future ones.
GPUs remain the main component of our training and inference systems and offer the most cost-effective tokens for general use. Cerebras adds to this by handling tasks that require very low latency, making Codex feel more responsive as you work. You can also combine GPUs and Cerebras for the best performance on a single workload.
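One way to picture combining the two hardware paths is a latency-aware dispatcher: short, interactive turns go to the low-latency accelerator, while long-running batch work stays on the cost-effective GPU fleet. This is a hypothetical sketch; the backend names, threshold, and routing rule are our own illustrative assumptions, not OpenAI's actual scheduler.

```python
from dataclasses import dataclass

@dataclass
class Request:
    interactive: bool  # is a user waiting at the keyboard?
    est_tokens: int    # rough size of the expected response

def route(req: Request) -> str:
    """Pick a serving backend for one request (illustrative rule).

    Interactive, short responses take the low-latency path; everything
    else runs on GPUs, the most cost-effective option for general use.
    """
    if req.interactive and req.est_tokens < 4000:
        return "cerebras"
    return "gpu"

print(route(Request(interactive=True, est_tokens=500)))     # cerebras
print(route(Request(interactive=False, est_tokens=20000)))  # gpu
```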
Availability and Details
Codex Spark is rolling out today as a research preview for ChatGPT Pro users in the latest versions of the Codex app, CLI, and VS Code extension. Since it uses specialized low-latency hardware, it has its own rate limit that may change based on demand during the preview. We are also offering Codex Spark in the API to a small group of design partners to learn how developers want to use it in their products. We will expand access further in the coming weeks as we continue improving our integration and learn more with the developer community about where fast models shine for coding. We will introduce even more capabilities, including larger models, longer context windows, and multimodal input.
Codex Spark has the same safety training as our main models, including cybersecurity training. We reviewed Codex Spark as part of our usual deployment process, which includes assessments of cyber and other capabilities. We found that it does not have a realistic chance of meeting our Preparedness Framework threshold for high capability in cybersecurity or biology.
What’s Next?
Codex Spark is the first step toward a Codex that offers two modes:
- Long-term Reasoning and Execution
- Instant Collaboration for quick changes
Over time, these modes will blend. Codex will let you stay in a close interactive loop while sending longer tasks to sub-agents in the background or spreading tasks across many models at once when you need speed and coverage, so you won’t have to commit to just one mode at the start.
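The fan-out pattern described above, i.e. dispatching one task to several models at once and taking the first usable result, can be sketched with plain asyncio. `call_model` is a stand-in for a real model API call; the names and delays are illustrative assumptions.

```python
import asyncio

async def call_model(name: str, delay: float) -> str:
    """Stand-in for a model API call; delay simulates inference latency."""
    await asyncio.sleep(delay)
    return f"{name}: done"

async def fan_out(task_specs):
    """Race several sub-agents and return the first completed answer."""
    calls = [asyncio.create_task(call_model(n, d)) for n, d in task_specs]
    done, pending = await asyncio.wait(calls,
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:  # cancel the slower sub-agents
        task.cancel()
    return done.pop().result()

# A fast small model races a slower large one; the fast one wins here.
winner = asyncio.run(fan_out([("spark", 0.01), ("frontier", 0.2)]))
print(winner)  # spark: done
```

A production version would need result validation rather than pure first-wins racing, but the structure, i.e. concurrent dispatch plus cancellation of losers, is the core of the pattern.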
As models improve, interaction speed becomes a greater challenge. Ultra-fast inference helps close the gap, making Codex easier to use and opening new possibilities for anyone turning ideas into working software.
Source: Introducing GPT‑5.3‑Codex‑Spark