Today, we're launching a research preview of GPT-5.3 Codex Spark, a smaller version of GPT-5.3 Codex and our first model built for real-time coding. Codex Spark is the first result of our partnership with Cerebras, announced in January. It's designed to feel nearly instant: running on ultra-low-latency hardware, it delivers over 1,000 tokens per second while remaining highly effective for real programming tasks.

We are making Codex Spark on Cerebras available as a research preview to ChatGPT Pro users. This lets developers start experimenting early while we work with Cerebras to increase data center capacity, improve the user experience, and prepare for the launch of our larger models.  

Our latest models are especially good at handling long-running tasks, working on their own for hours, days, or even weeks. Codex Spark is our first model built for instant work with Codex, so you can make targeted edits, adjust logic, or refine interfaces and see results right away. Now Codex supports both big, ongoing projects and quick, in-the-moment work.  

We look forward to learning from developers and using your feedback as we expand access.  

At launch, Codex Spark has a 128K context window and supports only text. During the research preview, it will have its own rate limits, and usage won’t count toward standard limits. If demand is high, you might see limited access or short waits as we keep things reliable for everyone.  

Speed and Intelligence 

Codex Spark is built for interactive work where speed is just as important as intelligence. You can work with the model in real time, interrupt or redirect it as needed, and quickly try out new ideas with fast responses. Since it is tuned for speed, Codex Spark keeps things simple by making only minimal targeted changes and running tests only when you ask.  

Coding 

Codex Spark is a powerful small model. On SWE-Bench Pro and Terminal-Bench 2.0, benchmarks that test software engineering skills, it performs well and completes tasks much faster than GPT-5.3 Codex.

Latency Improvements for All Models 

While training Codex Spark, we realized that speed alone wasn’t enough for instant collaboration. We also needed to reduce latency throughout the entire request and response process.  
 
We made improvements that will help all models, such as:  

  • streamlining how responses move between client and server  
  • updating our inference stack  
  • making sessions start faster so you see the first token sooner  

By adding a persistent WebSocket connection and upgrading the Responses API, we reduced per-token client/server round-trip overhead by 80% and cut end-to-end time to first token on a fresh session by 50%. Codex Spark uses the WebSocket path by default, and soon all models will too.
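The persistent-connection idea can be pictured with a toy streaming server: instead of paying a connection round trip per chunk, the client sends one request and then reads every token as a frame over the same socket. This is an illustrative asyncio sketch, not the actual Codex transport or its framing:

```python
import asyncio

# Toy model of token streaming over one persistent connection: the client
# sends a single request line, then reads one newline-delimited frame per
# token from the same socket. Tokens and framing are illustrative stand-ins.

TOKENS = ["def", " hello", "():", " return", " 42"]

async def serve(reader, writer):
    await reader.readline()                   # one request starts the stream
    for tok in TOKENS:
        writer.write(tok.encode() + b"\n")    # one frame per token, no new connection
        await writer.drain()
    writer.write(b"[DONE]\n")
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(serve, "127.0.0.1", 8899)
    reader, writer = await asyncio.open_connection("127.0.0.1", 8899)
    writer.write(b"complete: hello\n")        # single round trip to kick off
    await writer.drain()
    received = []
    while True:
        frame = (await reader.readline()).decode().rstrip("\n")
        if frame == "[DONE]":
            break
        received.append(frame)
    writer.close()
    server.close()
    await server.wait_closed()
    return received

tokens = asyncio.run(main())
print("".join(tokens))  # prints "def hello(): return 42"
```

The point of the sketch: after the one-time connection setup, each token costs only a socket write, which is where the per-token overhead savings come from.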

Powered by Cerebras 

Codex Spark runs on the Cerebras Wafer-Scale Engine 3 (WSE-3), a specialized AI accelerator built for high-speed inference, giving Codex a low-latency serving option. We worked with Cerebras to integrate this fast path into our main production stack, so Codex works smoothly today and is ready to support future models.

"What excites us most about GPT-5.3 Codex Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible: new interaction methods, new use cases, and a fundamentally different model experience. This preview is just the beginning," said Sean Lie, CTO and co-founder of Cerebras.

GPUs are still the backbone of our training and inference systems, providing the most cost-effective solution for broad adoption. Cerebras adds to this by handling tasks that require very low latency, making Codex feel more responsive as you work. You can also combine GPUs and Cerebras for the best performance on single workloads.  
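One way to picture this split is a simple per-request router: interactive, latency-sensitive requests take the Cerebras path when capacity is available, and everything else stays on the GPU fleet. The backend names, throughput numbers, and routing rule below are assumptions for illustration, not OpenAI's actual scheduling logic:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Backend:
    name: str
    tokens_per_sec: int  # rough throughput; illustrative numbers only

GPU = Backend("gpu-fleet", 150)
CEREBRAS = Backend("cerebras-wse3", 1000)

def route(interactive: bool, cerebras_capacity: bool) -> Backend:
    # Latency-sensitive interactive work takes the fast path when there is
    # capacity; long-running or batch work stays on the cost-effective GPUs.
    if interactive and cerebras_capacity:
        return CEREBRAS
    return GPU

print(route(True, True).name)    # cerebras-wse3
print(route(True, False).name)   # gpu-fleet (falls back under load)
print(route(False, True).name)   # gpu-fleet
```

The fallback branch mirrors the preview's stated behavior: when demand exceeds low-latency capacity, requests still complete, just on the standard path.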

Availability & Details 

Codex Spark is launching today as a research preview for ChatGPT Pro users in the latest versions of the Codex app, CLI, and VS Code extension. Since it runs on special low-latency hardware, it has its own rate limit that may change based on demand. During the preview, we are also making Codex Spark available in the API for a small group of design partners to see how developers want to use it in their products. We’ll expand access in the coming weeks as we continue improving our integration.  

Right now, Codex Spark is text-only, with a 128K context window, and is the first in a new line of ultra-fast models. As we learn from the developer community about where fast models work best for coding, we’ll add more features, such as:  

  • larger models  
  • longer context windows  
  • support for different types of input  

Codex Spark received the same safety training as our main models, including cybersecurity-specific training. We reviewed it under our standard deployment process, which evaluates cyber and other capabilities, and found no realistic possibility that it reaches our Preparedness Framework threshold for High capability in cybersecurity or biology.

What’s Next 

Codex Spark is just the beginning. Our goal is for Codex to operate in two main modes:

  1. One for longer-term reasoning and execution
  2. Another for live collaboration and quick changes

Over time, these modes will come together. Codex will let you stay closely involved while it handles longer tasks in the background or spreads work across many models at once when you need speed and variety. This way, you won’t have to pick just one mode from the start.  
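The "spread work across many models at once" idea can be sketched as a simple fan-out: dispatch one prompt to several model variants concurrently and collect every draft, so total wall time tracks the slowest model rather than the sum. The model names and the `call_model` stub are hypothetical placeholders for real API calls:

```python
import asyncio

async def call_model(name: str, prompt: str) -> str:
    # Stand-in for a real API call; a faster model simply answers sooner.
    await asyncio.sleep(0.01 if "spark" in name else 0.05)
    return f"{name}: draft for {prompt!r}"

async def fan_out(prompt: str, models: list[str]) -> list[str]:
    # Run every model concurrently; results come back in input order.
    return await asyncio.gather(*(call_model(m, prompt) for m in models))

drafts = asyncio.run(
    fan_out("rename this function", ["gpt-5.3-codex", "gpt-5.3-codex-spark"])
)
for d in drafts:
    print(d)
```

In this picture, a fast model like Codex Spark supplies the quick first draft while larger models keep working in the background, which is the "speed and variety" trade the section describes.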

As models get better, the speed of interaction becomes more important. Faster responses make Codex easier to use and open new possibilities for everyone who wants to turn an idea into working software.

Source: Introducing GPT‑5.3‑Codex‑Spark 
