OpenAI’s real-time API is now generally available following its official announcement and release in August 2025. This update includes support for remote model context protocol (MCP) servers and session initiation protocol (SIP).
Key Features of the Real-Time API Now Available
- General availability: The Runtime API is now production-ready and open to all paid developers.
- Remote MCP Server Support Developers can connect AI voice agents to external tools and capabilities on any MCP-compliant server. The API automatically manages tool coils, making it easier to expand an agent’s features without manual integration.
- SIP protocol integration: With native Session Initiation Protocol (SIP) support a common standard for initiating and managing voice communication over IP networks enterprises can connect AI voice agents directly to traditional PBX (Private Branch Exchange) systems and phone networks. This supports automated call handling, appointment scheduling, and customer service in contact centers.
- New GPT Real-time model: The API uses the advanced GPT Real-time model, offering lower latency, more natural-sounding speech, and better performance with complex instructions.
- Multi-modal inputs. The real-time API supports audio, image, and text inputs as well as audio and text outputs. This allows for a wide range of applications.
For comprehensive setup and usage instructions and to explore how these new capabilities can accelerate your project, visit the OpenAI documentation today.
OpenAI has introduced support for the remote model context protocol (MCP) server, which lets models access context from external sources, and for the Session Initiation Protocol (SIP), a widely used standard for starting and managing online voice and video calls. These technologies are integrated into its GPT-real-time speech-to-text model. These updates are available through a dedicated API. They are designed to help businesses create more autonomous voice-based agents.
Support for remote MCP (Media Control Protocol) servers in the Real-Time API is now generally available. MCP enables communication with external applications. This lets developers program voice-based agents to access external capabilities or tools. These tools are listed as MCP servers on the internet or other servers, according to Charlie Dai, VP and Principal Analyst at Forrester.
Remote MCP servers are not listed locally where the agent or application runs.
OpenAI said enterprises can enable MCP support in an API session by entering the URL of a remote MCP server in the session configuration.
Once you connect, the API automatically handles the tool calls, so you don’t need to manually wire up integrations. This setup makes it easy to extend your agent with new capabilities, the company explained in a blog post.
Dai highlighted SIP as a standard for starting and managing real-time voice calls over IP networks, enabling AI voice agents to connect with PBX systems and phone networks.
Examples of use cases where enterprises can take advantage of SAP support in the API comprise:
- automated call handling
- appointment scheduling
- multilingual support for customer services in contact centers
Dai added.
Image Input And Additional Capabilities
To make the GPT real-time model more useful for voice-based tasks, OpenAI now lets users include images, like photos, screenshots, or other visuals, along with text or audio in a session.
This functionality enables the model to analyze and respond to image content. Users can ask questions such as “What do you see?” or “Can you read the text within this image?”, according to OpenAI’s blog post.
Analysts say the ability to upload images is an important addition that will be useful to businesses.
This can be seen as multi-modal support, meaning the ability to process and understand multiple forms of input, such as text, images, and audio, which is a key area in the market, Dai said. He added that competitors like Google, with Project Astra, are also focusing on multimodal live assistance. Besides image input, OpenAI has improved GPT’s real-time context awareness and memory.
OpenAI also said the updated GPT real-time model is better at following complex instructions, calling tools accurately, and producing speech that sounds more natural and expressive.
Dai said these improvements will help businesses use the API for fast, natural voice interactions in many areas. These improve real-time medical transcription, enhance booking assistance, improve customer service for banking, insurance, and telecom, and enhance employee support. Across industries, Penn AI said businesses using the API can now choose from two new voices: Cedar and Marin.
Microsoft OpenAI’s largest investor also announced two text-to-speech models this week. The company said these will help unlock a wide range of enterprise uses.
Source: OpenAI adds MCP and SIP support to gpt-realtime for smarter voice-based agents










