Google is rolling out Search Live in the Google app. The feature combines real-time camera and voice input and runs on the Gemini 3.1 Flash Live model. Users can now show their surroundings to the AI in Google Search and ask questions in a true dialogue.

Important Details 

  • Real-time communication: users benefit from seamless voice conversations with AI, enabling faster, more natural searches.  
  • Camera and voice integration: with instant camera and voice activation, users can quickly get answers about any object or place they encounter.  
  • Location: the feature is in the Google app (Android and iOS), accessible via the live icon under the search bar.  
  • Availability: expanding to 200+ countries and several Indian languages, Search Live benefits a broad global audience.  

How Search Live Works

  • Open Live mode: in the Google app, tap the Live button, or access it via Google Lens (a tool for searching with images captured by the camera).
  • Point and ask: enable the camera and ask questions aloud.
  • The AI gives audio feedback. It also shows relevant web links.  
  • Continuous conversation: the feature permits follow-up questions for natural interaction.
  • Background operation: users can keep interacting with the AI while multitasking, maintaining efficiency even though camera sharing pauses.  

Use Cases 

  • Troubleshooting: users can point the camera at electronics to ask how to connect specific cables.  
  • Traveling: users can identify landmarks.
  • Hobbies and learning: users can request explanations for items in a matcha set or about educational experiments.  
  • Shopping: users can pull up product details and reviews.

This is part of a shift toward multimodal search where imagery, visual cues, and speech replace text input.  

Google has launched Gemini 3.1 Flash Live, a real-time audio and voice AI model for faster, more natural conversations. It reduces latency, improves reliability, and enhances dialogue quality for advanced, voice-first, multimodal AI applications.  

Gemini 3.1 Flash Live  

Gemini 3.1 Flash Live manages real-time conversations with enhanced responsiveness and context awareness. It supports natural dialogue flow, multi-turn interactions, extended conversations, and dynamic user inputs.

The model delivers reliable, natural-sounding conversations and completes complex tasks, with benchmark results showing significant improvements over previous versions. For example:

  • ComplexFuncBench audio: Gemini 3.1 Flash Live achieves 90.8% on multi-step function calling with various constraints, outperforming earlier models.
  • Scale AI MultiChallenge audio: it scores 36.1% with thinking enabled, excelling at complex instruction following and long-horizon reasoning despite the interruptions and hesitations typical of real-world audio.

Key Features And Improvements 

  • Lower latency: the model delivers faster responses and maintains fluid, near-instant interactions.
  • Better reliability in real-life conditions: Gemini 3.1 Flash Live executes tasks more reliably in noisy environments by filtering out background noise such as traffic or television, ensuring agents remain responsive to instructions.
  • It closely follows complex instructions and guardrails, ensuring dependable performance even as conversations shift.  
  • The model accurately interprets pitch, tone, and pace, adapting responses to user sentiment and enabling more natural dialogue.
  • More natural dialogue flow: the model maintains conversation threads for longer periods, preserving context throughout extended interactions and sessions.
  • It enables real-time conversations in over 90 languages for global accessibility and consistent performance.  

Developers can use the Gemini Live API (an interface for building real-time conversational features) to build agents that process voice and video inputs and respond instantly. Key capabilities include:

  • Handling real-time audio and multimodal input  
  • Function calling and external tool integration  
  • Session management for long-running conversations  
  • Ephemeral tokens for secure interactions  
  • Building interactive voice-first AI agents  

In addition to these foundational capabilities, the Google Gen AI SDK (a software toolkit for building generative AI features) enables asynchronous connections to audio sessions and supports instant interaction, as the sketch below shows.
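As a minimal sketch of what such a session looks like, the Python example below opens an asynchronous Live API connection with the Google Gen AI SDK, sends one user turn, and streams the reply. The model name gemini-3.1-flash-live is an assumption taken from this article; check the current SDK docs for the exact model IDs and config options available to you.

```python
# Minimal sketch of a Gemini Live API session via the Google Gen AI SDK.
# Assumption: "gemini-3.1-flash-live" mirrors this article's model name;
# substitute whichever Live-capable model your account exposes.
import asyncio
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

async def main():
    # Text responses keep the sketch simple; the Live API also supports AUDIO.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-3.1-flash-live",
        config=config,
    ) as session:
        # Send one user turn; a real app would stream microphone audio
        # and camera frames instead.
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "What can you help me with?"}]},
            turn_complete=True,
        )
        # Print the model's reply as it streams in.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```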

Search Live Expansion And Use Cases 

Search Live now works in 200+ regions with AI mode, using Gemini 3.1 Flash Live for real-time voice and camera queries. AI mode is available in Bengali, Gujarati, Kannada, Malayalam, Marathi, Odia, Tamil, Telugu, Urdu, and more.  

Key Features Of Search Live Include: 

  • Voice-activated conversation through the Google app  
  • Follow-up questions in ongoing sessions  
  • Camera input for context-aware queries  
  • Google Lens integration for visual interactions
  • Helpful audio responses with supporting web links  

This allows users to perform tasks that require real-time interaction, such as troubleshooting, learning, or investigating real-world objects.  

Ecosystem And Integrations 

Gemini 3.1 Flash Live delivers scalable infrastructure and partner integration for production environments:  

  • WebRTC-based systems for live voice and video  
  • Global edge routing for distributed applications  
  • Partner integrations for handling diverse input systems  

Companies such as Verizon, LiveKit, and the Home Depot report positive results using the model in conversational workflows.  

Safety And Content Authenticity 

All generated audio includes a SynthID watermark imperceptibly embedded in the output. This enables detection of AI-produced content, supporting authenticity and reducing misinformation.

Availability 

Gemini 3.1 Flash Live is available across multiple Google platforms.  

  • Developers: preview access via Gemini Live API in Google AI Studio  
  • Enterprises: Gemini Enterprise for customer experience applications
  • End users: Gemini Live and Search Live
  • Global reach: Search Live is available in 200+ countries and territories with AI mode.
  • Languages: real-time conversation support in more than 90 languages  
  • Platforms: accessible via the Google app on Android and iOS, as well as through Google Lens for camera-based interactions.

Source: Google rolls out Gemini 3.1 Flash Live for real-time voice AI conversations, expands Search Live globally

Samsung's Gauss 2 ships in three tiers that show the company's goal to improve code, language, and image workloads across settings. Early adoption has led to noticeable productivity gains: developer use of its coding assistant grew by 4x after switching to Gauss 2. Many technical details remain undisclosed, and analysts await independent proof.

This article unpacks Gauss 2's specifications, strategic benefits, and unanswered questions for enterprise buyers. To set the context, it first situates Gauss 2 within the wider enterprise GenAI model landscape shaping 2025. With this perspective, readers gain concrete data points and applicable considerations for future AI roadmaps. Professionals may also explore certification paths to guide successful project deployment. Let us explore the core developments powering Samsung's latest AI statement.

Samsung Gauss 2 Model Overview

Building on the introduction, Gauss 2 is Samsung's second in-house foundation model, following the original Gauss. The project highlights Samsung researchers' growth in AI. The enterprise GenAI model comes in three versions: Compact, Balanced, and Supreme, each for different tasks. Compact runs directly on devices for offline help with Galaxy phones and appliances. Balanced operates in Samsung data centers to enable broader consumer services, balancing speed and scale.

Supreme uses a mixture-of-experts architecture for complex inference and training. Samsung includes a custom tokenizer that supports 9 to 14 languages, depending on the setup, enabling faster multilingual processing than top open-source options. All versions support multimodal input (text, code, and images), making Gauss 2 a flexible corporate content platform. In short, Samsung offers a range of options within a single enterprise GenAI model family, informing enterprise adoption strategies.

Strategic Enterprise Gen AI Move 

Samsung’s shift aligns with the company’s goal to use AI across 90% of its business areas. Leaders see Gauss 2 as the main engine for this change. By building its own platform, Samsung can control data location, privacy, and how the model works. It also saves on ongoing API costs to outside providers. Experts note that Samsung’s chip expertise helps it improve both the model and the hardware it runs on, while competitors rely on third-party hardware and less clear messaging. Gauss 2 also gives Samsung more leverage when working with telecom and cloud partners. These benefits support the company’s investments. Still, keeping funding and top talent is key to achieving long-term success. This context leads to a closer look at multimodal features.

Multimodal Capabilities in Depth 

Multimodality refers to the ability to use multiple input types (text, code, images, and language translation) within a single system. For example, users can upload screenshots or design drafts and receive code suggestions tailored to the context. Developers can have the model update old scripts while viewing visual layouts. Call center agents get quick language summaries from recorded calls. Samsung says response crafting is now three times faster with these tools. The Supreme version also adds knowledge-graph grounding, meaning it connects answers to real product facts. This reduces errors and improves productivity for support teams. Most open models require separate tools for each input type, but Gauss 2 combines them. These features set the stage for performance analysis.

Performance And Adoption Data 

Hard numbers remain limited, yet Samsung shared several adoption metrics. According to the firm, usage of the coding assistant increased within months of Gauss 2 integration. Moreover, about 60% of device-experience developers access the assistant weekly. The enterprise GenAI model backs these gains by delivering 1.5 to 3 times faster processing. Samsung compared Balanced and Supreme against unnamed open-source baselines on internal benchmarks. However, the company has not released full datasets, tasks, or details on statistical significance for independent review. Therefore, treat the figures as marketing claims awaiting third-party validation.

Analysis of these performance data would not be complete without considering transparency and validation. This natural progression leads to broader consideration of benefits and challenges for stakeholders evaluating the platform.  

Benefits For The Samsung Ecosystem

The Gauss 2 rollout benefits more than just developers. On-device processing means tasks run directly on devices, reducing cloud latency and improving privacy. Galaxy phones with the Compact version can transcribe or caption images offline, offering faster language translation and keeping data on the device. The Balanced and Supreme versions help service teams by summarizing information and routing tickets efficiently, reducing support costs. Samsung fine-tunes the enterprise GenAI model for business needs using its own data (instead of third-party data), which is harder to do on generic platforms. Organizations considering Gauss 2 should keep these key benefits in mind:

  • Cost control through reduced external API calls.  
  • Unified handling of software, language, and image data.  
  • On-device experiences that boost buyer interest.
  • Scalable architecture matching workload size.  

Together, these benefits make a strong case for Samsung’s AI platform. However, to provide a balanced view, before adopting Gauss 2, organizations should consider potential challenges and questions.  

Challenges And Open Questions

Like any proprietary platform, Gauss 2 comes with some risks. Samsung has not shared specifics such as parameter counts (number of model settings) or training sources (datasets used for learning), making it hard for analysts to compare it to models like GPT-4 or Gemini. There is also limited information on safety testing (risk evaluation), bias controls (methods to reduce bias in outputs), and governance (policies overseeing AI use). The enterprise GenAI model does not yet have a public API, meaning external developers cannot easily access its features, and there is no pricing information for planning integrations. By contrast, open-source models on Hugging Face are easier to try out right away. Ongoing maintenance, especially for on-device updates, is another concern, though Samsung’s hardware expertise may help reduce some costs. Professionals can improve oversight by earning the AI Project Manager certification. These problems show there are still important unknowns, so reviewing the roadmap is essential.

Roadmap and Industry Impact 

Samsung plans to add Gauss 2 to most of its products over the coming years. The Supreme version targets cloud systems, while the Compact one powers wearables and home devices. Adding knowledge graphs will make information more precise and customized. Experts expect Apple, Google, and Xiaomi to respond with updates. Samsung’s move may also drive demand for better mobile AI chips and push other providers to reveal more about costs and performance. Companies will need to balance vendor independence with ecosystem benefits. The choice of a foundation model will depend on openness, transparency, and cost-effectiveness. Gauss 2’s roadmap could reset buyer expectations for AI. These points lead us to our final thoughts.

Gauss 2 shows that Samsung wants to shape its own AI future. The platform brings together software, language, and image processing in a single system. Early results point to real productivity gains and faster service. However, the lack of technical transparency means buyers need to do careful research. Companies should ask for clear benchmarks, safety information, and governance policies. As competition responds, Samsung will likely disclose more details soon. Professionals can help guide these decisions by earning the AI Project Manager certification. Now is the time to align your strategy with the fast-changing world of enterprise GenAI.

Source: Samsung Gauss2 Enterprise GenAI Model for Multimodal Workflows 

GPT-5.4 is our most advanced model so far. It enables faster, more accurate results in the API and Codex, helping people and teams make better decisions, increase productivity, and streamline processes.

In most cases, GPT-5.4 is the default choice for general tasks and coding, chosen to simplify complex workflows, save time on software engineering, enhance reasoning, improve writing quality, and use tools, all with one model.

This article presents the standard features of the GPT-5 models and shows practical ways to make the most of GPT-5.4.  

Key Improvements 

GPT-5.4 offers several improvements over the previous GPT-5.2 model:

  • Experience sharper coding, better document understanding, smarter audio, and more reliable instruction following.  
  • Enhanced image perception lets users analyze visuals more accurately. It also helps manage multimodal workflows more easily.  
  • Users can complete long-running tasks faster than before. They can also execute multi-step agent workflows more reliably.  
  • More efficient token use reduces costs and improves end-to-end performance for heavy tool-based workloads.  
  • Faster, smarter web search uncovers hard-to-find information, saving time and simplifying research.  
  • Streamlining the handling of many documents or spreadsheets boosts productivity across customer service, analytics, and finance workflows.  

Developers produce production-ready code and polished interfaces faster and more consistently, with fewer prompts for refinement.  

For agent-based tasks, GPT-5.4 completes multi-step processes faster. It often uses fewer tokens and tool calls. This makes agent-based approaches more responsive and reduces the cost of operating complex workflows at scale in the API and Codex.

New Features In GPT-5.4

Like its predecessors, GPT-5.4 offers flexible tool options, control over explanation detail, and curated tool lists. It also adds new features that make building agent systems easier, help manage more information, and ensure reliable automation:

  • Tool search in the API: the model can browse tools across vast ecosystems and load only what it needs, working smarter with fewer tokens and on-point choices. Discover more in the tool search guide.
  • 1M-token context window: GPT‑5.4 can handle up to 1M tokens. This makes it easier to analyze entire codebases and large sets of documents, or to run agent processes in a single request (see the sketch after this list). You can read more in the “1M context window” section.
  • Computer use: agents can interact directly with software for the first time, completing, checking, and fixing tasks faster in a full build, run, and verify loop. Check out the computer use guide for more.
  • Native compaction: power through longer processes and keep vital context thanks to GPT-5.4’s native compaction support.
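To make the large-context claim concrete, here is a rough sketch that concatenates a small codebase into a single Responses API request. The gpt-5.4 model name comes from this article and is an assumption; responses.create and output_text are standard OpenAI Python SDK surface.

```python
# Sketch: analyzing an entire codebase in one request, leaning on the
# large context window described above. "gpt-5.4" is this article's
# model name (an assumption); swap in the model your account exposes.
import pathlib
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Concatenate every Python file under src/ with a path header.
source = "\n\n".join(
    f"# file: {path}\n{path.read_text()}"
    for path in pathlib.Path("src").rglob("*.py")
)

response = client.responses.create(
    model="gpt-5.4",
    input=f"Review this codebase and list likely bugs:\n\n{source}",
)
print(response.output_text)
```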

Meet the Models 

For most tasks and coding, GPT-5.4 is your new go-to model, replacing GPT-5.2. In Codex and ChatGPT, users get the latest GPT-5.4 by default. Need deeper answers? GPT-5.4 Pro offers extra compute for the hardest challenges.

Prefer a compact model? Try GPT-5 mini for streamlined performance.

Weigh these trade-offs to find your perfect match:

  • GPT-5.4: general-purpose work, including complex reasoning, broad world knowledge, and code-heavy, multi-step agentic tasks
  • GPT-5.4 Pro: tough problems that may take longer to solve and need deeper reasoning
  • GPT-5 mini: cost-optimized reasoning and chat; balances speed, cost, and capability
  • GPT-5 nano: high-throughput tasks, especially straightforward instruction-following or classification

Lower Reasoning Effort 

The reasoning effort setting determines how many reasoning tokens the model uses before responding. Older models like o3 only offered low, medium, and high options: low meant faster, less thoughtful responses, while high meant longer, more reasoned answers.

From GPT-5.2 on, the lowest setting is called none, which enables the fastest responses. This is now the default in GPT-5.2 and later; to increase model reasoning, raise the setting to medium and observe the changes.

When reasoning effort is set to none, prompts become more important. For better reasoning, even at the default setting, ask the model to think through the problem or list its steps before answering.
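As a minimal sketch, the reasoning effort level is set per request. The reasoning.effort parameter exists in the Responses API for reasoning models; the gpt-5.4 name and the none level are taken from this article and are assumptions.

```python
# Sketch: comparing reasoning-effort settings on the same prompt.
# Assumptions: "gpt-5.4" and the "none" level follow this article.
from openai import OpenAI

client = OpenAI()
prompt = "How many weekdays fall between March 3 and March 28, 2025?"

fast = client.responses.create(
    model="gpt-5.4",
    input=prompt,
    reasoning={"effort": "none"},    # the default: quickest, least deliberate
)
careful = client.responses.create(
    model="gpt-5.4",
    input=prompt,
    reasoning={"effort": "medium"},  # spends reasoning tokens before answering
)
print(fast.output_text)
print(careful.output_text)
```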

Verbosity 

Verbosity controls how many output tokens the model produces. Fewer tokens make responses quicker. Reasoning style remains mostly unchanged, but responses will be briefer, which can help or hurt depending on your needs. Use high verbosity for detailed explanations or major code changes. Use low for brief answers or simple code.

  • High verbosity is useful for detailed document explanations or major code refactoring.
  • Low verbosity is best for short answers or simple code, such as SQL queries.

GPT-5 supports high, medium, and low settings. In GPT-5.4, you can still adjust verbosity, with medium as the default.

With GPT-5.4, medium and high verbosity produce longer, more organized code with explanations, while low verbosity generates shorter code with little extra commentary.
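Verbosity is likewise a per-request setting, independent of reasoning effort. Here is a minimal sketch using the Responses API's text.verbosity field (the gpt-5.4 name again follows this article and is an assumption):

```python
# Sketch: requesting a terse answer by lowering verbosity.
# "gpt-5.4" is this article's model name (an assumption).
from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="gpt-5.4",
    input="Write a SQL query returning the ten most recent orders.",
    text={"verbosity": "low"},  # fewer output tokens, minimal commentary
)
print(response.output_text)
```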

GPT-5.4 is designed to solve problems by reasoning through them.

Models like GPT-5.4 solve problems step by step, creating an internal chain of thought as they reason. For best results, send these reasoning steps back to the model: this prevents the same reasoning from being repeated and keeps the conversation aligned with the model’s training. In conversations with multiple turns, using previous_response_id automatically includes earlier reasoning steps. This is especially useful when using tools; for example, if a function call requires a follow-up turn, you can use previous_response_id, or alternatively, add the reasoning items directly to the input.
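A minimal sketch of that chaining pattern, assuming the article's gpt-5.4 model name (previous_response_id is a standard Responses API parameter):

```python
# Sketch: chaining turns so earlier reasoning carries forward.
# "gpt-5.4" is this article's model name (an assumption).
from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="gpt-5.4",
    input="Plan the steps to migrate this service from REST to gRPC.",
)
followup = client.responses.create(
    model="gpt-5.4",
    previous_response_id=first.id,  # replays earlier reasoning automatically
    input="Now estimate the effort for step 2.",
)
print(followup.output_text)
```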

Source: Using GPT-5.4

NVIDIA has introduced several new technologies to accelerate the development of humanoid robots. These include NVIDIA Isaac GR00T N1, described as the world’s first open, fully customizable foundation model: a large artificial intelligence system trained on diverse data that can be adapted for many tasks, in this case general humanoid reasoning and skills.

Other technologies in the lineup include simulation frameworks and blueprints, such as the NVIDIA Isaac GR00T blueprint. A simulation framework is a set of software tools for testing and training robots in a virtual environment, and the blueprint helps generate synthetic training data. There is also Newton, an open-source physics engine developed with Google DeepMind and Disney Research, designed specifically to simulate real-world physical interactions for building robots.

Building on these releases, GR00T N1 is now available. It is the first in a series of customizable models that NVIDIA will share globally to support industries facing workforce shortages.

“The age of generalist robotics is here,” said Jensen Huang, founder and CEO of NVIDIA. “With NVIDIA Isaac GR00T N1 and new data-generation and robot-learning frameworks, robotics developers everywhere will open the next frontier in the age of AI.”

GR00T N1 Advances Humanoid Developer Community

The GR00T N1 foundation model uses a dual-system design inspired by how people think. It features System 1, which acts quickly and automatically, like human reflexes or intuition, and System 2, which takes a slower, more careful approach to decision making. Dual-system design refers to splitting cognitive processes into fast and slow systems, similar to theories in human psychology.

System 2 is powered by a vision-language model, a type of AI that understands images and written or spoken commands, reasons about its environment and the instructions it has received, and plans actions. System 1 then translates these plans into precise, continuous robot movements. System 1 is trained on data from both human demonstrations and a large volume of synthetic data generated by the NVIDIA Omniverse platform. The vision-language model enables the robot to interpret both visual and linguistic inputs.
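To make this division of labor concrete, here is a purely illustrative Python sketch of a dual-system control loop. Every class and method name below is hypothetical, invented for this explanation; none of it is the GR00T N1 API.

```python
# Conceptual sketch of a dual-system control loop.
# All names are hypothetical illustrations, not the GR00T N1 API.

class System2Planner:
    """Slow path: a vision-language model reasons over the camera image
    and the instruction, producing a short list of high-level goals."""
    def plan(self, image, instruction):
        return ["locate_cup", "grasp_cup", "place_on_shelf"]

class System1Controller:
    """Fast path: maps the current goal and robot state to continuous
    low-level motor commands at a high control rate."""
    def act(self, goal, joint_state):
        return [0.0] * 7  # placeholder 7-DoF joint-velocity command

def control_loop(image, instruction, steps_per_goal=100):
    planner = System2Planner()
    controller = System1Controller()
    goals = planner.plan(image, instruction)   # deliberate: runs rarely
    for goal in goals:
        for _ in range(steps_per_goal):        # reflexive: runs constantly
            command = controller.act(goal, joint_state=[0.0] * 7)
            # a real robot would execute `command` and read back sensors
    return goals

print(control_loop(image=None, instruction="put the cup on the shelf"))
```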

GR00T N1 can handle a variety of common tasks, including grasping and moving objects with one or both arms and passing items between arms. It can also perform more complex multi-step tasks that need a longer context and a mix of general skills. These abilities are useful for tasks such as material handling, packaging, and inspection.

Developers and researchers can further train GR00T N1 with real or synthetic data to fit their own humanoid robots or tasks.

During his GTC keynote, Huang showed 1X’s humanoid robot performing household tidying tasks on its own using a policy trained with GR00T N1. This autonomous ability comes from an AI training partnership between 1X and NVIDIA.

“The future of humanoids is about adaptability and learning,” said Bernt Børnich, CEO of 1X Technologies. “While we develop our own models in-house, GR00T N1 provides a significant boost to robot reasoning and skills with minimal post-training data. We deployed it fully on NEO Gamma, advancing our mission of creating robots that are not just tools but companions capable of assisting humans in meaningful, immeasurable ways.”

Other top humanoid developers with early access to GR00T N1 include Agility Robotics, Boston Dynamics, Mentee Robotics, and Neura Robotics.

NVIDIA, Google DeepMind, And Disney Research Focus On Physics

NVIDIA is working with Google DeepMind and Disney Research to develop Newton, an open-source physics engine. In this partnership, NVIDIA leads the development, with Google DeepMind and Disney Research contributing expertise, to help robots learn to perform complex tasks more accurately.  

Newton, built on the NVIDIA Warp framework, will be optimized for robot learning and compatible with simulation frameworks such as MuJoCo and Isaac Sim. It will also utilize Disney’s physics engine.

Google DeepMind and NVIDIA are also co-developing MuJoCo-Warp, aiming to accelerate robotics machine learning tasks by over 70 times. Developers will access it via Google DeepMind’s MJX open-source library and the Newton engine.
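For a feel of the accelerator-resident, JAX-based simulation style this work builds on, here is a minimal example using the publicly available MJX library (ordinary MJX usage, not the MuJoCo-Warp API, which is assumed to expose a similar workflow). It drops a sphere and steps the physics under JIT compilation.

```python
# Minimal MJX example: simulate a falling sphere on an accelerator.
# This is standard MJX usage, shown only to illustrate the JAX-based
# simulation style; it is not the MuJoCo-Warp API.
import jax
import mujoco
from mujoco import mjx

XML = """
<mujoco>
  <worldbody>
    <body pos="0 0 1">
      <freejoint/>
      <geom type="sphere" size="0.1" mass="1.0"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
mjx_model = mjx.put_model(model)     # copy the model to the accelerator
mjx_data = mjx.make_data(mjx_model)  # initial simulation state

step = jax.jit(mjx.step)             # JIT-compile one physics step
for _ in range(200):
    mjx_data = step(mjx_model, mjx_data)

print(mjx_data.qpos)                 # final free-joint pose
```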

Disney Research, as a partner in the Newton project, will be among the first to use the engine to improve its robotic character platform. This platform powers next-generation entertainment robots like the expressive Star Wars-inspired BDX droids that appeared with Huang during his GTC keynote.

“The BDX droids are just the beginning. We’re committed to bringing more characters to life in ways the world hasn’t seen before. This collaboration with Disney Research, NVIDIA, and Google DeepMind is a key part of that vision,” said Kyle Laughlin, Senior Vice President of Walt Disney Imagineering Research and Development. “This alliance will allow us to create a new generation of robotic characters that are more expressive and engaging than ever before and connect with our guests in ways that only Disney can.”

Continuing their collaboration, NVIDIA, Disney Research, and Intrinsic have announced a new partnership. The three organizations will develop OpenUSD pipelines and best practices for robotics data workflows, with NVIDIA overseeing the technical architecture and Disney and Intrinsic contributing their expertise in robotics and data management.

NVIDIA has also announced the DGX Spark personal AI supercomputer at GTC. It gives developers a ready-to-use system to expand GR00T N1’s capabilities for new robots, tasks, and environments without requiring much custom programming.

The Newton physics engine will be released later this year.

Source: NVIDIA Announces Isaac GR00T N1 — the World’s First Open Humanoid Robot Foundation Model