At Universe 2025, GitHub unveiled Agent HQ, a platform that serves as mission control for AI coding agents from vendors like OpenAI, Anthropic, Google, and xAI within the GitHub ecosystem.

This initiative, often referred to as an AI Army command center, aims to transition developers from juggling separate tools to coordinating a team of agents that seamlessly write, test, and debug code together.  

Main details of GitHub Agent HQ (2025-2026) 

  • Centralized Control Plane: Agent HQ provides a single interface in GitHub, VS Code, and the command line for assigning, tracking, and managing AI tasks in real time.  
  • Third-party integration: Developers can now use agents and models from other vendors, such as Anthropic’s Claude 3.7 Sonnet, Google Labs’ Jules, and xAI, in their GitHub workflow, not just GitHub’s own Copilot.
  • Agents can now perform tasks in sequence independently: picking up issues, creating branches, committing code, and opening pull requests. Human developers then review the agents’ work and provide feedback as needed.
  • Enterprise Governance: The platform provides advanced code review by agents, a control panel to manage agent actions, and a dashboard to track AI performance.  
  • Availability: Following the October 2025 announcement, third-party agents gradually became available to GitHub Copilot subscribers. Over the next few months, advanced features will be offered through Copilot Pro+ or to enterprise clients.

Shift To Agentic Development 

GitHub COO Kyle Daigle stated that the objective is to bring order to the chaos caused by rapid AI growth. Agent HQ enables developers to move beyond basic chat-based assistants by leveraging agents for more structured, step-by-step programming assignments.

Related security concerns (EchoLeak)  

In early 2025, researchers identified EchoLeak, the first zero-click AI vulnerability, in Microsoft’s broader Copilot ecosystem; it was not specific to Agent HQ. EchoLeak highlighted the risks of AI agents accessing sensitive data. Microsoft responded with stronger security and auditing in its Frontier Suite (Microsoft 365 E7), launched in early 2026.

If you maintain open source projects or work on an enterprise team, seeing automated documentation fixes, new unit tests, or refactoring suggestions can be a real eye-opener. Still, automation raises a key question: how do you set limits on agents that can access your repository and the internet? You could worry about an agent using information from unreliable websites, or accidentally exposing an API token, or maybe it could start posting unnecessary comments on every open issue. For its automation to be truly valuable, it needs to be predictable.  

What is the safest way to add agents to existing automations like CI/CD? Agents are unpredictable: they handle untrusted inputs, examine your repository’s state, and make decisions as they run. Letting agents operate in CI/CD without constant oversight lets you scale your engineering, but it requires safeguards to address security risks.

GitHub Agentic Workflows are built on GitHub Actions. Normally, everything in an action shares the same level of trust. This means a rogue agent could interfere with MCP servers, access authentication secrets, or send network requests to any destination. An agent that has bugs or is manipulated through prompt injection, and that runs without restrictions, could behave in unforeseen and unsafe ways.

This is why security is a core part of GitHub Agentic Workflows. We see agent execution as an extension of the CI/CD model, not as something separate. We keep the open-ended part of authoring workflows apart from the governed part of running them. Then, we compile each workflow into a GitHub Action with clear limits on permissions, inputs, audit records, and network access.

In this post, we will explain how we designed Agentic workflows to be secure from the start, starting with the threat model and needed security architecture.  

Threat Model 

Two key features of agentic workflows affect the threat model for automation.  

First, agents can reason over repository state and act independently. While useful, they should not be trusted by default, especially when they handle untrusted inputs.

Second, GitHub Actions offers a very permissive execution environment. Sharing a trust domain helps with automation, broad access, and good performance; however, when untrusted agents are involved, a single trust domain means that one failure can have a wide blast radius.

With this model, we assume agents may try to access or modify data they should not, communicate over unintended channels, and misuse legitimate channels to perform actions beyond their permissions. By default, GitHub Agentic Workflows use strict security settings based on this threat model, adhering to four security principles:

  1. Defense in depth
  1. Don’t trust agents with secrets
  1. Stage and vet all writes
  1. Log everything

Defense in Depth

GitHub Agentic Workflows use a layered security system with substrate, configuration, and planning layers. Each layer helps limit the impact of failures in the layers above by enforcing its own security rules.

The substrate layer is built on a GitHub Actions runner, running on a virtual machine, with several trusted containers that control which resources an agent can use. This layer keeps components separate, manages privileged operations and system calls, and enforces communication boundaries at the kernel level. These protections remain valid even if an untrusted component is compromised and runs code within its container.

On top of the substrate layer is the configuration layer. This layer uses declarative artifacts and toolchains to set up a secure system and its connections. It decides which components are loaded, how they connect, which communication channels are allowed, and what privileges each has. External tokens, such as agent API keys and GitHub access tokens, are important inputs. The configuration controls which tokens are placed in which containers.  

The last layer of defense is the planning layer. While the configuration layer decides which components exist and how they connect, it does not control when they are active. The planning layer’s main job is to set up a staged workflow with clear data exchanges between components. The Safe Outputs subsystem, explained later, is the main example of secure planning.  
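
To make the configuration layer's token placement concrete, here is a minimal sketch of a check that no untrusted container is handed a token. It is an illustration only, not the actual Agentic Workflows configuration format: the `Container` type, the container names, and the token names are all assumptions.

```python
from dataclasses import dataclass


# Hypothetical names throughout: this is not the real configuration format.
@dataclass(frozen=True)
class Container:
    name: str
    trusted: bool
    tokens: frozenset = frozenset()


def untrusted_token_leaks(containers):
    """Return (container, token) pairs where an untrusted container holds a token."""
    return [
        (c.name, token)
        for c in containers
        if not c.trusted
        for token in sorted(c.tokens)
    ]


# Assumed layout: only trusted containers receive external tokens.
layout = [
    Container("agent", trusted=False),
    Container("api-proxy", trusted=True, tokens=frozenset({"LLM_API_KEY"})),
    Container("mcp-gateway", trusted=True, tokens=frozenset({"GITHUB_TOKEN"})),
]

leaks = untrusted_token_leaks(layout)
print(leaks)  # an empty list means no untrusted container holds a token
```

Running a check like this when the declarative configuration is compiled would surface a misplaced token before any agent starts.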

Don’t Trust Agents Bearing Secrets 

From the start, we aimed for workflow agents to have no access to secrets and to maintain strict trust boundaries. Agentic workflows run as GitHub Actions, where by default all components share a single trust domain on the runner VM. In that default setup, sensitive items such as agent authentication tokens and MCP server API keys are stored in environment variables and configuration files that all processes in the VM can access, so extra measures are required to prevent agents from breaching these trust boundaries.

This is risky because agents can fall victim to prompt injection. Attackers can craft malicious inputs, such as web pages or repository issues, that trick agents into revealing sensitive information. For example, an agent affected by prompt injection and with access to shell commands could read configuration files, SSH keys, Linux /proc state, and workflow logs to find credentials and other secrets. It could then upload these secrets online or hide them in public GitHub objects, such as issues, pull requests, and comments.

Our first step to reduce risk was to put the agent in its own container and to implement strict controls on what it can access. This includes:  

  • Firewalled internet access
  • MCP access only through a trusted gateway
  • LLM API calls routed through an API proxy

Agentic workflows set up a private network between the agent and the firewall. The MCP gateway runs in a separate trusted container, starts MCP servers, and is the only one with access to MCP authentication material.  

Agents like Claude, Codex, and Copilot need to talk to an LLM over a secure channel, but we do not give the required tokens directly to the agent’s container. Instead, we keep LLM auth tokens in a separate API proxy and set up agents to send model traffic through that proxy.
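
The proxy idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the real API proxy: the `ApiProxy` class, the request dictionary shape, and the host name are hypothetical, and no network traffic is actually sent.

```python
class ApiProxy:
    """Holds the LLM token so the agent container never sees it (hypothetical sketch)."""

    def __init__(self, llm_token, allowed_hosts):
        self._llm_token = llm_token
        self._allowed_hosts = frozenset(allowed_hosts)

    def forward(self, request):
        """Return the request as it would leave the proxy, or refuse it."""
        if request["host"] not in self._allowed_hosts:
            raise PermissionError(f"destination not allowed: {request['host']}")
        # Drop any Authorization header the agent supplied, then add the real one.
        headers = {
            name: value
            for name, value in request.get("headers", {}).items()
            if name.lower() != "authorization"
        }
        headers["Authorization"] = f"Bearer {self._llm_token}"
        return {**request, "headers": headers}


proxy = ApiProxy(llm_token="example-token", allowed_hosts={"llm.example.com"})
outbound = proxy.forward({"host": "llm.example.com", "path": "/v1/chat", "headers": {}})
print(outbound["headers"]["Authorization"])
```

The agent only ever builds the unauthenticated request; the credential is attached on the trusted side of the boundary, and requests to unlisted hosts are refused.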

Zero-secret agents require a balance between security and usefulness. Programming tasks often need access to compilers, interpreters, scripts, and repository data. However, expanding the container setup would duplicate existing provisioning steps and add more network destinations to the firewall rules.

Instead, we use container volume mounts to give the agent access to needed host files and programs, and we run it in a chroot jail. First, we mount the whole VM host file system read-only at /host. Then we cover certain paths with empty tmpfs layers and start the agent in a chroot jail at /host. This way, the host setup stays unchanged, and the agent can only read and write what it needs for its work.

Stage and Vet all Writes 

Even without access to secrets, prompt-injected agents can still cause problems. For example, a rogue agent might flood a repository with unnecessary issues or pull requests to overwhelm maintainers, or add unwanted URLs and other content to repository objects.

To prevent this kind of behavior, the Agentic Workflows compiler decomposes every workflow into clear, explicit stages. It acts as a control point, defining for each stage:

  • The active components and their permissions (read vs. write)
  • The data artifacts emitted by that stage
  • The admissible downstream consumers of those artifacts

While the agent runs, it can read GitHub state through the GitHub MCP server and can only stage its updates through the safe outputs MCP server. After the agent finishes, the safe outputs MCP server processes any buffered write operations using a set of safe output checks. These checks rely on three mechanisms:

  1. Safe outputs let workflow authors specify which write operations an agent can perform. Authors can choose which GitHub update types are available, such as creating issues, comments, or pull requests.
  1. Safe outputs limit the number of updates allowed, such as restricting an agent to creating at most three pull requests per run.
  1. Safe outputs analyze and update content to remove unwanted patterns, such as sanitizing URLs.

Only artifacts that pass through the entire safe outputs pipeline can be passed on, making sure that each stage’s side effects are explicit and vetted.  
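
The three checks can be pictured as a small filter over buffered writes. The sketch below is an assumption-laden illustration, not the real safe outputs implementation: the operation dictionaries, the `SafeOutputsPolicy` fields, the operation type names, and the URL host allowlist are all hypothetical.

```python
import re
from dataclasses import dataclass

URL_RE = re.compile(r"https?://([^/\s]+)\S*")


@dataclass
class SafeOutputsPolicy:
    allowed_types: frozenset      # which write operations the author permits
    max_per_type: dict            # per-run cap for each permitted type
    allowed_url_hosts: frozenset  # URLs to other hosts are redacted


def sanitize(text, allowed_hosts):
    """Replace URLs whose host is not on the allowlist."""
    def repl(match):
        host = match.group(1).lower()
        return match.group(0) if host in allowed_hosts else "(redacted URL)"
    return URL_RE.sub(repl, text)


def vet(buffered, policy):
    """Return only the buffered writes that pass all three checks."""
    counts, accepted = {}, []
    for op in buffered:
        kind = op["type"]
        if kind not in policy.allowed_types:
            continue  # check 1: operation type not permitted
        if counts.get(kind, 0) >= policy.max_per_type.get(kind, 0):
            continue  # check 2: per-run limit reached
        counts[kind] = counts.get(kind, 0) + 1
        body = sanitize(op["body"], policy.allowed_url_hosts)  # check 3
        accepted.append({**op, "body": body})
    return accepted


policy = SafeOutputsPolicy(
    allowed_types=frozenset({"create_pull_request", "add_comment"}),
    max_per_type={"create_pull_request": 3, "add_comment": 1},
    allowed_url_hosts=frozenset({"github.com"}),
)
buffered = [{"type": "create_pull_request", "body": f"PR {i}"} for i in range(4)]
buffered.append({"type": "delete_branch", "body": "main"})
buffered.append({"type": "add_comment",
                 "body": "see http://evil.example/x and https://github.com/a/b"})
accepted = vet(buffered, policy)
print(len(accepted))
```

Because the agent can only append to the buffer, every side effect that reaches GitHub has passed through this single, auditable choke point.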

Log Everything

Even with no secrets and vetted writes, an agent can still change repository data, use tools in ways we did not expect, or try to get around the limits we set. Agents will try many tricks to complete their tasks. If something goes wrong, we need to see the full execution path to understand what happened.

Agentic workflows make observability a first-class property of the architecture by logging extensively at each trust boundary. Network and destination-level activity is recorded at the firewall layer. Model request/response metadata and authenticated requests are captured by the API proxy. All tool invocations are logged by the MCP gateway and MCP servers. We also add internal instrumentation to the agent container to audit potentially sensitive actions such as access to environment variables. Together, these logs support end-to-end forensic reconstruction, policy validation, and rapid detection of anomalous agent behavior.

Extensive logging also sets the stage for future information flow controls. Anywhere we can observe communication, we can also control it. Agentic workflows already support the GitHub MCP server’s lockdown mode. In the coming months, we will add more safety controls that enforce policies across MCP servers based on whether something is public or private and who created a repository object.

What’s Next? 

Join the discussion in our community or on the #GitHubNext Discord. We look forward to seeing what you build with GitHub Agentic Workflows. Stay tuned for more updates.

Source: Under the hood: Security architecture of GitHub Agentic Workflows 

Tesla has advanced general-purpose robotics. The latest Optimus AI update adds vision-language navigation, enabling the robot to reason. By merging language understanding with spatial cognition, Tesla addresses the main challenge of deploying humanoid robots: executing complex real-world instructions.  

For robotics engineers and AI researchers, this update marks a shift from traditional SLAM methods to a more comprehensive, embodied AI approach for humanoid robots. Now the focus is not just on avoiding obstacles but on helping the robot comprehend its environment using human language.  

The Shift to Vision-Language Navigation (VLN) 

Historically, autonomous navigation was a geometric problem. Robots used LiDAR-based sensing to create a voxel map of the world and navigated to specific coordinates. However, coordinates are not how humans communicate. We do not tell a co-worker to move to coordinates (45.2, -12.8). We say, “Take the red folder from the messy desk and bring it to the lounge near the coffee machine.”

With Vision-Language Navigation (VLN), Optimus can now understand these kinds of instructions. The new AI uses a transformer model (a type of neural network, especially good at understanding language and images) that processes video from the robot’s eight cameras along with language input. This lets the robot find objects or rooms it hasn’t seen before by matching what it sees to the words it hears.  

Embodied AI: The Fusion of Logic and Limbs 

This update focuses on embodied AI, meaning the robot’s intelligence is integrated with its physical form. Unlike pure text-based models, a humanoid robot must interact with the physical world and obey its laws. Tesla has adapted its FSD (Full Self-Driving) stack for robots to enable detailed step-by-step reasoning about space and time, allowing Optimus to plan and act within its environment.

When Optimus receives a command, the vision-language model first breaks the task into sub-goals. If the goal is to clean up a spill in the lab, the robot must identify the spill (using its vision system), understand that cleanup requires a tool, such as a mop or paper towels (using language and logic), and then navigate to where those items are typically stored (using memory and spatial reasoning, or the ability to recall and understand places). By running this logic locally on Tesla’s D1 chip, the robot achieves sub-millisecond latency to adjust its balance and gait while simultaneously processing high-level cognitive tasks.

Mastering Active Environments with World Models 

A major challenge for humanoid robots is that human spaces change constantly. Factories, homes, and offices are never static. The new AI stack uses Neural World Models to help Optimus predict possible changes based on past data.  

If a human walks across the robot’s path, Optimus does not simply stop. It predicts the person’s path and adjusts its own velocity and path in real time. This is where the vision-language component becomes critical for safety and social etiquette. The robot can distinguish between a stationary object, such as a box, and a temporary obstruction, such as a person, and chooses a wider berth to ensure people’s comfort. This subtle behavior is a direct result of training the navigation stack on millions of hours of human-human interaction data, allowing the robot to emulate natural spatial social norms.

The Role of End-to-End Neural Networks 

Tesla is committed to an end-to-end approach. While others use separate modules for vision, planning, and movement, Optimus depends on a single large neural network. The Vision Language Navigation update feeds raw data, images, and text directly into this network, which controls the robot’s actions.  

This approach provides for emergent problem-solving. During recent internal testing, an Optimus unit was tasked with moving a crate that was blocked by a rolling chair. Rather than failing or waiting for the path to clear, the robot used its vision-language understanding to recognize the chair as a movable object, pushed it out of the way, and proceeded to its goal. This type of reasoning, identifying affordances in the environment and seeing what actions an object allows (such as a chair’s mobility), is the hallmark of true humanoid autonomy.  

Scaling Through The Dojo Training Fabric 

Tesla’s Dojo supercomputer drives Optimus’s advanced embodied AI. To train vision-language navigation, Tesla uses a special auto-labeling system. Thousands of Optimus robots in factories collect data. When a robot encounters a new situation or tricky instructions, it sends the data to Dojo.  

On Dojo, a larger teacher model reviews the video and results, then labels the data for the student model that runs on the robot. This cycle makes the navigation system stronger every day. In 2026, Tesla began using generative world simulations in which Dojo creates millions of challenging scenarios, such as a robot in a dark room with mirrors or a busy hospital hallway, to test the VLN system before it’s used in real robots. Beyond the technical advance, vision-language navigation is also an economic strategy: it makes the robot easier to direct via voice or text.

Tesla is reducing the barrier to entry for small-scale manufacturing and elder care facilities. You no longer need a staff of robotics engineers to define waypoints or no-go zones. A floor manager can simply walk the robot through a facility, giving verbal indications such as “this is the shipping dock” and “don’t enter this area during shift changes”. The robot’s VLN stack will build a semantic map that adheres to those rules.  
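
As a rough picture of what a semantic map with spoken rules might store, consider the toy data structure below. It is purely illustrative and says nothing about Tesla's actual VLN stack: the `Zone` and `SemanticMap` classes, the zone names, and the shift-change window are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class Zone:
    label: str
    # (start, end) pairs during which the zone is off-limits.
    # Windows that cross midnight are not handled in this sketch.
    blocked_windows: list = field(default_factory=list)

    def is_open(self, now):
        return not any(start <= now < end for start, end in self.blocked_windows)


class SemanticMap:
    """Toy map from spoken labels to zones and their time-based rules."""

    def __init__(self):
        self._zones = {}

    def label(self, name, blocked_windows=()):
        self._zones[name] = Zone(name, list(blocked_windows))

    def may_enter(self, name, now):
        zone = self._zones.get(name)
        return zone is not None and zone.is_open(now)


semantic_map = SemanticMap()
semantic_map.label("shipping dock")  # "this is the shipping dock"
# "don't enter this area during shift changes" (assumed 14:45 to 15:15)
semantic_map.label("assembly aisle", [(time(14, 45), time(15, 15))])

print(semantic_map.may_enter("assembly aisle", time(15, 0)))
```

A planner could consult `may_enter` before routing through a zone, treating unlabeled areas as closed by default.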

Tesla believes accessible robotics will help Optimus reach millions of users. When using a robot is as simple as conversation, it becomes an everyday workplace tool, not a luxury.  

The Road Ahead: General Purpose Intelligence 

Adding Vision Language Navigation to the Optimus AI stack is a step toward Tesla’s goal of Artificial General Intelligence. While a chatbot explains a recipe, Optimus is getting closer to seeing the ingredients, understanding the recipe, and completing the task.  

Looking to 2026, integrating vision and language will drive social robotics. Optimus will move through our world and communicate, saying things like, “Excuse me, I need to reach that shelf,” or, “I have completed the inventory check.” This will ease collaboration between people and robots.  

Final Thoughts: The Humanoid Constitution 

Tesla’s vision for Optimus has always been bold, and those plans are now becoming real with the latest AI update, which adds vision-language navigation. Tesla is teaching its robots to see and hear the world as we do. This marks the start of the Autonomous Digital Coworker, a machine that understands not just what to do, but also how and why. The general-purpose humanoid is no longer simply an idea; it’s already working on factory floors.

Source: AI & Robotics

The global manufacturing sector is navigating a seismic shift in which yesterday’s static automation can no longer keep pace with the race for competitiveness. As production lines demand ever greater adaptability and precision, the backbone of industrial robotics must transform. NVIDIA has risen to this challenge with its latest breakthrough: the ISAAC SDK update, which brings on-device reinforcement learning to factory robots. This leap propels industrial AI out of the data center and onto the factory floor, right at the edge.  

For robotics engineers and facility managers, this update marks the end of the train-and-deploy era. Traditionally, reinforcement learning required massive external compute clusters. These clusters simulated millions of iterations before any code was deployed to a physical robot. Now, these processes run locally on NVIDIA Jetson Thor and Orin modules. NVIDIA is enabling a new generation of self-driving industrial machines capable of real-time self-optimization.

The Technical Evolution of Isaac SDK 

The Isaac SDK (a software development kit, meaning a set of tools and resources for developing software applications) has long been the backbone of NVIDIA’s robotics ecosystem. It provides the libraries, drivers, and APIs (application programming interfaces, the software bridges that let programs communicate) needed to bridge the gap between virtual simulation and tangible reality. However, previous iterations relied heavily on the sim-to-real pipeline. Developers would use NVIDIA Isaac Gym (a simulation tool for training robots) to train a policy (a set of rules or behaviors) in a high-fidelity virtual environment and then export that frozen model to the robot.

With this update, the SDK introduces a native on-device learning (ODL) framework that enables robots to continue learning long after deployment. If a factory robot meets an unexpected variable, be it shifting lighting, a novel component texture, or subtle changes in resistance, it no longer waits for a developer to step in. Instead, it taps into reinforcement learning for grasping and navigation, fine-tuning its motor control on the fly to keep production humming no matter how unpredictable the environment becomes.

Breaking The Connectivity Bottleneck 

Latency has always been a challenge for advanced AI in heavy industry. Robots often send sensor data to distant cloud servers and then wait for updated instructions. Even fast 5G cannot prevent costly delays, which can lead to errors or safety issues. By embedding reinforcement learning directly on the device, NVIDIA eliminates the need for high-bandwidth connections. Robots can now update their models independently.

This local-first approach revolutionizes multi-agent coordination in smart factories. Imagine dozens of self-governing mobile robots navigating a busy floor. Each robot learns and anticipates its peers’ moves in real time. The Isaac SDK update gives these robots shared memory and peer-to-peer communication. Devices synchronize learning and build collective intelligence as the fleet grows.  

The Mechanics of On-Device Reinforcement Learning 

The Morpheus update to the Isaac SDK delivers a specialized compute kernel, a core program that manages specific hardware functions. It splits the Jetson module’s GPU resources into two dedicated lanes. One lane powers real-time inference (the doing). The other runs reinforcement learning in the background (the learning).

This dual-pathway design ensures the robot’s main job never gets sidetracked by learning. Using online policy gradient optimization, the robot tweaks its behavior in careful, incremental steps. If a new action exceeds safety limits, the Isaac SDK’s built-in safety monitor steps in. It overrides risky actions and shields both the robot and its environment during experimental phases.
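
To illustrate the general idea of online policy gradient updates guarded by a safety monitor, here is a toy one-dimensional example in plain Python. It does not use or represent the Isaac SDK API: the classes, the grip-force target, the reward function, and every hyperparameter are assumptions chosen for the sketch, and the update rule is the textbook REINFORCE estimator with a running baseline.

```python
import random


class SafetyMonitor:
    """Clamps every commanded action into a fixed safe range."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def filter(self, action):
        return min(max(action, self.low), self.high)


class OnlineGaussianPolicy:
    """One-dimensional Gaussian policy updated with REINFORCE, one sample at a time."""

    def __init__(self, mean, std=0.1, lr=0.02, max_step=0.02):
        self.mean, self.std, self.lr, self.max_step = mean, std, lr, max_step
        self.baseline = 0.0  # running average reward, used to reduce variance

    def sample(self, rng):
        return rng.gauss(self.mean, self.std)

    def update(self, action, reward):
        # Gradient of log N(action | mean, std) with respect to the mean.
        grad = (action - self.mean) / (self.std ** 2)
        step = self.lr * (reward - self.baseline) * grad
        # Keep each change small so behavior shifts incrementally.
        self.mean += min(max(step, -self.max_step), self.max_step)
        self.baseline += 0.1 * (reward - self.baseline)


def run(steps=1000, seed=0, target=0.6):
    rng = random.Random(seed)
    policy = OnlineGaussianPolicy(mean=0.3)
    monitor = SafetyMonitor(low=0.0, high=1.0)
    executed = []
    for _ in range(steps):
        proposed = policy.sample(rng)
        action = monitor.filter(proposed)   # the monitor has the final say
        reward = -(action - target) ** 2    # toy reward: closer to target is better
        policy.update(proposed, reward)
        executed.append(action)
    return policy, executed


policy, executed = run()
print(round(policy.mean, 2), min(executed), max(executed))
```

The monitor sits between the learner and the actuator, so exploration can propose anything while only in-range actions are ever executed; the clamp is simply treated as part of the environment from the learner's point of view.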

Learning For Robotic Grasping 

Perhaps one of the most immediate uses for this technology is in pick-and-place operations. Today’s e-commerce and pharmaceutical lines demand robots that can handle thousands of unique objects, some fragile, some translucent, many oddly shaped. Static algorithms struggle in the face of such endless variety.  

A robot equipped with the new Isaac SDK can adjust its grip, pressure, and approach angle based on tactile feedback and computer vision. If a grip fails, the robot analyzes the sensor data, updates its local policy, and attempts a different strategy on the next cycle. This level of granular autonomous refinement will eventually lead to the dark factory vision, where human participation is required only for high-level tactical oversight rather than mechanical troubleshooting.  

Integration with Omniverse and Digital Twins 

While this update focuses on on-device execution, cloud integration still plays a role. Robots that develop more efficient movements can transmit their improvements to digital twins via NVIDIA Omniverse, creating a feedback loop between real and virtual operations.

That data is validated in a rapid-fire simulation before being shared with every robot in the fleet. This sparks a global optimization cycle: robots solve local challenges, and their solutions are tested and spread worldwide. For manufacturers with plants across countries, a robot in Texas can learn from a breakthrough in Germany within hours.  

Security and Governance for Autonomous Machines 

Enabling autonomous machine updates brings safety and oversight considerations to the forefront. NVIDIA addresses these by aligning the Isaac SDK update with Holoscan and advanced security standards.  

Every behavioral update generated through on-device reinforcement learning is logged with a cryptographic signature. This allows facility managers to perform a post-mortem audit if a robot behaves unexpectedly. Furthermore, the SDK supports policy sandboxing, allowing a robot to test a new learned behavior in a virtualized sub-process before sending actual voltage to its physical actuators.  
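
A signed update log of this kind can be approximated with the Python standard library. The sketch below is a stand-in, not the SDK's mechanism: it uses HMAC-SHA256 (a symmetric message authentication code) in place of a true asymmetric signature, and the record fields, key, and function names are hypothetical.

```python
import hashlib
import hmac
import json


def sign_update(key: bytes, record: dict) -> dict:
    """Attach an HMAC-SHA256 tag to a behavioral update record."""
    payload = json.dumps(record, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"record": record, "tag": tag}


def verify_update(key: bytes, entry: dict) -> bool:
    """Recompute the tag and compare in constant time."""
    payload = json.dumps(entry["record"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry["tag"])


key = b"example-audit-key"  # assumption: provisioned per robot or per site
entry = sign_update(key, {"robot": "arm-07", "update": 42, "grip_force": 0.61})
print(verify_update(key, entry))

# Any edit to the logged record invalidates its tag.
tampered = {"record": {**entry["record"], "grip_force": 0.95}, "tag": entry["tag"]}
print(verify_update(key, tampered))
```

During a post-mortem audit, a reviewer holding the key can confirm that each logged update is exactly what the robot recorded at the time.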

The Economic Impact: Reducing the Total Cost of Ownership 

On-device reinforcement learning delivers financial advantages by lowering the total cost of ownership for industrial robotics. Reduced dependence on ongoing human oversight makes robots long-term, self-improving assets, maximizing return on investment.  

Looking ahead to the rest of 2026, adopting the new NVIDIA Isaac SDK is likely to become essential for any facility transitioning to Industry 5.0. By blending local AI, hardware-enforced safety, and global simulation, manufacturers can build a resilient ecosystem that withstands the shocks of today’s supply chains.

Conclusion: The New Standard For Factory Intelligence 

The latest Isaac SDK update is more than a feature addition; it is a shift in how industrial machines learn. Freed from cloud dependencies, robots now learn directly from hands-on experience, moving autonomous manufacturing another step forward.

For today’s engineers, the mission shifts from programming individual robots to orchestrating entire ecosystems of learning. The machines are ready to evolve. Our role is to create an environment where evolution can prosper.

Source: NVIDIA Isaac 

Windows Autopatch will enable hotpatch security updates by default to accelerate device security. This change in default behavior will apply to all eligible devices managed through Microsoft Intune and those accessing the service via the Microsoft Graph API, beginning with the May 2026 Windows security update. Applying security fixes without requiring a restart enables organizations to reach 90% compliance in half the usual time while retaining administrative control.

Starting April 1, 2026, organizations not ready for default hotpatch updates will have new administrative controls. The next sections explain the reason for this update and how to choose the best approach.

Advantages Of Hotpatch Updates 

Each month, Windows releases security updates to address known CVEs and mitigate risk. Traditionally, IT administrators waited several days for device restarts before updates became effective and compliance targets were met. Standard practice is to allow a three- to five-day window after installation before enforcing a restart. With hotpatching, updates are deployed and activated immediately without requiring a restart, increasing security efficiency.

Devices are patched significantly faster with hotpatching because updates do not require device restarts. For instance, four organizations managing 30,000–70,000 endpoints each achieved 90% patch compliance in half the time compared to traditional approaches, without modifying update policies.  

Currently, over 10 million production devices are enrolled in hotpatch updates, demonstrating broad adoption and organizational trust in this capability. Additional information is available on the efficiency of smaller hotpatch update sizes and on Microsoft’s internal implementation of hotpatch updates.

Hotpatch By Default: Operational Overview

In May 2026, Windows Autopatch will make hotpatch updates the default to accelerate security for organizations using Intune or the Microsoft Graph API. All patch policies in Intune are managed by Windows Autopatch. The default setting applies only to devices not in a quality update policy. For devices assigned to a quality update policy, the hotpatch setting specified in that policy is enforced. Preferences for update deferral and update rings are maintained.
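
The precedence rule described here can be summarized as a small decision function. This is only a sketch of the logic as stated in the text, not an Intune or Microsoft Graph API call: the `Device` type, its fields, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    name: str
    meets_prerequisites: bool
    # None means the device is not assigned to any quality update policy.
    policy_hotpatch: Optional[bool] = None


def will_receive_hotpatch(device: Device, tenant_default: bool = True) -> bool:
    """Apply the precedence: prerequisites, then policy setting, then tenant default."""
    if not device.meets_prerequisites:
        return False
    if device.policy_hotpatch is not None:
        return device.policy_hotpatch  # the policy-level setting wins
    return tenant_default


fleet = [
    Device("laptop-01", meets_prerequisites=True),
    Device("laptop-02", meets_prerequisites=True, policy_hotpatch=False),
    Device("kiosk-03", meets_prerequisites=False),
]
decisions = {d.name: will_receive_hotpatch(d) for d in fleet}
print(decisions)
```

Passing `tenant_default=False` models a tenant-wide opt-out, in which only devices whose policy explicitly allows hotpatching would still receive it.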

Timeline For Receiving Hotpatch Updates 

Devices that meet the prerequisites and have installed the April 2026 security update will start receiving hotpatch updates from May 2026. Check enrollment status using the new Windows Autopatch readiness tools.

How Do I Know If A Device Will Receive A Hotpatch Update?

Prior to the May 2026 hotpatch update, review the hotpatch quality updates report in Intune. This report identifies devices with hotpatch updates enabled that also satisfy the necessary prerequisites. The HotPatch Ready column indicates which devices will receive a hotpatch update, while the Hot Patched column lists devices that have been successfully patched.

The Quality Update Status Report in Intune can also be used to determine which devices are prepared to receive a HotPatch update. The HotPatch Readiness column indicates whether a device satisfies the prerequisites for HotPatch updates. An additional column, “HotPatch enabled”, will be added to display each device’s status.  

Adopting Hotpatch Updates At An Individualized Pace 

Windows Autopatch is enabling hotpatching by default because hotpatch updates are the quickest way to get secure. Hotpatching is the process of applying updates without restarting devices. As such, we recommend keeping hotpatch updates enabled for your devices. If you are not ready for this change, you can opt out for groups of devices or for the whole tenant (your organization’s account or environment in Microsoft’s cloud services).

The tenant-level HotPatch update setting becomes available on April 1, aligning with the baseline month. IT teams have until May 11, 2026, to make configuration adjustments before automatic deployment begins.  

Opting Out Of Hotpatch Updates At The Tenant Level 

When changes take effect in April, follow these operational steps to configure a tenant-wide opt-out for HotPatch updates.  

  1. Navigate to Tenant Administration > Windows Autopatch > Tenant Management.
  1. Select the tenant settings tab.  
  1. Toggle the “When available, apply patches without restarting the device (HotPatch)” setting to either allow or block.  

How to Opt Out of HotPatch Updates for Groups of Devices 

To define a custom update approach for a device group, assign devices to a quality update policy. Windows Autopatch enforces policy-level configuration above the tenant default. To create a policy, follow these procedural steps.  

  1. Open Microsoft Intune.
  1. Navigate to Devices > Manage Updates > Windows Updates.  
  1. Select the Quality Updates tab.  
  1. Select Create.  
  1. Select Windows quality update policy from the drop-down menu.
  1. Fill in the title and details on the Basics tab, then select Next.  
  1. In the settings step, toggle the “When available, apply without restarting the device (HotPatch)” setting to either Allow or Block, then select Next.
  1. Apply any scope tags, then select Next.
  1. Assign the Microsoft Entra groups you want, then select Next.
  1. Select Create.

You can disable hotpatch updates at the tenant level and enable them for specific devices, and vice versa. When you are ready for hotpatch updates by default, just toggle “When available, apply without restarting the device (HotPatch)” to Allow.

To use HotPatch updates, enabled by default, ensure that all devices meet the required prerequisites. For additional information and an implementation guide, refer to the HotPatch updates documentation and the Windows Autopatch Frequently Asked Questions (FAQ).

Source: Securing devices faster with hotpatch updates on by default 

AWS has rolled out major updates to the Nitro system, making it even more secure for AI inference. With improved enclave-level isolation, organizations can process sensitive data knowing that AWS operators, root users, and administrators cannot access it while it is in use. This helps businesses comply more easily with regulatory requirements and build customer trust when handling confidential information. These changes are sometimes referred to as the Nitro Isolation Engine or Advanced Nitro Enclaves.  

Key Aspects of the AWS Nitro Update for Sovereign AI 

  • Enclave-level isolation: Nitro Enclaves allows for the creation of isolated, hardened, and highly constrained virtual machines within Amazon EC2 instances. This isolation covers both CPU and memory, ensuring that even if the parent instance is compromised, the data within the enclave remains protected.  
  • Sovereign AI Inference: With this update, you can run machine learning inference on sensitive data inside these secure environments. This is especially important in fields like finance, healthcare, and government, where strict data privacy is required when using large language models.  
  • Cryptographic Attestation: Nitro Enclaves only lets approved code access keys to data, helping prevent tampering.  
  • Integration with Accelerators: Nitro Enclaves provides secure connectivity to accelerators, such as NVIDIA Blackwell GPUs and AWS Trainium 2, while maintaining the encryption of AI workloads.  
  • The system blocks all applications, OS, or users in the parent instance from accessing enclave data.  

These updates help organizations meet digital sovereignty requirements. They allow companies to control their data models and manage keys, even when using the public cloud, often through ‘hold your own key’ (HYOK) models.  

Generative AI is changing how businesses interact with customers worldwide. Many organizations are now using large language models (LLMs) and other foundation models (FMs) to enhance customer experiences, streamline operations, boost employee productivity, and open new revenue streams.  

Foundation models and their applications are major investments for customers. They often work with sensitive business data to improve results. Customers’ main concern is protecting this sensitive information and their investments. Both the data and model weights are valuable and require strong protection from administrators, users, vulnerabilities, and cloud providers.  

At AWS, our top priority is protecting the security and confidentiality of our customers’ workloads. Security in generative AI is integrated across three distinct layers of our AI stack: the infrastructure layer for building and training models; the model and tooling layer for deploying and scaling AI; and the application layer, where AI-generated content is used in practice.  

  • The bottom layer is the infrastructure layer, which provides the tools and resources needed to build and train large language models (LLMs) and other foundation models (FMs).  
  • The middle layer provides access to models and tools for building and scaling generative AI applications.  
  • The top layer includes applications that use LLMs and other FMs to make work stress-free by writing and troubleshooting code, generating content, deriving insights, and taking action.  

Each layer is important to make generative AI pervasive and revolutionary.  

The AWS Nitro system is a unique innovation we created for our customers. It functions as the core computing backbone for AWS, concentrating on both security and performance. Its specialized hardware and firmware are built to ensure that no one, not even AWS staff, can access your workloads or data on Amazon EC2 instances. Since 2017, customers using Nitro-based EC2 instances have benefited from this level of confidentiality and isolation from AWS operators. No AWS employee can access a Nitro EC2 instance that customers use to run their workloads, or the data that customers send to a machine learning (ML) accelerator or GPU. This protection applies to all Nitro-based instances, including those with ML accelerators such as AWS Inferentia and AWS Trainium, as well as those with GPUs such as P4/P5/G5/G6.  

The Nitro system powers the Elastic Fabric Adapter (EFA), which uses AWS’s Scalable Reliable Datagram (SRD) protocol for large-scale distributed training in the cloud. This combination creates an always-encrypted, RDMA-capable network in which all communication through EFA is protected by VPC encryption without impacting performance, maintaining both the security and the speed of your generative AI workloads.  

The Nitro system’s design has been validated by NCC Group, an independent firm. AWS delivers strong protection for customer workloads and has defined this level of security in our service terms for added customer assurance.  

Innovating Secure Generative AI Workloads Using AWS’s Industry-Leading Security Capabilities 

Since the beginning, AWS AI infrastructure and services have included security and privacy features to help you control your data. As more customers adopt generative AI, it’s important to know your data is safe throughout the AI lifecycle, from data preparation to training and inferencing. Protecting model weights, the parameters a model learns during training, is essential for keeping your data safe and preserving the model’s integrity.  

This is why it is critical for AWS to continue innovating on behalf of our customers to raise the bar on security across every layer of the generative AI stack. Security and confidentiality must be built into each layer. You need to secure the infrastructure used to train LLMs and other FMs, use secure tools to run them, and operate applications with built-in security and privacy you can trust.  

At AWS, securing AI infrastructure involves preventing unauthorized access to sensitive AI data, including model weights and processed data, by both infrastructure operators and customers. This approach comprises three key principles.  

  1. Complete isolation of AI data from the infrastructure operator: The operator must not be able to access customer content or AI data, including model weights and processed data.  
  1. The ability for customers to isolate AI data from their own users: The infrastructure should allow model weights and data to be loaded onto hardware while remaining isolated and inaccessible to the customers’ own users and software.  
  1. Protected infrastructure communications: communication between devices in the ML accelerator infrastructure must be secure, with all external links encrypted.  

The Nitro system fulfills the first principle of secure AI infrastructure by isolating your AI data from AWS operators. The second principle provides you with a way to remove administrative access to your AI data from your own users and software. AWS not only offers you a way to achieve that, but we have also made it simple and practical by building an integrated solution between AWS Nitro Enclaves and AWS Key Management Service (AWS KMS). 

With Nitro Enclaves and AWS KMS, you can encrypt your sensitive AI data using keys that you own and control, store that data in a location of your choice, and securely transfer the encrypted data to an isolated compute environment for inferencing. Throughout this entire process, the sensitive AI data is encrypted and isolated from your users and software on your EC2 instance, and AWS operators cannot access it. Use cases that have benefited from this flow include running LLM inference in an enclave. Until today, Nitro Enclaves has operated only on the CPU, limiting the potential for larger generative AI models and more complex processing.  
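The idea that a key is released only to approved code can be illustrated with a toy model. The sketch below does not use the real AWS KMS or Nitro Enclaves APIs: `ToyKeyService` and `measure` are hypothetical names, and a plain SHA-384 digest of an image stands in for a signed attestation document.

```python
import hashlib
import hmac
import secrets


def measure(enclave_image: bytes) -> str:
    """Stand-in for an attestation measurement: a SHA-384 digest of the image."""
    return hashlib.sha384(enclave_image).hexdigest()


class ToyKeyService:
    """Hypothetical key service that releases a data key only to approved code."""

    def __init__(self, approved_measurements: set) -> None:
        self._approved = set(approved_measurements)
        self._data_key = secrets.token_bytes(32)

    def release_key(self, measurement: str) -> bytes:
        # Constant-time comparison against each approved measurement.
        if not any(hmac.compare_digest(measurement, m) for m in self._approved):
            raise PermissionError("measurement not approved; key withheld")
        return self._data_key


approved_image = b"inference-server v1"
kms = ToyKeyService({measure(approved_image)})

key = kms.release_key(measure(approved_image))  # approved code gets the key
try:
    kms.release_key(measure(b"tampered image"))  # modified code does not
except PermissionError as err:
    print(err)
```

In this model the data never needs to be decrypted outside the approved environment, because the only party able to obtain the key is code whose measurement matches the policy.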

We plan to expand Nitro encryption to ML accelerators and GPUs, meeting the third principle. This lets you decrypt and process AI data in ML accelerators while it stays isolated from both operators and users. With AWS KMS, data is decrypted only after cryptographic checks pass. This upgrade enables end-to-end encryption for generative AI workloads. We plan to offer this end-to-end encrypted workflow in the upcoming AWS Trainium 2 and in GPU instances based on NVIDIA’s new Blackwell architecture. Both will provide protected communication between devices, meeting the third principle of secure AI infrastructure. AWS and NVIDIA are working together to deliver a joint solution that combines NVIDIA’s Blackwell GPU platform and GB200/NVL72 with the Nitro system and EFA technologies.  

This will help customers securely build and deploy next-generation generative AI applications. Thousands of customers are using AWS to experiment and move transformative generative AI applications into production. Generative AI workloads contain highly valuable and sensitive data that needs this level of protection from both your own operators and the cloud service provider. Customers using AWS Nitro-based EC2 instances have received this level of protection and isolation from AWS operators since 2017, when we launched our innovative Nitro system.  

At AWS, we keep innovating by building fast and accessible tools that make it easier for you to secure your generative AI workloads across all three layers of the stack. This way, you can focus on what you do best while expanding the use of generative AI in your business.

Source: A secure approach to generative AI with AWS 

Mobile technology is changing fast as the line between smartphones and wearables blurs. In early 2026, Google introduced the Android 17 Secure Companion API to unify wearable security. This update is more than a minor change for developers and manufacturers. It sets a new standard for biometric authentication in the Android ecosystem.  

The Problem of Peripheral Trust 

Until recently, connections between an Android device and a wearable such as a smart ring, augmented reality (AR) glasses, or a fitness tracker used loose protocols. Bluetooth, a short-range wireless technology, and Ultra Wide Band (UWB), a technology for accurate device positioning, provided the connection. Trust was managed by the wearable itself, sometimes poorly. The lack of consistency led to security gaps, especially as wearables began handling tasks such as payments, door unlocking, and accessing health records independently.  

The Android 17 Secure Companion API shifts the core trust point to the phone’s secure hardware. With a unified handshake, biometric checks on wearables are as secure as those on the phone itself, closing security gaps. AI wearables are no longer the weak link in digital security.  

Technical Architecture of the Secure Companion API 

The API uses Android 17 StrongBox, a secure hardware module for storing cryptographic keys, for remote biometric checks. The wearable sends an encrypted (encoded for security), salted (combined with a random value for greater security) biometric hash (a digital fingerprint of biometric data) to your phone.  

The host device, usually your phone, then performs the verification within its Trusted Execution Environment (TEE), which is a secure area of the main processor. If the signatures match, the host issues a short-lived trust token (a temporary digital credential) to the wearable, authorizing specific actions for a set duration. The architecture ensures that sensitive biometric templates are never permanently stored on the wearable itself, which is often more susceptible to physical tampering or theft than a smartphone.  

The API also adds a feature to maintain identity continuity. You stay logged in as long as your wearable is near your phone and in contact with your skin. If you remove a smart ring, the API cancels all trust tokens immediately. You need to re-authenticate with your biometrics.  
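The handshake described above (salted hash from the wearable, verification on the host, a short-lived trust token, and revocation when the wearable is removed) can be modeled as follows. This is a toy simulation, not the Android API: `ToyHost`, `wearable_payload`, and the 30-second lifetime are assumptions made for illustration, and real biometric matching is fuzzy rather than an exact hash comparison.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional, Tuple


def wearable_payload(sample: bytes) -> Tuple[bytes, bytes]:
    """Wearable side: send a fresh salt and the salted hash, never the raw sample."""
    salt = secrets.token_bytes(16)
    return salt, hashlib.sha256(salt + sample).digest()


class ToyHost:
    """Hypothetical phone-side verifier that issues short-lived trust tokens."""

    def __init__(self, enrolled_template: bytes, token_ttl_s: float = 30.0) -> None:
        self._template = enrolled_template
        self._ttl = token_ttl_s
        self._tokens: Dict[str, float] = {}  # token -> expiry time

    def verify(self, salt: bytes, salted_hash: bytes, now: float) -> Optional[str]:
        """Issue a token only if the salted hash matches the enrolled template."""
        expected = hashlib.sha256(salt + self._template).digest()
        if not hmac.compare_digest(expected, salted_hash):
            return None
        token = secrets.token_hex(16)
        self._tokens[token] = now + self._ttl
        return token

    def is_trusted(self, token: str, now: float) -> bool:
        expiry = self._tokens.get(token)
        return expiry is not None and now < expiry

    def on_wearable_removed(self) -> None:
        """Skin contact or proximity lost: cancel every outstanding token."""
        self._tokens.clear()


host = ToyHost(enrolled_template=b"ring-owner-template")
salt, digest = wearable_payload(b"ring-owner-template")
token = host.verify(salt, digest, now=0.0)
print(host.is_trusted(token, now=10.0))   # inside the token lifetime
print(host.is_trusted(token, now=31.0))   # expired
host.on_wearable_removed()
print(host.is_trusted(token, now=10.0))   # revoked
```

Passing `now` explicitly keeps the example deterministic; a real implementation would read a monotonic clock inside the secure environment.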

Standardized Biometrics for a Multimodal World 

What makes the Android 17 Secure Companion API stand out is its flexibility. In 2026, biometrics are more than just fingerprints: traits such as how you walk or the rhythm of your voice can also be used to identify you.  

By providing a standardized interface, Google is enabling specialized hardware manufacturers to plug into Android’s security stack without having to write their own, often buggy, middleware. Whether a developer is working with a high-end medical-grade sensor or a consumer-grade gesture controller, the Secure Companion API provides a consistent set of calls to request authentication, check trust status, and handle secure key exchanges. This level of standardization is expected to accelerate the adoption of invisible security, in which devices recognize who you are based on how you interact with them.  

Supporting The Next Generation Of AI Wearables 

The API release aligns with the AI wearable boom of late 2025, which brought more personal agents. These agents need access to email, financial accounts, and security systems. Without a standard for identity, agents remain limited to simple tasks.  

Thanks to the Android 17 Secure Companion API, AI-powered devices can now handle important tasks. For example, AR glasses could approve a wire transfer after checking your retinal scan, or a smartwatch could sign a legal document using your heart-rate signature, verified by your phone. This is the usefulness of a secure, standardized API.  

Privacy and the Zero-Knowledge Framework 

Privacy advocates have long raised concerns about the centralization of biometric data. In response, Google has implemented a zero-knowledge proof (ZKP) system in the secure companion API. This ensures that when your wearable and phone communicate, they verify your identity without exchanging raw biometric data. By keeping users’ biological data private and secure, Google aims to build trust, the trust necessary for the long-term success of AI wearables.  

Implementation and Developer Adoption 

Migration to the Secure Companion API is designed to be relatively painless for developers. The API works with Jetpack Compose for Wear and provides simple tools for managing complex cryptographic steps. Companies have reported significant reductions in development time. Offloading security logic to the Android OS allows companies to focus on core products, such as better health tracking, more immersive augmented reality, or more responsive AI agents. The API includes a compatibility layer for older hardware, so devices built in 2024 and 2025 can gain some of the security benefits of Android 17 through software-emulated trust zones.  

The Road Ahead: Toward a Passwordless Future 

The release of the Android 17 Secure Companion API is a big move toward a passwordless future. As our devices get smarter and more personal, we won’t need to rely on passwords anymore.  

In the next few years, the Secure Companion API is expected to support multi-device orchestration. You could log in once on your watch. That trust would extend to your tablet, laptop, and smart card. All would be managed by your Android 17 phone.  

Conclusion: A New Standard For Digital Intimacy 

By standardizing biometric authentication for AI wearables, Google is setting a clear standard for the wearable AI era, recognizing that security must be strong, unified, and privacy-focused.  

For developers, security researchers, and tech fans, the message is clear: old, isolated security models are gone. Now, there is a unified, hardware-backed, privacy-focused standard that will shape mobile technology for years to come. Android 17 is far more than an update. It sets the rules for the new era of wearable AI.

Source:  Android XR Bulletin—March 2026 

Apple Metal 4 now provides native low-latency neural scheduling for Mac GPUs, enabling developers to integrate AI features more efficiently.  

Developers can now run Core ML models directly on the GPU alongside graphics and compute workloads. Notable features include:  

  • MTL4MachineLearningCommandEncoder  
  • integrated tensor support  
  • improved synchronization  

These enhancements allow AI workloads to operate independently of the CPU.  

Here are some important updates in Metal 4 for AI and Graphics 

  • Native neural scheduling: With MTL4MachineLearningCommandEncoder, machine learning inference can be integrated into the rendering pipeline.  
  • Tensor support: The API and Metal Shading Language now have built-in Tensor support, making machine learning workflows smoother.  
  • Optimized workflow: machine learning compute and rendering instructions can now be combined, reducing requirements for CPU-side synchronization.  
  • Game Porting Toolkit 3.0: This version adds experimental MetalFX integration and supports more instruction sets.  

In summary, Metal 4 is designed for Apple silicon Macs and delivers advanced AI-powered graphics.  

With Metal 4, you can now run Core ML models efficiently as part of your Metal workflow. This helps when your app needs to use model output in a Metal context, for example, rendering a scene or running a compute task. To add machine learning inference, convert your Core ML model into a MetalML package during development. Then use that package in a machine learning encoder at runtime.  

Your app can handle rendering, compute, and machine learning tasks in a single command buffer, with no CPU wait or extra sync. When you run Core ML models on the GPU, your app sends inputs from a compute pass and quickly uses outputs from a machine learning pass.  

Metal 4 adds new tensor types, which are multi-dimensional arrays used for machine learning model data. The Metal Shading Language now includes tensor operators and features such as cooperative tensors. This lets your shader code work with tensor data in parallel at any GPU stage.  

Discover Metal 4 

Metal 4 is designed to fulfill the demands of today’s applications. Its simpler API helps you get the best performance on Apple Silicon with less overhead and enhanced resource management. Compilation is now clearer and faster, with new options to reduce runtime compilation.  

Metal 4 now fully supports machine learning, including native tensor support in the API and the Shading Language. You can add machine learning to your Metal application by running large networks directly on the GPU timeline or by embedding inference operations directly into your shader code.  

Metal 4 also extends the MTLDevice that you already use today. You can incrementally adopt the features that will most help your app or game, in the order you need them.  

Metal 4 Games 

Create modern games that work well on all Apple devices. Metal 4 helps you handle large sets of resources more efficiently. With new sparse resource placement, you can better utilize system memory. Also, use familiar APIs to quickly port your games from other platforms to Metal.  

The latest Metal compiler updates give you fine-grained control over shader compilation. Use a dedicated context to adjust the quality of service and ensure optimal performance. Quickly compile pipelines ahead of time to reduce run-time delays and share compilation results across render pipelines using a common Metal IR for faster development and testing.  

Go Further With MetalFX And Ray Tracing 

Benefit from improved upscaling with integrated denoising for higher quality, sharper renders at high resolutions. Achieve higher frame rates using new frame interpolation, resulting in smoother animation without sacrificing visual quality.  

Ray tracing now uses intersection function buffers for more flexible indexing, simplifying porting ray tracing code to Metal. Flags for acceleration structure builds let you choose between faster intersection computation or smaller memory footprints, optimizing performance for your specific needs.  

Metal 4 Machine Learning 

You can now add machine learning to your Metal application. Tensors are a native resource type for working with data. You can encode machine learning commands into the same Metal command buffers and use the same barriers to sync work on large networks. You can also embed inference directly into your shaders with Metal Performance Primitives, which are optimized for all Apple platforms.  

Game Porting Toolkit 3 

Evaluate even more games with an expanded instruction set, sparse resources, and experimental support for MetalFX upscaling, denoising, and frame interpolation. Build and debug your app remotely from Microsoft Visual Studio. Supercharge your HLSL shaders with access to Apple GPU features such as:  

  • framebuffer fetch  
  • function constants  
  • intersection function buffers  

These are available through the Metal shader converter, and you can access all Metal APIs from C++ using metal-cpp.

Source: What’s new in Metal 

Reports from late 2025 and early 2026 show that Samsung is mass-producing the Exynos 2600 with its 2nd-generation 2nm gate-all-around (GAA) transistor technology. Yields have improved to 50-60%. This progress means the chip can be used in the Galaxy S26 series, helping Samsung rely less on Qualcomm.  

Volume Production & Yield Status 

  • Yield improvements: Samsung’s 2nm (SF2) reportedly started low (around 30% in early 2025) but increased steadily to ~40% by mid-2025 and reached 50-60% by the end of 2025, according to various reports.  
  • Samsung officially began mass production of the Exynos 2600 in the last quarter of 2025. By early 2026, reports showed yields were stable enough for the Exynos 2600 to be used in the Galaxy S26 and S26 Plus models, while the Galaxy S26 Ultra is expected to use Qualcomm’s Snapdragon chips.  
  • Initial production was about 15,000 wafers, and capacity is expected to grow as yields get better.  

New GAA Transistor Technology (SF2) 

The 2nm GAA process is a significant advancement, delivering measurable improvements in both power efficiency and performance compared to Samsung’s previous 3nm GAA process. For example, the 2nm process reduces energy consumption and increases chip operational speed compared to the 3nm variant.  

  • Some reports indicate the 2nm GAA process offers approximately a 5% performance improvement and 8% better power efficiency over the 3nm GAA process, making the newer node notably more efficient in real-world applications.  
  • Other figures state that the SF2 node provides up to a 12% increase in performance, a 25% improvement in power efficiency, and a 5% reduction in chip area compared to Samsung’s earlier 3nm node (SF3), offering tangible advantages for manufacturers and end users.  
  • Advantage: Samsung is using its early experience with GAA technology, first introduced at 3nm, to stabilize the 2nm process more quickly than before. The Exynos 2600 is designed to power the Galaxy S26 series in select markets (likely Europe and Korea), while the S26 Ultra is expected to continue using Qualcomm’s Snapdragon chips.  
  • Achieving a 50 to 60% yield marks substantial progress for Samsung, but the company remains behind TSMC, which reports 2nm process yields of 60 to 70%. Samsung aims to stay competitive by offering lower prices.  
  • Future outlook: If yields continue to rise, reaching 70% in 2026, Samsung could win back major foundry orders from big clients like Qualcomm.  

Samsung is advancing its semiconductor technology with the Exynos 2600, built using the company’s 2nm gate-all-around (GAA) process. Earlier reports stated that the trial production run achieved a 30% yield. While this represented a milestone for Samsung, it was still below TSMC’s 2nm yield, which was reported at about 60%.  

Why This Matters For Samsung 

The semiconductor industry is very competitive, and manufacturing efficiency directly affects profitability and cost management. Even though Samsung is still behind TSMC, reaching a 30% yield is a clear improvement, especially after its earlier difficulties with 3nm production. Higher yields mean more working chips per wafer, reducing unit costs and potentially improving profit margins. While 30% is still below the 70% that big customers like Qualcomm and MediaTek expect, it shows that Samsung’s semiconductor division is moving in the right direction, both financially and technically.  
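The link between yield and unit cost can be made concrete with a little arithmetic. The sketch below assumes a flat wafer cost and a fixed number of candidate dies per wafer; both figures are hypothetical placeholders, not Samsung or TSMC numbers.

```python
def cost_per_good_die(wafer_cost: float, dies_per_wafer: int, yield_rate: float) -> float:
    """Spread the wafer cost over only the dies that actually work."""
    if not 0 < yield_rate <= 1:
        raise ValueError("yield_rate must be in (0, 1]")
    return wafer_cost / (dies_per_wafer * yield_rate)


WAFER_COST = 20_000.0   # hypothetical cost of one processed wafer
DIES_PER_WAFER = 500    # hypothetical candidate dies per wafer

for y in (0.30, 0.50, 0.70):
    cost = cost_per_good_die(WAFER_COST, DIES_PER_WAFER, y)
    print(f"{y:.0%} yield: {cost:.2f} per good die")
```

Whatever the real wafer cost is, it cancels out of the comparison: moving from a 30% to a 70% yield multiplies the cost per good die by 0.30 / 0.70, a reduction of roughly 57%.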

Samsung’s foundry business has struggled to gain the trust of major clients due to previous yield issues, particularly with its 4nm and 3nm nodes. Now the 2nm process offers the company a chance to rebuild its reputation. If yields continue to improve, Samsung could secure orders for flagship mobile processors, AI chips, and high-performance computing components in the near future.  

Exynos 2600, A Game Changer 

The Exynos 2600, also known as Ulysses, is expected to power Samsung’s next flagship devices and could help the company increase revenue through more premium smartphone sales. Right now, Samsung uses Qualcomm’s Snapdragon 8 Elite in its Galaxy S25 series, so improving its own chip production could lower supplier costs and boost overall profitability by reducing its reliance on outside vendors.  

Samsung has been trying to make its Exynos processors more competitive by improving power efficiency, thermal management, and overall performance. The 2nm GAA process brings major upgrades over the previous generation, offering higher transistor efficiency and greater performance per watt. Samsung’s progress in chip manufacturing could also impact AI, automotive, and data-center technology, not just smartphones. As the need for fast, power-efficient processors grows, producing advanced chips at scale will be key to Samsung’s long-term success.  

Difficulties and Opportunities 

Although reaching a 30% yield is meaningful progress, Samsung still faces significant challenges before it can compete directly with TSMC in large-scale chip production. The company needs to:  

  • Improve Yield Efficiency: Achieving 70% yield is necessary to secure bulk orders from major tech companies.  
  • Strengthen client trust: Overcoming past production setbacks and demonstrating consistency will help attract major customers.  
  • Expand production capacity: Ensuring that its facilities are ready for mass production without compromising quality will be vital for scaling up.  
  • Advance GAA technology: While Samsung is ahead of TSMC in adopting GAA, it must improve its GAA implementation to fully realize the benefits.  

Despite these challenges, Samsung’s ongoing investment in semiconductor research and infrastructure shows its determination to compete at the top level.  

What’s Next? 

Samsung plans to start mass production of its 2nm SF2 process in the second half of 2025. SF2 offers a 12% performance boost, 25% better power efficiency, and uses 5% less area compared to SF3. These technological upgrades could help Samsung compete for premium contracts, increase sales, and improve its financial position in high-end smartphones and advanced computing sectors.  

If Samsung can raise its yield to match TSMC’s, it could become a strong competitor in the foundry market again and win back market share. The company’s long-term aim is to become a player in semiconductor manufacturing, competing with TSMC and even Intel in the years ahead.  

While Samsung has a long way to go before its 2nm process becomes a commercial success, achieving a 30% yield is a step in the right direction. If the company continues upgrading its technology and manages costs and investments effectively, it may soon rival TSMC and secure major chip manufacturing contracts.  

The semiconductor competition is still ongoing, and with rapid technological progress, things could start to favor Samsung. Keep watching as the race for semiconductor leadership continues.

Source: Samsung’s Exynos 2600 Hits 30% Yield on 2nm Process–A Step Toward Closing the Gap with TSMC 

Anthropic has updated its agentic development capabilities with the transition from the Claude Code SDK to the Claude Agent SDK in early 2026, providing native, advanced multi-agent orchestration designed for enterprise workflows. This SDK enables developers to build systems in which a central orchestrator agent delegates complex multi-step tasks to specialized sub-agents.  

Key Features of the Claude Agent SDK and Multitasking Routing  

The updated SDK supports building, deploying, and managing complex agentic workflows.  

  • Native Multi-agent Orchestration: The framework enables the construction of hierarchies in which Claude Opus powers a main agent for delegating tasks to specialized subagents.  
  • Persistent and autonomous: Unlike stateless API calls, the SDK operates a long-running process that can manage complex workflows, maintain conversational state across multiple queries, and execute commands in a persistent shell environment.  
  • Model Context Protocol (MCP) integration: Agents can seamlessly connect to external tools, data sources, and internal databases, facilitating interoperability in enterprise environments.  
  • Agent teams/subagents: The system supports agent teams (experimental as of early 2026), which allow multiple Claude Code sessions to collaborate on shared tasks, with a lead session directing the workflow.  
  • Production-Grade Controls: Includes built-in guardrails for granular permissions, tool allowlisting, and working-directory isolation.  
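The guardrails in the last bullet (allowlisting and working-directory isolation) can be pictured as a check that runs before any tool call. The sketch below is a generic illustration, not the SDK’s actual permission API: `ALLOWED_TOOLS`, the tool names, and `is_permitted` are hypothetical.

```python
from pathlib import Path

ALLOWED_TOOLS = {"read_file", "run_tests"}  # hypothetical tool names


def is_permitted(tool: str, target: str, workdir: str) -> bool:
    """Allow a call only for allowlisted tools acting inside the working directory."""
    if tool not in ALLOWED_TOOLS:
        return False
    root = Path(workdir).resolve()
    # Resolving collapses ".." segments, so escapes from the root are detected.
    resolved = (root / target).resolve()
    return resolved.is_relative_to(root)


print(is_permitted("read_file", "src/app.py", "/tmp/project"))
print(is_permitted("read_file", "../secrets.txt", "/tmp/project"))
print(is_permitted("delete_repo", "src/app.py", "/tmp/project"))
```

Checking the resolved path rather than the raw string is the important design choice here, since a string prefix test would accept a target like `../secrets.txt`. `Path.is_relative_to` requires Python 3.9 or later.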

Enterprise AI Focus  

The shift to the Claude Agent SDK supports the creation of specialized agents for various business domains.  

Specialized roles:  

  • Development of legal assistants  
  • Financial advisors and customer support bots  
  • Codebase automation: Code agents capable of understanding and editing entire codebases, running bash commands, and managing git worktrees  
  • Integration with Enterprise Tools: The SDK supports integration with platforms like GitHub and allows for private plug-in marketplaces to enhance enterprise control.  

Technical Details And Availability  

  • Languages: available for both Python and TypeScript  
  • Migration: Developers are encouraged to migrate from the Claude Code SDK to the Claude Agent SDK.  
  • Commands: New Agent SDK commands allow for defining workflows as code and managing multi-agent systems.  

Claude now has Research capabilities that allow it to search across the web, Google Workspace, and any integrations to accomplish complex tasks.  

The journey of this multi-agent system from prototype to production taught us critical lessons about system architecture, tool design, and prompt engineering. A multi-agent system consists of multiple agents (LLMs) working together, each autonomously using tools in a loop. Our Research feature involves an agent that plans a research process based on user queries and then uses tools to create parallel agents that search for information simultaneously. Systems with multiple agents introduce new challenges in agent coordination, evaluation, and reliability. This post breaks down the principles that worked for us. We hope you will find them useful to apply when building your own multi-agent systems.  

Benefits of a Multi-agent System  

Research work involves open-ended problems where it is very difficult to predict the required steps in advance. You can’t hard-code a fixed path for exploring complex topics, as the process is inherently dynamic and path-dependent. When people conduct research, they tend to continuously update their approach based on discoveries, following leads that emerge during the investigation.  

This unpredictability makes AI agents particularly well-suited for research tasks. Research demands the flexibility to pivot or explore tangential connections as the investigation unfolds. The model must operate autonomously for many turns, making decisions about which directions to pursue based on intermediate findings. A linear one-shot pipeline cannot handle these tasks.  

The essence of search is compression: distilling insights from a vast corpus. Sub-agents facilitate compression by operating in parallel with their own context windows, exploring different aspects of the question simultaneously before condensing the most important tokens for the lead research agent. Each sub-agent also provides separation of concerns (distinct tools, prompts, and exploration trajectories), which reduces path dependency and enables thorough, independent investigations.  

Once intelligence reaches a threshold, multi-agent systems become a vital means of scaling performance. For instance, although individual humans have become more intelligent over the last 100,000 years, human societies have become exponentially more capable in the information age because of our collective intelligence and ability to coordinate. Even generally intelligent agents face limits when operating as individuals. Groups of agents can accomplish far more.  

Our internal evaluations show that multi-agent research systems excel especially for breadth-first queries that involve pursuing multiple independent directions simultaneously. We found that a multi-agent system with Claude Opus 4 as the lead agent and Claude Sonnet 4 sub-agents outperformed single-agent Claude Opus 4 by 90.2% on our internal research eval. For example, when asked to identify all the board members of the companies in the Information Technology S&P 500, the multi-agent system found the correct answers by decomposing the task into sub-agent tasks. In contrast, the single-agent system, relying on slow sequential searches, failed to find the answer.  

Multi-agent systems work mainly because they help spend enough tokens to solve the problem. In our analysis, three factors explain 95% of the performance variance in the BrowseComp evaluation (which tests the ability of browsing agents to locate hard-to-find information). We found that token usage alone explains 80% of the variance, with the number of tool calls and the model choice as the other two explanatory factors. This finding validates our architecture, which distributes work across agents with separate context windows to increase parallel reasoning capacity. The latest Claude models act as large efficiency multipliers for token use: upgrading to Claude Sonnet 4 yields a larger performance gain than doubling the token budget on Claude Sonnet 3.7. Multi-agent architectures effectively scale token usage for tasks that exceed the limits of single agents.  

There is a downside: in practice, these architectures burn through tokens fast. In our data, agents typically use about 4x as many tokens as chat interactions, and multi-agent systems use about 15x as many tokens as chats. For economic viability, a multi-agent system requires tasks whose value is high enough to pay for the increased token cost. Further, some domains that require all agents to share the same context or involve many dependencies between agents are not well-suited to multi-agent systems today. For instance, most coding tasks involve fewer truly parallelizable tasks than research, and LLM agents are not yet great at coordinating and delegating to other agents in real time. We’ve found that multi-agent systems excel at valuable tasks that involve heavy parallelization, information that exceeds a single context window, and interfacing with numerous complex tools.  
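The cost trade-off above can be made concrete with a little arithmetic. The sketch below applies the 4x and 15x multipliers quoted in the text to a chat-sized token baseline; the baseline token count, the price per million tokens, and the function names are hypothetical values chosen only for illustration.

```python
# Multipliers quoted in the text, relative to a plain chat interaction.
AGENT_MULTIPLIER = 4
MULTI_AGENT_MULTIPLIER = 15


def estimated_cost(chat_tokens: int, multiplier: int, usd_per_million_tokens: float) -> float:
    """Estimate the cost of one task from a chat-sized token baseline."""
    return chat_tokens * multiplier * usd_per_million_tokens / 1_000_000


def worth_running(task_value_usd: float, chat_tokens: int, usd_per_million_tokens: float) -> bool:
    """Return True when the task's value covers the multi-agent token bill."""
    cost = estimated_cost(chat_tokens, MULTI_AGENT_MULTIPLIER, usd_per_million_tokens)
    return task_value_usd >= cost


if __name__ == "__main__":
    baseline_tokens = 2_000  # hypothetical tokens for one chat interaction
    price = 10.0             # hypothetical blended price, USD per million tokens
    for label, mult in [("chat", 1), ("single agent", AGENT_MULTIPLIER),
                        ("multi-agent", MULTI_AGENT_MULTIPLIER)]:
        print(f"{label}: ${estimated_cost(baseline_tokens, mult, price):.2f}")
```

The point of the check is only that the break-even value of a task scales linearly with the multiplier, so a task that is comfortably worth a chat-sized spend may not justify a multi-agent run.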

Architecture Overview for Research  

Our Research system uses a multi-agent architecture with an orchestrator-worker pattern, where a lead agent coordinates the process while delegating to specialized sub-agents that operate in parallel.  

When a user submits a query, the lead agent analyzes it, develops a strategy, and spawns sub-agents to explore different aspects simultaneously. As shown in the diagram above, the sub-agents act as intelligent filters by iteratively using search tools to gather information (in this case, on AI agent companies in 2025) and then returning a list of companies to the lead agent so it can compile the final answer.  
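The orchestrator-worker flow described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production system: `plan_subtasks`, `run_subagent`, and `research` are hypothetical names, and both the planning step and the sub-agents are stubs where a real system would call an LLM and search tools.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_subtasks(query: str) -> list[str]:
    """Lead-agent planning step (stub): split a query into independent aspects."""
    # A real lead agent would ask an LLM to decompose the query.
    aspects = ("background", "key players", "recent news")
    return [f"{query} :: {aspect}" for aspect in aspects]


def run_subagent(subtask: str) -> str:
    """Sub-agent (stub): would search iteratively, then return a condensed finding."""
    # A real sub-agent would loop over search tools in its own context window.
    return f"summary of [{subtask}]"


def research(query: str) -> str:
    """Orchestrator: plan, fan out to parallel workers, then compile the answer."""
    subtasks = plan_subtasks(query)
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        # pool.map returns results in the same order as the subtasks.
        findings = list(pool.map(run_subagent, subtasks))
    return "\n".join(findings)


if __name__ == "__main__":
    print(research("AI agent companies in 2025"))
```

Threads are used here only because the stubbed workers stand in for I/O-bound calls; the key design point is that each worker returns a condensed result, so the lead agent never has to hold every sub-agent's full search history.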

Traditional approaches to retrieval-augmented generation (RAG) rely on static retrieval. That is, they fetch a set of chunks that are most similar to an input query and use these chunks to generate a response. In contrast, our architecture uses a multi-step search that dynamically finds relevant information, adapts to new findings, and analyzes results to formulate high-quality answers.
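The contrast between static retrieval and multi-step search can be shown with a toy example. The sketch below is an assumption-laden stand-in: it scores documents by simple word overlap instead of embeddings, and it "adapts to new findings" by adding words from retrieved documents to the query, where a real agent would let an LLM decide what to search for next. The corpus and function names are invented for illustration.

```python
def score(query_terms: set[str], doc: str) -> int:
    """Count how many query terms appear in the document."""
    return len(query_terms & set(doc.lower().split()))


def static_retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Static retrieval: one pass, return the top-k documents for the input query."""
    terms = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: score(terms, d), reverse=True)
    return [d for d in ranked[:k] if score(terms, d) > 0]


def multi_step_search(query: str, corpus: list[str], max_steps: int = 3) -> list[str]:
    """Multi-step search: retrieve, fold new findings into the query, and repeat."""
    terms = set(query.lower().split())
    found: list[str] = []
    for _ in range(max_steps):
        new_docs = [d for d in corpus if d not in found and score(terms, d) > 0]
        if not new_docs:
            break  # nothing new surfaced, so stop early
        found.extend(new_docs)
        for d in new_docs:
            terms |= set(d.lower().split())  # adapt the query to what was found
    return found


if __name__ == "__main__":
    corpus = [
        "acme builds coding agents",
        "acme acquired by globex",
        "globex headquarters berlin",
        "unrelated cooking recipe",
    ]
    print(static_retrieve("coding agents", corpus))
    print(multi_step_search("coding agents", corpus))
```

On this corpus the static pass stops at the one document that matches the original query, while the multi-step loop follows the chain from "acme" to "globex" and surfaces two further related documents.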

Source: How we built our multi-agent research system 

NVIDIA CUDA Deep Neural Network Library (cuDNN) is a GPU-accelerated library for deep neural networks. The latest update optimizes implementations of key operations, including:  

  • forward and backward convolutions  
  • attention  
  • matrix multiplication  
  • pooling and normalization  

How cuDNN Works 

  • Accelerated Learning: cuDNN uses computational kernels that leverage Tensor Cores when appropriate, delivering top performance for compute-bound operations. It also uses heuristics to select the optimal computational kernel for each problem size.  
  • Fusion Support: cuDNN fuses compute-bound and memory-bound operations. For common fusion patterns, it builds computational kernels at runtime. For specialized patterns, it leverages pre-optimized computational kernels.  
  • Expressive Op Graph API: Users represent computations as graphs of tensor operations. cuDNN provides both a direct C API and an open-source C++ front-end API, with most users starting with the front-end API.  

This Guide Covers Installing And Using The cuDNN Front End And Back End 

The NVIDIA CUDA Deep Neural Network Library (cuDNN) is a GPU-accelerated library for deep learning, offering optimized versions of common deep learning operations such as:  

  • Scaled dot-product attention  
  • Convolution, including cross-correlation  
  • Matrix Multiplication  
  • Normalizations, Softmax, and Pooling  
  • Arithmetic, Mathematical, Relational, and Logical Point-wise Operations  
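For readers unfamiliar with the first operation in this list, scaled dot-product attention computes softmax(Q Kᵀ / sqrt(d)) V. The sketch below is a plain-Python reference for what the operation computes on small nested lists; it is not how cuDNN implements it and does not use the cuDNN API. Masking, batching, and multiple heads are left out, and the function names are illustrative.

```python
import math


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    q: list[list[float]], k: list[list[float]], v: list[list[float]]
) -> list[list[float]]:
    """Compute softmax(Q K^T / sqrt(d)) V for 2-D inputs given as nested lists."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q_row in q:
        # Similarity of this query with every key, scaled by 1/sqrt(d).
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])
    return out


if __name__ == "__main__":
    q = [[0.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[2.0, 0.0], [0.0, 4.0]]
    print(scaled_dot_product_attention(q, k, v))
```

With an all-zero query every key scores the same, so the attention weights are uniform and the output is simply the average of the value rows.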

In addition to delivering fast operations, cuDNN lets you use flexible multi-operation fusion patterns for improved performance. This approach enables you to maximize the capabilities of NVIDIA GPUs for key deep learning tasks.  

cuDNN enables you to express both single-operation and multi-operation computations as operation graphs. You can build these graphs using the following API layers.  

  • Python front-end API  
  • C++ front-end API  
  • C backend API  

NVIDIA cuDNN Python and C++ frontend APIs present a user-friendly, high-level programming model suitable for most scenarios, abstracting much of the complexity of lower-level GPU operations.  

Select the NVIDIA cuDNN backend API if you need access to routines not supported by the front-end APIs or if your application requires a pure C interface.  

Key Features 

Deep Neural Networks 

Deep learning neural networks are used in computer vision, conversational AI, and recommendation platforms, and have enabled advances such as self-driving cars and smart voice assistants. NVIDIA GPU-accelerated frameworks help train these models much faster, cutting training time from days to just hours.  

cuDNN supplies foundational libraries for fast, low-latency inference with deep neural networks. It works in the cloud, on embedded devices, and in self-driving cars.  

  • cuDNN accelerates key compute-bound operations, such as attention training, convolution, and matrix multiplication, while optimizing memory-bound operations through advanced fusions and heuristics.  
  • Optimized memory-bound operations such as attention decode, pooling, softmax, normalization, activation, pointwise, and tensor transformation  
  • Fusions of compute-bound and memory-bound operations  
  • Runtime fusion engine to generate kernels at runtime for common fusion patterns  
  • Optimizations for important specialized patterns like fused attention  
  • Heuristics to choose the right implementation for a given problem size  

cuDNN Graph API and Fusion 

The cuDNN Graph API lets you describe and optimize common deep learning computation patterns as a graph in which operations are nodes and tensors are edges, much like a dataflow graph in a deep learning framework. Working on the whole graph enables better performance, easier optimization, and greater flexibility for model developers than handling operations individually.  

You can use the cuDNN Graph API through the recommended Python or C++ frontend APIs, which provide high-level access. For legacy applications or when Python or C++ is not suitable, use the low-level C backend API for more direct control.  

  • Flexible fusions of memory-limited operations into the input and output of matmul and convolution  
  • Specialized fusions for patterns like attention and convolution with normalization  
  • Support for both forward and backward propagation  
  • Heuristics for predicting the best implementation for a given problem size  
  • Open source Python/C++ frontend API  
  • Serialization and deserialization support  
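To make the graph and fusion ideas above more tangible, here is a toy model of an operation graph and a fusion pass. It is purely illustrative and does not use or mirror the cuDNN API: the `Op` class, the `fuse_epilogues` function, and the choice of which operations count as pointwise are all assumptions. Operations are nodes, tensor names are the edges between them, and the pass folds pointwise operations into the output of the preceding matmul or convolution.

```python
from dataclasses import dataclass


@dataclass
class Op:
    kind: str          # node type, e.g. "matmul", "add", "relu"
    inputs: list[str]  # names of input tensors (incoming edges)
    output: str        # name of the output tensor (outgoing edge)


COMPUTE_BOUND = {"matmul", "conv"}
POINTWISE = {"add", "mul", "relu"}  # treated as memory-bound in this toy model


def fuse_epilogues(ops: list[Op]) -> list[list[Op]]:
    """Group each compute-bound op with the pointwise chain that consumes its output.

    Assumes `ops` is in topological order. Each returned group stands for one
    fused kernel. The pass ignores whether an intermediate tensor is also
    needed elsewhere, which a real fusion engine would have to check.
    """
    groups: list[list[Op]] = []
    for op in ops:
        last = groups[-1] if groups else None
        if (
            last
            and last[0].kind in COMPUTE_BOUND
            and op.kind in POINTWISE
            and last[-1].output in op.inputs
        ):
            last.append(op)      # fold the pointwise op into the previous kernel
        else:
            groups.append([op])  # start a new kernel
    return groups


if __name__ == "__main__":
    graph = [
        Op("matmul", ["X", "W1"], "T0"),
        Op("add", ["T0", "B1"], "T1"),
        Op("relu", ["T1"], "Y"),
        Op("matmul", ["Y", "W2"], "Z"),
    ]
    for group in fuse_epilogues(graph):
        print(" + ".join(op.kind for op in group))
```

Fusing the bias add and ReLU into the matmul means the intermediate tensors never need a round trip through GPU memory, which is the motivation for fusing memory-limited operations into the output of compute-bound ones.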

NVIDIA sees trustworthy AI as a shared responsibility. We have established policies and practices to support a wide range of AI applications. When using a model under our terms of service, developers should work with their teams to ensure the model meets industry needs and prevents misuse.  

Please report security vulnerabilities or NVIDIA AI concerns here.

Source: NVIDIA cuDNN