Google DeepMind has launched new safety benchmarks and methods to help assess and improve the security of AI agents in business settings. These efforts target risks that grow as agents become more capable, such as unauthorized access, data breaches, and failures to follow safety rules.

Key Developments in Agent Safety 

  • ClawsBench (April 2026): Researchers created ClawsBench to test LLM productivity agents in realistic mock environments such as Gmail, Slack, and Drive. The benchmark uses structured tasks to score safety and performance separately and penalizes harmful actions (a scoring sketch follows this list).
  • Frontier Safety Framework (February 2025): DeepMind updated its Frontier Safety Framework to help identify, assess, and reduce serious risks from advanced AI agents, such as cyber threats and malicious use.
  • Intelligent delegation research (February 2026): DeepMind researchers argue that agent delegation (assigning tasks to AI agents) is a governance challenge. Instead of just splitting tasks, their framework entails giving agents limited authority and adding checks and monitoring to handle failures among multiple agents.  
  • CodeMender (October 2025): A security-focused AI agent that automatically fixes software vulnerabilities, running continuously in business environments to help reduce security risks.
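
The article doesn't give ClawsBench's actual scoring scheme, but the core idea, scoring task performance and safety on separate axes so that completing a task can't offset a harmful action, is easy to sketch. Below is a minimal, hypothetical Python example; the `Action` fields and the 0.5 penalty weight are assumptions, not the benchmark's real schema:

```python
from dataclasses import dataclass

@dataclass
class Action:
    """One step the agent took in the mock environment."""
    tool: str             # e.g. "gmail.send", "drive.delete"
    harmful: bool         # flagged by the environment's ground-truth checker
    completed_goal: bool  # did this step satisfy part of the task spec?

def score_trajectory(actions: list[Action]) -> dict[str, float]:
    """Score performance and safety on independent axes.

    Performance: fraction of steps that advanced the task.
    Safety: starts at 1.0 and loses 0.5 per harmful action, so a
    high task score cannot mask unsafe behavior.
    """
    if not actions:
        return {"performance": 0.0, "safety": 1.0}
    performance = sum(a.completed_goal for a in actions) / len(actions)
    safety = max(0.0, 1.0 - 0.5 * sum(a.harmful for a in actions))
    return {"performance": performance, "safety": safety}

# Example: the agent finishes the task but also deletes a file it shouldn't.
run = [
    Action("gmail.read", harmful=False, completed_goal=True),
    Action("drive.delete", harmful=True, completed_goal=False),
    Action("gmail.send", harmful=False, completed_goal=True),
]
print(score_trajectory(run))  # performance ≈ 0.67, safety = 0.5
```

Keeping the two scores separate means a leaderboard cannot reward an agent that finishes tasks by taking dangerous shortcuts.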

Enterprise Focus 

Collectively, these new safety measures support the move toward agent-based workflows in which AI agents interact with company data, tools, and third-party APIs. The aim is to ensure their actions are reliable and auditable rather than unpredictable.

  • Key security areas: the benchmarks assess how well agents handle adversarial prompts (malicious or misleading inputs intended to trick AI), workflow interruptions (unexpected stops or changes in a process), and containment or sandboxing rules (keeping AI within controlled computing environments).  
  • System-level security: Researchers highlight a shift-left approach, identifying and addressing security issues earlier in the development process. They use dedicated interpreters, such as the CaMeL system (a specialized program for controlling how data moves between parts of a system), to enforce data-flow policies rather than relying solely on LLMs’ native safety features; a minimal sketch follows this list.
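
CaMeL's real design is considerably more involved, but the principle, enforcing data-flow policy in an interpreter outside the model, can be sketched with simple taint tracking. The tool names, labels, and policy table below are illustrative assumptions, not CaMeL's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Tainted:
    """A value carrying provenance labels, e.g. {'untrusted_email'}."""
    value: str
    labels: set = field(default_factory=set)

# Which provenance labels each tool may accept. Tool names and labels
# are illustrative; this is not CaMeL's real policy language.
POLICY = {
    "slack.post": {"user_input", "internal_doc"},
    "shell.run":  {"user_input"},  # never run commands sourced from email
}

class PolicyViolation(Exception):
    pass

def call_tool(tool: str, arg: Tainted) -> None:
    """Enforce the data-flow policy in the interpreter, not the model."""
    allowed = POLICY.get(tool, set())
    if not arg.labels <= allowed:
        raise PolicyViolation(
            f"{tool} may not receive data labeled {arg.labels - allowed}"
        )
    print(f"{tool}({arg.value!r}) executed")

call_tool("slack.post", Tainted("status update", {"user_input"}))  # allowed
try:
    # A command that arrived via an untrusted email is blocked outright.
    call_tool("shell.run", Tainted("rm -rf /tmp/x", {"untrusted_email"}))
except PolicyViolation as err:
    print("blocked:", err)
```

Because the check runs in ordinary code rather than in the model, a prompt-injected agent cannot argue its way past it; the same mechanism is one way to implement the limited-authority idea from the delegation research above.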

These changes come as the 2026 AI market faces growing scrutiny, including reports of rogue agents attempting to bypass safety measures, making standardized safety testing for enterprises increasingly important.

Google DeepMind published an updated version of its Frontier Safety Framework on Tuesday, outlining how it intends to address potential dangers from future artificial intelligence models.

The new framework, announced before an international AI summit in Paris next week, introduces techniques to address theoretical issues, such as models that could deceive people into giving up control over technology.  

“We sit at the forefront of capabilities development, so we have to be at the forefront of safety responsibility as well,” Tom Lue, Google DeepMind’s general counsel and head of governance, said in an interview with Semafor.

The framework also adds new guidelines for handling AI security risks and updates procedures for addressing misuse of these models.  

Google DeepMind released the first version of its framework in May last year. Since then, the AI landscape has changed.  

For example, most safety research a year ago focused on AI models during their initial creation, the pre-training phase. Proposed laws like California’s SB 1047 sought to regulate models based on their pre-training size.

However, in the past six months, researchers have found ways to boost AI model capabilities at the inference phase (when the model is actually used to make predictions or generate text). Running a model multiple times and selecting or refining its answers makes it much more effective.
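
For intuition, here is a minimal best-of-N sketch of inference-time scaling: sample several candidate answers and keep the one a verifier scores highest. Both `generate` and `score` are stand-ins; a real system would call an actual model and a learned verifier or reward model:

```python
import random

def generate(prompt: str, seed: int) -> str:
    """Stand-in for one sampled model completion (hypothetical)."""
    rng = random.Random(seed)
    return f"candidate answer #{rng.randint(0, 9)}"

def score(answer: str) -> float:
    """Stand-in for a verifier or reward model rating an answer."""
    return float(answer.rsplit("#", 1)[1])

def best_of_n(prompt: str, n: int = 8) -> str:
    """Sample the model n times and keep the highest-scoring answer.

    The capability gain comes from extra inference compute,
    not from a larger pre-trained model.
    """
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=score)

print(best_of_n("What is 2 + 2?"))
```

The extra capability comes entirely from spending more compute at inference time, which is why thresholds pegged to pre-training size can miss it.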

For example, DeepSeek’s R1 model, despite being very powerful, would not have been covered by safety bills like SB 1047 (which California Governor Gavin Newsom vetoed), because most of its abilities come from inference rather than its initial training size.

“What you’re seeing with these new test-time and inference models is a different type of capability that’s emerging,” Lue said. “That, plus the fact that we are now going to see the emergence of agents, increased tool use, and the ability to delegate more activities, means the suite of responsibility, risk evaluations, and mitigations, of course, has to evolve.”

Helen King, DeepMind’s senior director of responsibility, said the changing AI landscape also brings some positive news for safety.

New “reasoning” models such as OpenAI’s o1 and o3 and DeepSeek’s R1 could help researchers better understand how these models work. “It’s sort of like in a school exam when you have to explain your thinking,” King said.

The past year of AI development has shown that AI safety is still in its early stages. Any law passed now will likely become outdated soon.  

Google DeepMind’s approach, like that of other top AI companies, is to continually update its framework to keep pace with the industry’s rapid changes.   

Many “experts” predicted an AI disaster by now, but it hasn’t happened yet. This doesn’t mean it won’t, but it suggests AI is advancing slowly enough for the industry to address safety concerns.  

Deceptive AI models may sound alarming, but they aren’t something to worry about too much. The good news is that many people, including the companies building AI, are taking safety seriously.

Source: Google releases new AI safety framework
