In late 2025, the Google Threat Intelligence Group (GTIG) observed more threat actors using artificial intelligence (AI) to accelerate attacks, particularly in reconnaissance, social engineering, and malware development. This report updates our November 2025 findings on how threat actors are using AI tools.  

By identifying these early indicators and offensive proofs of concept, GTIG aims to arm defenders with the intelligence needed to anticipate the next phase of AI-enabled threats, proactively thwart malicious activity, and continually strengthen our classifiers and models.  

Executive Summary 

Google DeepMind and GTIG have observed more attempts at model extraction, also known as distillation attacks, which constitute a form of intellectual property theft and violate Google’s Terms of Service. In this report, we describe the steps we have taken to stop this activity, including detection and disruption. We have not seen direct attacks on our most advanced models or generative AI products from advanced persistent threat (APT) actors. We have stopped many attempts by private companies and researchers worldwide to extract our proprietary logic.  

Government-backed threat actors now use large language models (LLMs) as key tools for technical research, targeting, and quickly creating more convincing phishing messages. This report shows how groups from the Democratic People’s Republic of Korea, Iran, the People’s Republic of China, and Russia used AI in late 2025. It also helps us understand how misuse of generative AI appears in real-life campaigns we have stopped so far. GTIG has not seen APT or information operations (IO) actors reach new capabilities that would change the overall threat landscape.  

This report looks at the following areas:  

  • Model extraction attacks: Over the past year, distillation attacks have become a more common means of stealing intellectual property.  
  • AI-augmented operations: Real-world examples show how groups are making reconnaissance and phishing more efficient by leveraging AI to build trust.  
  • Agentic AI: Threat actors are beginning to explore building agentic AI tools to support malware and tool development.  
  • AI-integrated malware: New malware families like Honest Queue are testing Gemini’s application programming interface (API) to create code that can download and run second-stage malware.  
  • Underground jailbreak ecosystem: Malicious services such as Xanthorox are appearing in underground markets. They claim to be independent models but actually use jailbroken commercial APIs and open-source model context protocol (MCP) servers.  

At Google, we are committed to developing AI boldly and responsibly. This means we act to stop malicious activity by turning off projects and accounts linked to bad actors, and we keep improving our models to make them harder to misuse. We also share best practices with the industry to help defenders and strengthen protections across the ecosystem.  

In this report, we describe the steps we have taken, including disabling assets and leveraging intelligence to enhance the security of our classifiers and models. You can find more details about how we protect Gemini in the white paper, Advancing Gemini’s Security Safeguards.  

Direct Model Risks: Disrupting Model Extraction Attacks 

As more organizations use LLMs in their main operations, the unique logic and training behind these models have become valuable targets. In the past, attackers would break into computer systems and steal trade secrets. Now that many LLMs are available as services, attackers can use regular API access to try to replicate specific AI model features.  

In 2025, we did not see any direct attacks on advanced models from known APT or IO groups. However, we observed model extraction attacks, also known as distillation attacks, against our AI models. These attacks aim to learn how a model reasons and makes decisions.  

What Are Model Extraction Attacks? 

Model extraction attacks (MEAs) occur when someone uses authorized access to systematically query a machine learning model and collect enough of its outputs to train a new one. Attackers use a method called knowledge distillation (KD) to transfer knowledge from one model to another, which is why an MEA is often called a distillation attack.  
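
Knowledge distillation itself is a standard, well-documented technique. As a minimal illustrative sketch (not taken from the report, with all names and hyperparameters hypothetical), a student model can be trained to match a teacher's softened output distribution using only the teacher's outputs:

    # Minimal sketch of classic knowledge distillation (soft-target transfer).
    # All names (teacher, student, loader, T, alpha) are illustrative.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        """Blend hard-label cross-entropy with KL divergence against the
        teacher's temperature-softened distribution."""
        soft_targets = F.softmax(teacher_logits / T, dim=-1)
        soft_student = F.log_softmax(student_logits / T, dim=-1)
        kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kd + (1 - alpha) * ce

    # The student never needs the teacher's weights, only its outputs:
    # for inputs, labels in loader:
    #     with torch.no_grad():
    #         t_logits = teacher(inputs)
    #     loss = distillation_loss(student(inputs), t_logits, labels)
    #     loss.backward(); optimizer.step(); optimizer.zero_grad()

The point relevant to defenders is that the transfer relies only on the teacher's outputs, which is exactly what a hosted model exposes through its API.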

Model extraction and knowledge distillation enable attackers to speed up AI model development and reduce costs. This is a type of intellectual property (IP) theft.  

Knowledge distillation has become a common machine learning method for training student models from existing teacher models. This usually means prompting the teacher model with questions in a given domain, then fine-tuning a student model on the responses or using them in other training steps. Distillation has valid uses, and Google Cloud offers tools for it, but applying distillation to Google’s Gemini models without permission violates our Terms of Service. Google is working on ways to detect and stop these attempts.  
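
The report does not detail Google's detection methods. Purely as an illustrative sketch of the kind of usage-pattern heuristic a model provider might apply, the fragment below flags accounts whose API behavior resembles systematic output harvesting; every field name, threshold, and score here is hypothetical.

    # Illustrative heuristic for flagging distillation-style API usage.
    # Thresholds, field names, and the scoring scheme are hypothetical,
    # chosen only to show the shape of such a detector.
    from dataclasses import dataclass

    @dataclass
    class AccountUsage:
        requests_per_day: int        # total generation calls
        distinct_prompts: int        # unique prompts observed
        mean_output_tokens: float    # average completion length
        topic_entropy: float         # diversity of prompt topics, 0..1

    def distillation_risk_score(u: AccountUsage) -> float:
        """Crude additive score: high volume, highly diverse prompts, and
        consistently long completions together resemble output harvesting."""
        score = 0.0
        if u.requests_per_day > 50_000:
            score += 0.4
        if u.distinct_prompts / max(u.requests_per_day, 1) > 0.9:
            score += 0.3   # almost every request is a new prompt
        if u.mean_output_tokens > 500:
            score += 0.15  # full, verbose completions are being collected
        if u.topic_entropy > 0.8:
            score += 0.15  # prompts sweep broadly across domains
        return score

    if __name__ == "__main__":
        suspect = AccountUsage(120_000, 118_500, 780.0, 0.93)
        print(round(distillation_risk_score(suspect), 2))  # 1.0

In practice, a heuristic like this would only queue accounts for analyst review; it feeds the kind of detection and disruption work described above rather than blocking anything automatically.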

Source: GTIG AI Threat Tracker: Distillation, Experimentation, and (Continued) Integration of AI for Adversarial Use 
