Google has added Agentic Vision to its Gemini 3 Flash model to help it better understand images and make fewer mistakes in visual tasks.
Agentic Vision lets the model work like an active investigator. It follows a think-act-observe process to examine and modify images by running code.
This update helps prevent the AI from making guesses when image details are small or hard to see.
Main features of Agentic Vision
- Think, Act, Observe loop: The model first studies the question and the image (think). Next, it writes Python code to modify or analyze the image, such as cropping or adding annotations (act). Finally, it reviews the updated image for greater context before answering (observe).
- 5-10% quality boost: Running code with Agentic Vision makes Gemini 3 Flash 5-10% more accurate on most visual benchmarks.
- Visual scratch pad: The model can add notes or marks directly on images, so its analysis is grounded in actual image pixels.
- Reduced hallucination: Agentic Vision uses Python code to perform tasks such as counting small objects, reading distant text, or studying tables. This stops the model from making random guesses that lead to mistakes.
Main Uses
- Zooming and inspection: The model can zoom in on small or blurry details on its own.
- Visual Math and Plotting: Agentic Vision pulls data from tables in images, does the math, and makes charts rather than guessing the numbers.
- Interactive annotation: The model can draw boxes and labels to count items in busy images accurately.
Where to Find it
You can find Agentic Vision in:
- Google AI Studio: Developers can turn on code execution under “Tools” in the playground.
- Vertex AI: Available through the Gemini API.
- Gemini app: added under the Thinking Model option.
Future updates will make these features automatic and bring them to other Gemini models.
Frontier AI models like Gemini usually process the world in a single static glance. If they miss a small detail, such as a microchip’s serial number or a distant street sign, they have to guess.
Agentic Vision in Gemini 3 Flash changes image understanding from a static process to an active one. It treats vision as an investigation by combining visual reasoning with code execution. The model can plan to zoom in, inspect, and manipulate images step by step, grounding its answers in visual evidence.
Allowing code execution with Gemini 3 Flash gives a steady 5-10% quality boost on most vision benchmarks.
Agentic Vision: A New Frontier in AI Capability
Agentic Vision brings a think-act-observe loop to image understanding tasks.
- Think: the model examines the user’s question and the initial image, then generates a step-by-step plan.
- Act: The model writes and runs Python code to work with images, such as cropping, rotating, and adding annotations. It also analyzes images by running calculations or counting objects.
- Observe: The changed image is added to the model’s context. This helps the model review the new data with more context before giving a final answer.
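The loop above can be sketched in plain Python. This is an illustrative skeleton, not Google's implementation: the fixed "zoom into the top-left quadrant" plan stands in for whatever step the model actually decides on, and the function names are hypothetical.

```python
from PIL import Image

def think(question, image):
    # Plan: decide which region of the image needs closer inspection.
    # Hypothetical fixed plan for illustration: zoom into the top-left quadrant.
    w, h = image.size
    return ("crop", (0, 0, w // 2, h // 2))

def act(image, step):
    # Execute the planned step with ordinary image-manipulation code.
    op, box = step
    if op == "crop":
        return image.crop(box)
    raise ValueError(f"unknown op: {op}")

def observe(context, new_image):
    # Feed the transformed image back into the model's working context.
    context.append(new_image)
    return context

# One pass of the loop on a synthetic 100x80 image.
img = Image.new("RGB", (100, 80), "white")
context = [img]
step = think("What does the sign say?", img)
zoomed = act(img, step)
context = observe(context, zoomed)
print(zoomed.size)  # (50, 40)
```

In the real system this loop can repeat: each observed image enlarges the context the model reasons over before it commits to an answer.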
Agentic Vision in Action
When you enable code execution in the API, you open up a range of new possibilities. Our demo app in Google AI Studio shows many of these in action. Developers, from large companies to small startups, are already using this feature for a variety of use cases, such as:
- Zooming and Inspecting
Gemini 3 Flash automatically zooms in on small, detailed features. Planchecksolver.com, an AI tool for checking building plans, increased its accuracy by 5% after enabling code execution with Gemini 3 Flash. This allowed the platform to inspect high-definition images in a step-by-step fashion. In a video of the backend logs, you can see Gemini 3 Flash generate Python code to crop and analyze specific areas, such as roof edges or building sections, into new images. By adding these cropped images back into its context, the model can visually check its reasoning and confirm that plans meet complex building codes.
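The cropping step described above is ordinary Python imaging code. The sketch below shows the kind of snippet the model generates, using a synthetic white image as a stand-in for a blueprint scan; the region-of-interest coordinates are made up for illustration.

```python
from PIL import Image

# Hypothetical blueprint scan (synthetic 1000x800 image for illustration).
blueprint = Image.new("RGB", (1000, 800), "white")

# Crop a region of interest (e.g. a roof edge) and upscale it
# so fine details become legible before re-inspection.
roi = (600, 100, 900, 300)  # left, upper, right, lower
detail = blueprint.crop(roi)
detail = detail.resize((detail.width * 4, detail.height * 4), Image.LANCZOS)
print(detail.size)  # (1200, 800)
```

Feeding `detail` back into the model's context is what lets it verify fine print it could not resolve in the full-frame view.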
- Image Annotation
With Agentic Vision, the model can interact with its environment by adding notes or drawings to images rather than only describing what it sees. Gemini 3 Flash can run code to draw directly on the image, helping to show its reasoning.
In the example below, the model is asked to count the fingers on a hand. In the Gemini app, to avoid mistakes, it uses Python to draw boxes and numbers over each finger it finds. This visual scratch pad helps ensure the answer is accurate down to the pixel.
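A visual scratch pad of this kind is simple to express in code. The sketch below, assuming hypothetical detection boxes on a synthetic image in place of the real hand photo, draws a numbered box over each finger so the final count is tied to pixels rather than a one-shot guess.

```python
from PIL import Image, ImageDraw

# Synthetic stand-in for the hand photo; the boxes are hypothetical detections.
img = Image.new("RGB", (400, 300), "white")
draw = ImageDraw.Draw(img)

finger_boxes = [(40, 50, 80, 200), (100, 30, 140, 200),
                (160, 20, 200, 200), (220, 30, 260, 200),
                (280, 60, 320, 200)]

# Draw a numbered red box over each detection so the count is pixel-grounded.
for i, box in enumerate(finger_boxes, start=1):
    draw.rectangle(box, outline="red", width=3)
    draw.text((box[0], box[1] - 14), str(i), fill="red")

print(len(finger_boxes))  # 5
```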
- Visual Math and Plotting
Agentic Vision can read complex tables and use Python code to create visualizations of the results.
Standard language models can make mistakes when doing multi-step visual math. Gemini 3 Flash avoids this by using a reliable Python environment for calculations. In the example below, from our demo app in Google AI Studio, the model finds the raw data, writes code, sets the previous SOTA to 1.0, and creates a matplotlib bar chart. This way, the results are based on real execution, not guesses.
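The normalization step described above reduces to a few lines of Python. This is a minimal sketch with made-up benchmark numbers standing in for values read off a table image; the model-generated code in the demo follows the same shape.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# Hypothetical benchmark scores extracted from a table image.
scores = {"Prev SOTA": 62.0, "Model A": 68.2, "Model B": 71.3}

# Normalize so the previous SOTA sits at 1.0, as in the demo.
baseline = scores["Prev SOTA"]
normalized = {name: round(s / baseline, 3) for name, s in scores.items()}

fig, ax = plt.subplots()
ax.bar(list(normalized.keys()), list(normalized.values()))
ax.axhline(1.0, linestyle="--", color="gray")  # baseline reference line
ax.set_ylabel("Score relative to previous SOTA")
fig.savefig("benchmark.png")

print(normalized["Prev SOTA"])  # 1.0
```

Because the division and the chart come from actual execution, any arithmetic slip would surface as a wrong bar rather than a silently hallucinated number.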
What’s Next?
We are only at the beginning with Agentic Vision.
- More implicit code-driven behaviors: Right now, Gemini 3 Flash excels at automatically zooming in on small details. Other features, like rotating images or doing visual math, still need a clear prompt to work. We are working to make these actions automatic in future updates.
- More Tools: We are also exploring ways to give Gemini models more tools, such as web search and reverse image search, to help them better understand the world.
- More model sizes: We also plan to bring this feature to more of our models, not just Flash.










