A Google product manager has open-sourced a new always-on memory agent. The system continually re-reads, organizes, and consolidates stored information in the background. Built on the low-cost Flash Lite version of Gemini, it stays active around the clock at lower cost and with faster response times than earlier versions.

Some key features of the system are:  

  • The memory agent operates continuously in the background, keeping the AI’s memory updated without demanding ongoing costly processing.  
  • It targets common tasks such as UI generation, moderation, and simulation with high efficiency.  
  • The system integrates into runtime strategies and supports workflow agents and multi-agent systems deployed on Google Cloud Run and Vertex AI.  
  • This technology actively manages memory and could replace traditional vector databases by delivering a more efficient, always-on solution.  

Overall, this development addresses the amnesia problem in large language models by leveraging long-term memory.  

The push is industry-wide: tech companies are racing to add long-term memory to large language models to fix that amnesia problem.

The project was built using Google’s agent development kit (ADK), which launched in spring 2025, and Google Gemini 3.1 Flash Lite, a low-cost model released on March 3, 2026. Flash Lite is the fastest and most cost-efficient model in the Gemini 3 series.  

This project serves as a practical example of something many AI teams want. Few have built an agent system that continuously takes in information, organizes it in the background, and retrieves it later without a traditional vector database.  

For enterprise developers, this release is more important as a sign of where agent infrastructure is going than as a product launch.  

The repository offers a look at long-running autonomy, which is becoming more appealing for support systems, research assistance, internal copilots, and workflow automation. It also raises governance questions when memory is not limited to a single session.  

What the Repository Seems to Do and What It Does Not Clearly Claim 

The repository appears to use a multi-agent internal architecture with specialized components for ingestion, consolidation, and querying.

The materials do not present this as a shared memory framework for multiple independent agents.  

The difference matters. ADK supports multi-agent systems, but this repository is best described as an always-on memory agent or memory layer built with specialized sub-agents and persistent storage.  

Even at this more limited level, it tackles a key infrastructure problem that many teams are trying to solve.  

The Architecture Is Simple and Avoids a Traditional Retrieval Stack 

The repository says the agent runs continuously, accepts files or API input, stores structured data in SQLite, and consolidates memory every 30 minutes by default.

A local HTTP API and a Streamlit dashboard are included. The system can handle text, image, audio, video, and PDF files. The repository describes the design bluntly: no vector database, no embeddings, just an LLM that reads input and writes structured memory.

The design will likely catch the eye of developers focused on cost and complexity. Traditional retrieval stacks often require separate embeddings, pipelines, vector storage, indexing logic, and synchronization.  

Saboo’s example relies on the model to organize and update memory. This can make prototypes simpler and reduce infrastructure. The performance focus shifts from vector-search overhead to model latency, memory compaction, and stability.

Flash Lite Makes the Always-On Model More Affordable 

Gemini 3.1 Flash Lite enables this always-on model.  

Google says the model is designed for high-volume developer workloads and is priced at $0.25 per million input tokens and $1.50 per million output tokens.

The company also says that Flash Lite is 2.5 times faster than Gemini 2.5 in time-to-first-token and offers a 45% boost in output speed while maintaining or improving quality.  

According to Google’s benchmarks, the model scores 1432 on arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. Google says these features make it well-suited for high-frequency tasks such as translation, moderation, UI generation, and simulation.  

These numbers show why Flash Lite is paired with a background memory agent. It enables a 24/7 service to re-read, consolidate, and serve memory with predictable latency and low inference costs.
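Those prices make the always-on arithmetic easy to check. The workload numbers below (tokens per consolidation pass, pass frequency) are illustrative assumptions, not figures from the repo; only the per-token prices come from Google's stated pricing.

```python
# Back-of-envelope cost for an always-on consolidation loop at the
# quoted Flash Lite prices. Workload figures are assumptions.

INPUT_PRICE = 0.25 / 1_000_000   # dollars per input token (Google's quote)
OUTPUT_PRICE = 1.50 / 1_000_000  # dollars per output token (Google's quote)

passes_per_day = 24 * 2          # one consolidation every 30 minutes
input_per_pass = 50_000          # assumed: memory re-read per pass
output_per_pass = 5_000          # assumed: rewritten memory per pass

daily = passes_per_day * (
    input_per_pass * INPUT_PRICE + output_per_pass * OUTPUT_PRICE
)
print(f"~${daily:.2f}/day, ~${daily * 30:.2f}/month")
# → ~$0.96/day, ~$28.80/month
```

Even with generous token budgets, a continuously re-reading agent stays in the dollars-per-day range, which is the economic point the release is making.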

Google’s ADK documentation supports this bigger picture. The framework is model-agnostic and deployment-agnostic. It supports workflow agents, multi-agent systems, tools, and evaluation and deployment options such as Cloud Run and Vertex AI Agent Engine. This makes the memory agent seem less like a one-off demo and more like a reference for a wider set of agents. For an enterprise, the main debate is about governance, not just capability. Public reaction shows that enterprise adoption of persistent memory depends on more than just speed or token pricing.

On X, several responses highlighted enterprise concerns. Franck Abe praised Google ADK and 24/7 agent autonomy, but warned that an agent dreaming and mixing memories in the background without clear boundaries creates a compliance nightmare.

The LED agreed, saying the main cost of always-on agents is not tokens but drift and loops.

These critiques focus on the operational challenges of persistent systems. Who can write memory? What gets merged? How does retention work? If the agent learns something incorrect, how does that memory get deleted? How do teams audit what the agent has learned over time?
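One way to make those questions concrete is to gate every memory write behind a policy check and an append-only audit log. The sketch below is a hypothetical illustration of such controls, not something the repository ships.

```python
import time
from dataclasses import dataclass, field

@dataclass
class MemoryWrite:
    agent_id: str
    topic: str
    fact: str

@dataclass
class GovernedMemory:
    """Hypothetical governance wrapper around a memory store."""
    allowed_writers: set
    allowed_topics: set
    store: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def write(self, w: MemoryWrite) -> bool:
        # Deterministic policy boundary: who may write, and about what.
        ok = w.agent_id in self.allowed_writers and w.topic in self.allowed_topics
        # Every attempt is logged, accepted or not, so teams can audit
        # what the agent tried to learn and when.
        self.audit_log.append((time.time(), w.agent_id, w.topic, ok))
        if ok:
            self.store.append(w)
        return ok

    def forget(self, topic: str) -> int:
        # Retention: deletion by topic, recorded in the same log.
        before = len(self.store)
        self.store = [w for w in self.store if w.topic != topic]
        self.audit_log.append((time.time(), "system", f"forget:{topic}", True))
        return before - len(self.store)
```

None of this is hard to build; the critique is that an always-on memory agent is not enterprise-ready until some layer like it exists and is auditable.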

Another commenter, Iffy, questioned the repo’s claim of no embeddings, arguing the system still needs to chunk, index, and retrieve structured memory. Iffy added that the approach may work well for small-context agents but could struggle as memory stores grow.

This criticism matters. Removing a vector database does not eliminate the need for retrieval design; it just shifts the complexity elsewhere.  
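A minimal illustration of where that complexity reappears: even with no embeddings, something must decide which memories enter the context window. The keyword-scoring function below is a hypothetical stand-in for that retrieval step (assuming the SQLite `memory` table sketched earlier in the article's description), and it is exactly the kind of logic that degrades as the store grows.

```python
import sqlite3

def retrieve(conn: sqlite3.Connection, query: str, limit: int = 20):
    """Naive keyword retrieval over structured memory (illustrative).

    At scale, this is where chunking, indexing, and ranking decisions
    reappear, whether or not a vector database is involved.
    """
    terms = [t.lower() for t in query.split()]
    rows = conn.execute("SELECT topic, fact FROM memory").fetchall()
    scored = []
    for topic, fact in rows:
        text = f"{topic} {fact}".lower()
        score = sum(text.count(t) for t in terms)
        if score:
            scored.append((score, topic, fact))
    scored.sort(reverse=True)  # best matches first
    return [(topic, fact) for _, topic, fact in scored[:limit]]
```

A full table scan per query is fine for a bounded personal agent and untenable for a large shared store, which is precisely the scaling concern the critique raises.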

For developers, the trade-off is about fit, not ideology. A lighter stack suits those building low-stakes, bounded-memory agents. Larger deployments may need stricter retrieval controls, clearer indexing strategies, and stronger life-cycle tools. ADK expands the story beyond a single demo.

Other commenters focused on the developer workflow. One person asked for the ADK repository and documentation and wanted to know whether the runtime is serverless or long-running, and whether tool calling and evaluation hooks are available by default.

The answer is both. The memory agent example runs as a long-running service, and ADK supports multiple deployment patterns and includes tool-calling and evaluation features. The always-on memory agent is notable, but the main point is that Saboo wants agents to function as deployable software systems, not isolated prompts; in this approach, memory becomes part of the runtime layer rather than an add-on.

What Saboo Has Shown and What He Has Not 

What Saboo has not shown yet is just as important as what he has published.  

The provided materials do not include a direct benchmark comparing Flash Lite with Anthropic’s Claude Haiku for agent loops in production.

They do not outline enterprise-grade compliance controls for this memory agent. These would include deterministic policy boundaries, retention guarantees, segregation rules, or formal audit workflows.  

While the repository appears to use several specialist agents internally, the materials do not clearly support a broader claim of persistent memory shared across multiple independent agents.

For now, the repository serves as a strong engineering template, not a full enterprise memory platform.  

Why This Is Important Now 

Still, this release comes at the right time. Enterprise AI teams are moving past single-session assistants and toward systems that remember preferences, retain project information, and operate for longer periods.

Saboo’s open-source memory agent provides teams with a solid foundation for building infrastructure that supports long-term context and persistent information. Flash Lite further benefits organizations by reducing costs and making advanced agent capabilities accessible to more teams.  

The main takeaway: continuous memory will be judged on both governance and capability.  

The real enterprise question is whether an agent can remember in ways that are limited, inspectable, and safe for production.  

Source: Google PM open-sources Always On Memory Agent, ditching vector databases for LLM-driven persistent memory