Technical Notes
What we still don't know about agent memory

Thrindex
Most writing about agent memory reads as if the problem were solved and the only open question were whose implementation is fastest. It isn't solved. It is a young problem, and a serious team should be able to say plainly where the hard edges still are. Here are four of them.
When two memories conflict and both still look valid
The easy version of conflict is when one memory clearly replaces another: a preference stated, then reversed. Recency settles it. The hard version is when two memories disagree and neither is obviously wrong.
A user tells the agent in one conversation that they want concise replies. In another, with a different task in front of them, they ask for thorough, detailed ones. Neither statement is stale. Neither is a mistake. They are both true — under different conditions. A memory system that resolves this by recency will simply oscillate, obeying whichever instruction was given last. The real answer requires understanding context: that the preference was conditional, not absolute. Representing conditional, context-dependent truth — rather than flat facts — is an open problem, and most systems today do not even attempt it.
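To make the oscillation concrete, here is a minimal sketch that contrasts recency-based resolution with a context-scoped lookup. Everything in it is an assumption made for illustration: the `Preference` record, the `context` labels such as `quick_question`, and both resolver functions are hypothetical. It also assumes the context label is already known, which is exactly the part that remains open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Preference:
    key: str                 # e.g. "reply_style"
    value: str               # e.g. "concise"
    turn: int                # when it was stated (higher = more recent)
    context: Optional[str]   # hypothetical task label; None = unconditional


def resolve_by_recency(prefs, key):
    """Flat-fact view: the most recently stated value wins."""
    matching = [p for p in prefs if p.key == key]
    return max(matching, key=lambda p: p.turn).value if matching else None


def resolve_by_context(prefs, key, context):
    """Conditional view: prefer values stated under the same context,
    then fall back to unconditional ones. Recency only breaks ties."""
    matching = [p for p in prefs if p.key == key]
    scoped = [p for p in matching if p.context == context]
    unconditional = [p for p in matching if p.context is None]
    pool = scoped or unconditional
    return max(pool, key=lambda p: p.turn).value if pool else None


prefs = [
    Preference("reply_style", "concise", turn=1, context="quick_question"),
    Preference("reply_style", "detailed", turn=2, context="design_review"),
]

print(resolve_by_recency(prefs, "reply_style"))                    # detailed
print(resolve_by_context(prefs, "reply_style", "quick_question"))  # concise
print(resolve_by_context(prefs, "reply_style", "design_review"))   # detailed
```

The recency resolver returns whatever was said last, regardless of the task. The scoped resolver keeps both statements alive, but only because someone handed it a clean context label. Inferring that label from a real conversation is the unsolved part.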
Knowing when something should be forgotten
A memory system can decide what to keep. Deciding what to actively forget is much harder, because forgetting the wrong thing is invisible until it matters.
If a system drops a memory that was genuinely important, nothing fails immediately. The gap only shows up later, when the agent needed that fact and no longer had it — and by then there is no error to trace, just an agent that seems slightly less aware than it should be. This makes forgetting almost impossible to tune by feedback, because the cost of a bad forget is delayed and silent. How a system should decide what to let go, and how anyone would even measure whether it decided well, is unresolved.
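The silence of a bad forget can be shown in a few lines. The sketch below is a hypothetical in-memory store, not any real system's API. It assumes the store keeps a tombstone (the key only) for each forgotten memory, so that a later lookup for that key can at least be counted. This is a partial instrument at best: it only catches lookups by the exact same key, and it says nothing about memories the agent never thought to ask for.

```python
class MemoryStore:
    """Toy key-value memory with tombstones for forgotten keys (an assumption)."""

    def __init__(self):
        self._live = {}
        self._tombstones = set()      # keys that were deliberately forgotten
        self.missed_after_forget = 0  # lookups that hit a forgotten key

    def remember(self, key, value):
        self._live[key] = value
        self._tombstones.discard(key)

    def forget(self, key):
        if key in self._live:
            del self._live[key]
            self._tombstones.add(key)

    def recall(self, key):
        if key in self._live:
            return self._live[key]
        if key in self._tombstones:
            # No exception is raised: the caller just gets nothing back.
            self.missed_after_forget += 1
        return None


store = MemoryStore()
store.remember("allergy", "peanuts")
store.forget("allergy")

print(store.recall("allergy"))      # None, with no error to trace
print(store.recall("never_known"))  # None, indistinguishable to the caller
print(store.missed_after_forget)    # 1
```

From the agent's side, a forgotten fact and a fact it never had look identical. The counter makes one narrow kind of regret visible, but it does not say whether the forget was a bad decision.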
Measuring whether memory is any good
This one is foundational, and it is genuinely unsettled. Retrieval quality is usually scored by semantic relevance — did the system return memories related to the query. But relevance is not correctness, and it is not usefulness. A memory can be perfectly relevant and completely out of date. It can be accurate and useless for the task at hand.
The field does not yet have a widely agreed way to measure the things that actually matter: is the retrieved memory current, is it important, is it relevant to what the agent is trying to do. Without shared measurement, every claim of accuracy is hard to compare and easy to inflate. Better evaluation is not a side quest here. It is a precondition for the field making honest progress at all.
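One way to see why a single relevance score hides so much is to report each property separately. The sketch below assumes every retrieved memory has already been labeled on four axes. The `Judgement` record, its field names, and the labels in the example are all hypothetical and hand-assigned for illustration. How to obtain such labels reliably is the unsettled part.

```python
from dataclasses import dataclass, fields


@dataclass
class Judgement:
    related: bool        # semantically related to the query (the usual score)
    current: bool        # still true at query time
    important: bool      # matters enough to be worth surfacing
    task_relevant: bool  # helps with what the agent is trying to do


def summarize(judgements):
    """Report the hit rate per axis, plus the share that passes every axis."""
    n = len(judgements)
    names = [f.name for f in fields(Judgement)]
    rates = {
        name: (sum(getattr(j, name) for j in judgements) / n if n else 0.0)
        for name in names
    }
    passed_all = sum(all(getattr(j, name) for name in names) for j in judgements)
    rates["all_axes"] = passed_all / n if n else 0.0
    return rates


# Hand-labeled toy data, not measurements from any real system.
sample = [
    Judgement(related=True, current=True, important=True, task_relevant=True),
    Judgement(related=True, current=False, important=True, task_relevant=False),   # relevant but out of date
    Judgement(related=True, current=True, important=False, task_relevant=False),   # accurate but useless
    Judgement(related=False, current=True, important=False, task_relevant=False),
]

print(summarize(sample))
# {'related': 0.75, 'current': 0.75, 'important': 0.5, 'task_relevant': 0.25, 'all_axes': 0.25}
```

In this toy sample a relevance-only score would report 0.75, while only a quarter of the retrieved memories are related, current, important, and useful for the task at once.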
Memory that spans many agents
Almost everything written about agent memory assumes one agent and one user. Increasingly that is not the shape of real deployments. A user interacts with a fleet of agents — one for scheduling, one for research, one for support — and a fact told to one is often relevant to the others.
That raises questions with no settled answers. Should memory be shared across agents by default, or isolated? If a memory is shared, who is accountable for it being wrong? When one agent learns something that contradicts what another agent recorded, which wins? These are partly engineering questions and partly questions of trust and permission, and the honest status is that the patterns are still being worked out.
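These questions become easier to discuss once each memory carries its author and its sharing scope. The sketch below is a hypothetical record layout, not an established pattern. It assumes isolation by default purely as one possible policy, and it deliberately stops at surfacing a cross-agent conflict rather than picking a winner, since that choice is the open question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    value: str
    author: str           # agent that recorded the memory
    shared: bool = False  # isolated by default (an assumed policy)


def visible_to(records, agent):
    """An agent sees its own records plus anything marked as shared."""
    return [r for r in records if r.shared or r.author == agent]


def conflicts(records, agent):
    """Return keys for which the agent can see more than one distinct value."""
    by_key = {}
    for r in visible_to(records, agent):
        by_key.setdefault(r.key, []).append(r)
    return {k: rs for k, rs in by_key.items() if len({r.value for r in rs}) > 1}


records = [
    Record("timezone", "UTC", author="scheduling", shared=True),
    Record("timezone", "PST", author="research", shared=True),
    Record("timezone", "EST", author="support"),  # private to the support agent
]

for key, rs in conflicts(records, "scheduling").items():
    print(key, [(r.author, r.value) for r in rs])
# timezone [('scheduling', 'UTC'), ('research', 'PST')]
```

Keeping `author` on every record at least makes the accountability question answerable after the fact. Which of the conflicting values should win is left to the caller.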
None of this means agent memory does not work. It works, and it is already useful. It means the problem is live — that there is real research left, not just optimization. We think that is the more interesting situation to be in, and the more honest thing to say. A team that can name what it has not solved is usually further along than one that claims to have solved everything.