Hi My Name Is...the Not So Shady Side of Long-Term Memory in AI


Published May 25, 2025.

In our last post, we explored how short-term memory enables agentic AI to hold a conversation that doesn’t reset after every message. That form of memory is all about flow—preserving context, user intent, and logic within a single session, even as interactions stretch across multiple turns. The longer the session, the more memory is required to maintain continuity.
But not all memory needs to be verbose. Long-term memory serves a different purpose: persistence across sessions. It’s less about real-time responsiveness and more about storing compact, critical facts that shape future conversations—preferences, past actions, or contextual details like “Name is Chris” or “Looking for a restaurant in San Francisco.”
This post picks up where the last one left off: transitioning from conversational continuity to cross-session context. We’ll walk through how Jit implemented long-term memory using mem0 and a vector DB—designing a scalable memory layer for agents that remember what matters, even when the session ends.
Long-Term Memory as the Next Frontier for Agentic AI
An agent without long-term memory is stuck in a time loop: everything it learns is wiped between sessions, like an NPC in a video game. It will answer your questions, perform actions, and hold a conversation, but once the thread ends it resets and forgets everything. There’s no continuity, no learning, and no sense that the agent “knows” you. In short, the experience stays shallow.
Jit’s AppSec AI platform was designed to avoid that trap. While short-term memory allows for coherent, multi-turn sessions, it is long-term memory that gives agents contextual power—enabling them to remember user preferences, recall past conversations, and evolve over time. The result isn’t just more helpful responses, but a more personal, consistent experience across sessions.
This post walks through how Jit approached long-term memory for its agentic system using open-source tools like mem0 and Qdrant, a vector database designed for fast, scalable retrieval.
What Long-Term Memory Actually Means
Long-term memory in AI agents isn’t about storing everything. It’s about remembering what matters: the facts worth keeping because they improve the experience over the long term.
A good memory system should:
Track facts and preferences across sessions.
Remember assistant actions (e.g., “sent an email with subject X”).
Surface relevant context during future interactions.
Avoid repetition and unnecessary context bloat.
To accomplish this, Jit leverages a two-step memory pipeline: fact extraction and semantic retrieval.
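To make the shape of a stored memory concrete, here’s a sketch of what each record boils down to. The field names are illustrative, not Jit’s actual schema:

```python
from dataclasses import dataclass

@dataclass
class MemoryRecord:
    text: str                # the extracted fact, e.g. "Name is Chris"
    embedding: list[float]   # vector representation used for semantic retrieval
    user_id: str             # persistent identity (e.g., Slack username)
    run_id: str | None       # session/thread the fact came from
    agent_id: str | None     # agent or skill that produced it
```

Fact extraction produces the text, embedding makes it searchable, and the IDs (covered in detail below) scope who can recall it.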
The Stack: mem0 + Qdrant
Jit uses mem0 as a memory management layer, plugged into Qdrant for vector storage and retrieval. Every user interaction is evaluated post-session to determine if it contains extractable facts.
Here’s the high-level flow:
A user interacts with the agent.
The entire user-assistant exchange is passed through a fact extraction prompt, tuned to Jit’s needs.
If any meaningful facts are found (e.g., “Name is Chris”, “Sent an email with subject ‘Tree’”), they are embedded and stored in the vector DB, tagged with the user_id, run_id, and optionally agent_id.
Future sessions from that user or agent can retrieve relevant facts using semantic search—fueling more context-aware conversations.
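Here’s what that wiring can look like with mem0’s Python SDK. This is a minimal sketch, not Jit’s production code: the host, port, and collection name are illustrative, and config keys can shift between mem0 releases.

```python
from mem0 import Memory

# Point mem0 at Qdrant for vector storage and retrieval.
# Host, port, and collection name here are illustrative values.
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "qdrant.memory.svc.cluster.local",
            "port": 6333,
            "collection_name": "agent_memories",
        },
    },
}
memory = Memory.from_config(config)

# After a session, run the exchange through fact extraction and storage.
messages = [
    {"role": "user", "content": "Hi, my name is Chris."},
    {"role": "assistant", "content": "Nice to meet you, Chris."},
]
memory.add(messages, user_id="chris", run_id="slack-thread-123")
```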
How Facts Get Saved (or Discarded)
The fact extraction logic is driven by a custom system prompt—an adapted version of the FACT_RETRIEVAL_PROMPT from mem0. This prompt gives the agent specific rules regarding what to store (e.g., preferences, names, actions) and what to ignore (e.g., “hi” or generic greetings).
Facts are only saved if the LLM determines they’re relevant. This reduces token bloat and storage cost, while ensuring agents retain what truly matters.
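As a sketch of how that plugs in: recent mem0 releases accept a custom fact-extraction prompt through their config. The exact key name has varied across versions, and the prompt below is a trimmed-down illustration rather than Jit’s tuned version.

```python
# A heavily trimmed illustration of a fact-extraction prompt; Jit's version,
# adapted from mem0's FACT_RETRIEVAL_PROMPT, is far more detailed.
CUSTOM_FACT_PROMPT = """
You extract durable facts from a conversation.
Store: names, preferences, and assistant actions (e.g., "sent an email with subject X").
Ignore: greetings and small talk.
Respond with JSON: {"facts": ["..."]}
"""

config = {
    "vector_store": {"provider": "qdrant", "config": {"host": "localhost", "port": 6333}},
    # Key name per recent mem0 releases; older versions used "custom_prompt".
    "custom_fact_extraction_prompt": CUSTOM_FACT_PROMPT,
}
```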
Here's a sample interaction and what gets saved:
User: Hi, my name is Chris.
Assistant: Nice to meet you, Chris.
→ {"facts": ["Name is Chris"]}
User: I sent an email with "hello" as the topic.
→ {"facts": ["Sent an email with 'hello' as the topic"]}
These facts are now embedded and saved in Qdrant, accessible for semantic retrieval on any future thread tied to the same user.
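Retrieval is then a semantic search scoped to the same user. Continuing with the memory instance configured above (the query and result handling are illustrative, and the exact result shape varies by mem0 version):

```python
# On a future thread, pull relevant facts before the agent responds.
related = memory.search("what did I email recently?", user_id="chris", limit=5)

# Recent mem0 versions return {"results": [...]}, each hit carrying the
# stored fact text and a relevance score.
for hit in related["results"]:
    print(hit["memory"], hit.get("score"))
# e.g. -> Sent an email with 'hello' as the topic  0.78
```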
ID Design: Session vs. Identity
Long-term memory’s usefulness hinges on context. Jit structures its memory keys around three IDs:
user_id: The user’s persistent identity (e.g., Slack username).
run_id: The current session or thread (e.g., Slack thread ID).
agent_id: The agent or skill invoked (e.g., “risk-assessment”).
For short-term memory, user_id + run_id is sufficient. For long-term memory, user_id + agent_id allows for broader, cross-session recall (e.g., remembering that the risk-assessment agent has previously worked on a given repo).
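In practice, the scoping difference is just which IDs get passed at query time. A sketch (ID values are illustrative):

```python
# Short-term recall: scoped to this user *and* this thread.
thread_context = memory.search(
    "What are we working on?",
    user_id="chris",
    run_id="slack-thread-123",
)

# Long-term recall: scoped to this user and agent, across all past threads.
agent_history = memory.search(
    "Which repos have we assessed before?",
    user_id="chris",
    agent_id="risk-assessment",
)
```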
No Summarization Needed
One elegant side effect of this architecture? There’s no need to summarize sessions for memory purposes. Since facts are extracted and stored individually, memory stays compact and semantically searchable. This simplifies token management and lets Jit maintain leaner context windows during runtime.
Kubernetes-Friendly Deployment
Qdrant is deployed in Jit’s Kubernetes environment using Qdrant’s Helm charts. Each deployment is given a consistent DNS name (e.g., qdrant.<namespace>.svc.cluster.local) and persistent storage is enabled.
TTL policies (time-to-live) for stored vectors are still being evaluated—especially to handle session cleanup or regulatory compliance—but early tests show that memory pruning can be implemented either in the application layer or with Qdrant’s native API.
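For the application-layer route, pruning could look like this with the qdrant-client SDK, assuming each stored point carries a created_at Unix timestamp in its payload (collection and field names are illustrative):

```python
import time

from qdrant_client import QdrantClient, models

client = QdrantClient(host="qdrant.memory.svc.cluster.local", port=6333)

# Drop every memory vector older than 90 days, filtering on an assumed
# numeric "created_at" payload field.
cutoff = time.time() - 90 * 24 * 60 * 60
client.delete(
    collection_name="agent_memories",
    points_selector=models.FilterSelector(
        filter=models.Filter(
            must=[
                models.FieldCondition(
                    key="created_at",
                    range=models.Range(lt=cutoff),
                )
            ]
        )
    ),
)
```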
What's Next for Agentic AI?
Jit’s long-term memory architecture is already enabling smarter, more contextual AppSec workflows. But this is only the beginning. Next steps include:
Enriching the memory prompt to track more nuanced assistant behaviors.
Using memory to influence agent planning and tool selection.
Automatically aging or pruning outdated memory to keep interactions fresh.
For now, one thing is clear: without memory, agents don’t deepen their context; they only reset. In production-grade systems, that’s not just a UX flaw you can’t afford, it’s an operational blind spot that sets you up for failure. Stay tuned and follow for more updates on our AI journey, and how to architect systems built to channel the capabilities AI unlocks.