It’s Not Magic, It’s Memory: How to Architect Short-Term Memory for Agentic AI


Updated May 27, 2025.

Generative AI and agentic systems need memory to sustain coherent conversations. Short-term memory maintains context within a session, while long-term memory recalls details from previous sessions or accesses background information.
This post focuses on the first part: how Jit engineered short-term memory into a real-world, production-grade agentic AppSec platform built on LangGraph. It explores the architectural decisions, infrastructure tradeoffs, and practical patterns used to ensure conversational continuity at scale—without blowing up memory or storage costs.
Why Short-Term Memory Matters in Agentic AI
Short-term memory allows AI agents to hold conversations without resetting after every message, which is crucial for clarifying questions, building reports, and navigating workflows.
This continuity is achieved through threads that capture the evolving state of a dialogue, storing conversation history, metadata, and agent responses. While not a new concept, the challenge lies in engineering this capability for production environments where memory must be persistent, scalable, and optimized for cloud infrastructure.
The Architecture: Threads, State, and the Graph
Recognizing that memory continuity would underpin everything from user queries to multi-step workflows, Jit made memory architecture a central design priority. Memory would serve as the connective tissue of the platform, linking actions, responses, and logic across sessions, so the architecture was built with memory at its core, ensuring the user experience could scale alongside the complexity of the system.
How It Works
Jit's Agentic AppSec platform uses LangGraph's thread IDs to manage scoped context; each thread ID is unique per tenant and conversation. Agent workflows are modeled as directed graphs in LangGraph, enabling complex multi-step interactions. Jit builds on LangGraph's memory management to orchestrate agent interactions with persistent, efficient memory handling.
The checkpointer in LangGraph saves every step in the graph—messages, transitions, and agent state—enabling replayability, mid-graph recovery, and consistent context propagation.
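As a rough sketch of how these pieces fit together (the in-memory checkpointer and the tenant-scoped thread ID format below are illustrative assumptions, not Jit's production code):

from langchain_core.messages import AIMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, MessagesState, START

def agent_node(state: MessagesState):
    # In production this calls an LLM; stubbed here for illustration.
    return {"messages": [AIMessage(content="ack")]}

builder = StateGraph(MessagesState)
builder.add_node("agent", agent_node)
builder.add_edge(START, "agent")

# The checkpointer persists every step; swap MemorySaver for a
# database-backed saver in production (more on that below).
graph = builder.compile(checkpointer=MemorySaver())

# Scope memory per tenant and conversation via the thread_id config key.
tenant_id, conversation_id = "tenant-123", "conv-456"
config = {"configurable": {"thread_id": f"{tenant_id}:{conversation_id}"}}
graph.invoke({"messages": [("user", "Summarize my open risks")]}, config)

Every invocation with the same thread_id resumes from the last checkpoint, which is what gives the conversation its continuity.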
The state is a shared object that travels with the conversation thread. It’s not just for messages; it can hold structured metadata, user inputs, and any contextual data an agent might need. Crucially, state is scoped to a thread, ensuring memory isolation across sessions.
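In LangGraph terms, that thread-scoped state is just a schema. A minimal sketch follows; the fields beyond messages are hypothetical examples:

from typing import Annotated, TypedDict
from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # Conversation history, merged by the add_messages reducer.
    messages: Annotated[list[AnyMessage], add_messages]
    # Structured, thread-scoped context (illustrative field names).
    tenant_id: str
    user_inputs: dict
    seen_message_ids: set  # referenced later in the post-processing snippet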
Jit implemented a supervisor pattern where a top-level agent receives the initial request, then delegates execution to downstream agents. This structure created clean delegation logic, where each agent can perform its own tasks.
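A stripped-down version of the supervisor pattern might look like this; the routing heuristic and agent names are placeholders, and in practice the supervisor is itself an LLM deciding where to delegate:

from typing import Literal
from langgraph.graph import StateGraph, MessagesState, START, END

def supervisor(state: MessagesState):
    # A supervisor LLM would analyze the request here; no-op for the sketch.
    return {}

def route(state: MessagesState) -> Literal["risk_agent", "report_agent", "__end__"]:
    # Placeholder heuristic standing in for an LLM routing decision.
    last = state["messages"][-1].content.lower()
    if "risk" in last:
        return "risk_agent"
    if "report" in last:
        return "report_agent"
    return "__end__"

builder = StateGraph(MessagesState)
builder.add_node("supervisor", supervisor)
builder.add_node("risk_agent", lambda state: {"messages": []})
builder.add_node("report_agent", lambda state: {"messages": []})
builder.add_edge(START, "supervisor")
builder.add_conditional_edges("supervisor", route)
builder.add_edge("risk_agent", END)
builder.add_edge("report_agent", END)
graph = builder.compile()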
Managing State with Pre-Processing and Tooling-Aware Summarization
As interactions grow, so does the state payload. To keep things efficient, two mechanisms were introduced: pre-processing and summarization.
Dual Histories for Clean UX and Accurate Context
One challenge Jit encountered was reconciling the verbosity of agent-tool interactions with the need for a clean, user-facing experience. To address this, two separate message histories were maintained: a complete context—including tools, intermediate steps, and metadata—used internally by the LLM; and a streamlined thread consisting only of human and AI messages, shown in the UI. This separation preserves multi-turn reasoning accuracy while improving clarity and usability on the frontend.
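One way to derive the streamlined view from the complete context is a simple projection; the filtering criteria below are illustrative assumptions:

from langchain_core.messages import AIMessage, AnyMessage, HumanMessage

def ui_view(full_history: list[AnyMessage]) -> list[AnyMessage]:
    """Project the complete LLM context down to the user-facing thread."""
    return [
        m for m in full_history
        if isinstance(m, (HumanMessage, AIMessage))   # drop tool/system messages
        and not getattr(m, "tool_calls", None)        # drop tool-invoking AI turns
        and m.content                                 # drop empty intermediate steps
    ]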
Pre-Processing for State Hygiene
To prevent state bloat and stale values from affecting downstream execution, a pre-processing step was introduced before each new graph run. This step clears transient or no-longer-relevant fields—ensuring a clean, predictable slate for each interaction and reducing the risk of unintended carryover.
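Concretely, this can be a small node that runs before the graph body. The field names here are hypothetical:

# Hypothetical transient fields that should not leak between runs.
TRANSIENT_FIELDS = ("tool_scratchpad", "pending_findings", "last_error")

def pre_process(state: dict) -> dict:
    """Return a partial state update that resets transient keys,
    giving each new graph run a clean, predictable slate."""
    return {field: None for field in TRANSIENT_FIELDS if state.get(field) is not None}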
Summarization with Tool-Aware Context
At the end of each run, messages are summarized once the conversation nears the context window limit. The langmem library, which integrates with LangGraph, generates a system message encapsulating key context and trims older messages. During this summarization step, Jit uncovered a key insight: orchestrator agents must retain access to the tools necessary for task execution, such as Jira or MCP integrations. To avoid runtime failures, the summarization LLM is now explicitly bound to the orchestrator's toolset. This binding ensures that tool context is preserved in the summarized state, allowing agents to maintain continuity and execute orchestrated tasks reliably across multi-turn conversations.
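A simplified version of that binding follows; the model choice, prompt, and trimming policy are placeholder assumptions, and the essential move is bind_tools on the summarization LLM:

from langchain_core.messages import RemoveMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langgraph.graph.message import REMOVE_ALL_MESSAGES

def summarize_with_tools(state: dict, orchestrator_tools: list) -> dict:
    # Bind the orchestrator's toolset so tool context survives summarization.
    summarizer = ChatOpenAI(model="gpt-4o-mini").bind_tools(orchestrator_tools)
    history = state["messages"]
    summary = summarizer.invoke(
        [SystemMessage(content="Summarize this conversation, preserving any "
                               "pending tool usage and task state."),
         *history[:-2]]
    )
    # Replace the history with the summary plus the most recent turns.
    return {"messages": [RemoveMessage(id=REMOVE_ALL_MESSAGES),
                         SystemMessage(content=summary.content),
                         *history[-2:]]}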
Persisting the Altered Messages
To keep the design modular, Jit implemented post-processing as a subgraph appended to the main graph. However, LangGraph’s default behavior for message reducers is to append state changes—leading to the original messages being preserved alongside the altered ones. This was resolved by explicitly invoking the subgraph from the main graph and replacing the message history in the final result. Before applying the subgraph output, a REMOVE_ALL_MESSAGES command is prepended to clear existing messages, ensuring only the updated content is persisted.
from __future__ import annotations

from langchain_core.messages import RemoveMessage
from langgraph.graph.message import REMOVE_ALL_MESSAGES

def build_post_processor(graph):
    # StateSchemaType is the app's state schema, defined elsewhere.
    async def post_process_graph_invoke(state: StateSchemaType):
        """
        The post process graph is running as a subgraph with a separate state.
        When the subgraph returns, we need to remove all the messages from the
        main state and apply the messages from the subgraph.
        """
        result = await graph.ainvoke(state)
        if "messages" in result and isinstance(result["messages"], list):
            # Extract new message IDs from result messages
            new_message_ids = {
                msg.id for msg in result["messages"] if hasattr(msg, "id")
            }
            # Combine with existing seen_message_ids
            existing_ids = state["seen_message_ids"]
            result["seen_message_ids"] = existing_ids.union(new_message_ids)
            # Clear existing messages so only the updated content is persisted
            result["messages"] = [RemoveMessage(id=REMOVE_ALL_MESSAGES)] + result[
                "messages"
            ]
        return result

    return post_process_graph_invoke
Storage Decisions: Betting on MongoDB
DynamoDB was initially used to store checkpoint data, but two limitations surfaced: high cost driven by the write volume, and the 400KB item size limit. Complex graph states often exceeded that limit, causing write failures. MongoDB was chosen as a replacement; its 16MB per-document limit provides far more headroom and reduces the risk of failures.
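Swapping the checkpointer is a one-line change at compile time. A sketch, assuming the langgraph-checkpoint-mongodb package (connection details are placeholders):

from langgraph.checkpoint.mongodb import MongoDBSaver

# Placeholder connection string; point this at your own cluster.
with MongoDBSaver.from_conn_string("mongodb://localhost:27017") as checkpointer:
    graph = builder.compile(checkpointer=checkpointer)  # builder from the earlier sketch
    config = {"configurable": {"thread_id": f"{tenant_id}:{conversation_id}"}}
    graph.invoke({"messages": [("user", "hi")]}, config)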
Conversation Management and TTL
With MongoDB addressing storage at the checkpoint level, the next challenge was managing entire conversation lifecycles, from creation to expiration. Jit implemented a conversation management API backed by DynamoDB to handle session metadata, indexing, and TTL-based cleanup. Each session receives a ULID (a time-aware unique identifier), making chronological sorting trivial.
TTL (Time To Live) was applied not to the checkpoint itself, but to the conversation metadata. This avoids breaking graph execution if a long-running process is still interacting with an "expired" session. Instead, the UI hides the conversation once its TTL has passed, while the actual checkpoint data is deleted later (e.g., after 4 hours), ensuring both consistency and cleanup.
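A sketch of what such a metadata record might look like; the table name, attributes, and the python-ulid dependency are assumptions for illustration:

import time

import boto3
from ulid import ULID  # python-ulid: IDs that sort chronologically

table = boto3.resource("dynamodb").Table("conversations")  # hypothetical table

def create_conversation(tenant_id: str, ttl_hours: int = 4) -> str:
    conversation_id = str(ULID())  # lexicographic order == creation order
    table.put_item(Item={
        "tenant_id": tenant_id,
        "conversation_id": conversation_id,
        "title": "New conversation",  # replaced later by dynamic titling
        # DynamoDB deletes the item after this epoch timestamp; the UI hides
        # the conversation first, and checkpoint data is purged separately.
        "ttl": int(time.time()) + ttl_hours * 3600,
    })
    return conversation_id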
Generating Dynamic Titles (Not “Hi, I’m Ariel”)
To enhance usability and recall, Jit also implemented dynamic title generation—ensuring each session has a meaningful, context-aware label rather than a generic placeholder.
Rather than triggering title generation after every session (to avoid creating unnecessary garbage), titles are generated only when two conditions are met:
The title is still set to the default
At least one meaningful agent (e.g., Risk Assessment or Dashboard Generation) participated
When both conditions are satisfied, a lightweight LLM generates a concise five-word title, which is sent back to the frontend via WebSocket and persisted to the database. Sessions that do not meet these criteria, such as empty or trivial exchanges, are not shown in the UI. This balances UI clarity with cost control by avoiding unnecessary inference calls, and it gives the Jit platform long-term, transparent management of the Agentic AppSec conversation experience.
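The gating logic reduces to a couple of checks before any model call. Everything named below (helpers, agent names, default title) is hypothetical:

DEFAULT_TITLE = "New conversation"
MEANINGFUL_AGENTS = {"risk_assessment", "dashboard_generation"}

async def save_title(conversation_id: str, title: str): ...      # persist to DB
async def notify_frontend(conversation_id: str, title: str): ... # push over WebSocket

async def maybe_generate_title(conversation: dict, participating_agents: set, llm):
    # Skip inference entirely unless both conditions hold.
    if conversation["title"] != DEFAULT_TITLE:
        return None
    if not MEANINGFUL_AGENTS & participating_agents:
        return None
    response = await llm.ainvoke(
        "In five words or fewer, title this conversation: "
        + conversation["first_user_message"]
    )
    title = response.content.strip()
    await save_title(conversation["conversation_id"], title)
    await notify_frontend(conversation["conversation_id"], title)
    return title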
Under the Hood of Production-Grade AI
There’s no such thing as “magic memory” in agentic AI. Real-world short-term memory requires intentional engineering. That means understanding how orchestration frameworks handle state, carefully filtering and summarizing content, managing storage tradeoffs, and designing conversations that won’t break mid-graph.
Jit’s newly launched Agentic AppSec AI platform is the product of these deliberate design choices, each made to support a genuine agentic experience where context, continuity, and operational efficiency are essential. In this architecture, memory isn’t a nice-to-have; it’s the backbone of the entire platform’s experience.
Check out our AI Agents here if you’re looking to automate vulnerability triage, custom report creation, and security feedback for developers.
In the next post, the team will cover long-term memory, knowledge retrieval, and how to structure user-visible conversation histories.
Until then—keep your messages summarized, your reducers sharp, and your junk filtered.