AI Codex

Building a Claude chatbot that remembers users across sessions

In brief

Persistent memory for chatbots is not a Claude feature — it is an architecture decision. Here is how to build it correctly.

8 min read


The context window resets every time a new conversation starts. Claude does not remember previous sessions by default — it has no persistent state. If you want a chatbot that knows a user's name, preferences, history, or past interactions, you have to build that yourself.

This is not a limitation to work around. It is an architecture decision. The right approach depends on what you actually need to remember.

What kinds of memory matter

Before building anything, identify what you need to persist:

User facts: name, job, location, timezone, preferences — things that do not change often and apply to all conversations.

Conversation summaries: what was discussed in previous sessions — useful for returning users but does not need to be verbatim.

Entities and relationships: things the user has mentioned (projects, people, goals) that might come up again.

Session history: the full turn-by-turn record — usually too large to include in every prompt; use summarization instead.

The architecture

A persistent memory chatbot has three layers:

User message
     ↓
Memory retrieval (fetch relevant context from DB)
     ↓
Prompt assembly (system + memory summary + recent history + new message)
     ↓
Claude API call
     ↓
Response + memory update (extract and store new facts)

The memory store can be as simple as a JSON file for single-user applications, or a proper database for multi-user production systems.

Implementation

Step 1: User profile store

import json
import os
from pathlib import Path
from anthropic import Anthropic

client = Anthropic()

def load_user_memory(user_id: str) -> dict:
    path = Path(f"memory/{user_id}.json")
    if path.exists():
        return json.loads(path.read_text())
    return {"user_id": user_id, "facts": {}, "session_summaries": [], "entity_notes": {}}

def save_user_memory(user_id: str, memory: dict) -> None:
    Path("memory").mkdir(exist_ok=True)
    Path(f"memory/{user_id}.json").write_text(json.dumps(memory, indent=2))

Step 2: Memory extraction after each response

After every assistant turn, ask Claude to extract any new facts worth storing:

EXTRACT_PROMPT = """Review this conversation and extract any new, persistent facts about the user.
Output a JSON object with a "facts" key containing key-value pairs.
Only include information that would be useful in future conversations.
If nothing new was learned, output {"facts": {}}.

Conversation:
{conversation}"""

def extract_new_facts(conversation: list[dict]) -> dict:
    convo_text = "\n".join(
        f"{m['role'].upper()}: {m['content']}"
        for m in conversation[-4:]  # only look at recent turns
    )
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # cheap model for extraction
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": EXTRACT_PROMPT.format(conversation=convo_text)
        }]
    )
    try:
        return json.loads(response.content[0].text)
    except (json.JSONDecodeError, IndexError):
        return {"facts": {}}

Use a cheap model (Haiku) for extraction — this runs after every turn and does not need Sonnet-level capability.

Step 3: Session summarization

At the end of a session (or when history gets long), summarize:

SUMMARIZE_PROMPT = """Summarize this conversation in 2-3 sentences.
Focus on what was discussed, any decisions made, and anything the user might want to follow up on.

Conversation:
{conversation}"""

def summarize_session(conversation: list[dict]) -> str:
    convo_text = "\n".join(
        f"{m['role'].upper()}: {m['content']}"
        for m in conversation
    )
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=128,
        messages=[{"role": "user", "content": SUMMARIZE_PROMPT.format(conversation=convo_text)}]
    )
    return response.content[0].text

Step 4: Assembling the prompt with memory

def build_system_prompt(memory: dict) -> str:
    parts = [
        "You are a helpful assistant. You have memory of past conversations with this user."
    ]
    
    if memory.get("facts"):
        facts_str = "\n".join(f"- {k}: {v}" for k, v in memory["facts"].items())
        parts.append(f"\nWhat you know about this user:\n{facts_str}")
    
    if memory.get("session_summaries"):
        recent = memory["session_summaries"][-3:]  # last 3 sessions
        summaries_str = "\n".join(f"- {s}" for s in recent)
        parts.append(f"\nPrevious conversations:\n{summaries_str}")
    
    return "\n".join(parts)

def chat(user_id: str, user_message: str, history: list[dict]) -> tuple[str, list[dict]]:
    memory = load_user_memory(user_id)
    
    history.append({"role": "user", "content": user_message})
    
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=build_system_prompt(memory),
        messages=history
    )
    
    assistant_message = response.content[0].text
    history.append({"role": "assistant", "content": assistant_message})
    
    # Extract and store new facts
    extracted = extract_new_facts(history)
    memory["facts"].update(extracted.get("facts", {}))
    save_user_memory(user_id, memory)
    
    return assistant_message, history

def end_session(user_id: str, history: list[dict]) -> None:
    memory = load_user_memory(user_id)
    summary = summarize_session(history)
    memory["session_summaries"].append(summary)
    # Keep last 10 session summaries
    memory["session_summaries"] = memory["session_summaries"][-10:]
    save_user_memory(user_id, memory)

What to store, what to skip

Store: explicit user statements about themselves ("I work in marketing", "I have two kids", "I prefer bullet points"), decisions made, topics to follow up on.

Skip: anything the user mentioned in passing, facts you are uncertain about, anything that might change (current mood, what they had for lunch). Bad memory is worse than no memory — hallucinating facts about a user breaks trust immediately.

Review before storing: if you are uncertain, run a quick LLM check ("Is this fact about the user, or just something they mentioned incidentally?"). This extra call is worth it.
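A minimal sketch of that pre-storage check. `VERIFY_PROMPT`, `parse_verdict`, and `is_persistent_fact` are illustrative names, not part of the code above; the verdict parsing is deliberately conservative and only stores on a clear YES:

```python
VERIFY_PROMPT = """Is the following a stable, persistent fact about the user,
or just something they mentioned incidentally? Answer YES or NO only.

Fact: {fact}"""

def parse_verdict(text: str) -> bool:
    # Conservative: store only on an unambiguous YES.
    return text.strip().upper().startswith("YES")

def is_persistent_fact(fact: str) -> bool:
    from anthropic import Anthropic  # imported here so parse_verdict stays SDK-free
    client = Anthropic()
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # cheap model, as with extraction
        max_tokens=8,
        messages=[{"role": "user", "content": VERIFY_PROMPT.format(fact=fact)}],
    )
    return parse_verdict(response.content[0].text)
```

Gate `memory["facts"].update(...)` on this check only for facts you are unsure about; running it on every extracted fact doubles your per-turn extraction cost.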

Production considerations

For multi-user production, replace the JSON file store with a proper database. The memory structure is simple enough for any key-value store (Redis, DynamoDB) or a relational table with a JSONB column (PostgreSQL).
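One possible shape for that database-backed store, sketched here with stdlib `sqlite3` and a JSON text column so it runs anywhere; in production the same two functions map directly onto a PostgreSQL table with a JSONB column or a key-value store:

```python
import json
import sqlite3

def init_store(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_memory ("
        "user_id TEXT PRIMARY KEY, memory TEXT NOT NULL)"
    )

def load_user_memory(conn: sqlite3.Connection, user_id: str) -> dict:
    row = conn.execute(
        "SELECT memory FROM user_memory WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row:
        return json.loads(row[0])
    return {"user_id": user_id, "facts": {}, "session_summaries": [], "entity_notes": {}}

def save_user_memory(conn: sqlite3.Connection, user_id: str, memory: dict) -> None:
    # Upsert: insert a new row or overwrite the existing memory blob.
    conn.execute(
        "INSERT INTO user_memory (user_id, memory) VALUES (?, ?) "
        "ON CONFLICT(user_id) DO UPDATE SET memory = excluded.memory",
        (user_id, json.dumps(memory)),
    )
    conn.commit()
```

The function signatures match the file-based versions from Step 1 except for the extra connection argument, so the rest of the code is unchanged.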

Consider whether to show users their stored memory and let them edit it. For consumer applications, this builds trust. A simple "What do I know about you?" command that returns the facts store is worth adding.

Token budget: a memory summary of 200-400 tokens adds minimal cost but significant continuity. If your memory summaries are growing to 2,000+ tokens, that is a sign you are storing too much. Summarize the summaries.
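A cheap guard for that budget, using the rough four-characters-per-token heuristic (the threshold and the estimate are illustrative, not an exact tokenizer); when it trips, send the accumulated summaries back through `summarize_session`-style compaction:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def memory_over_budget(memory: dict, budget_tokens: int = 400) -> bool:
    """True when stored facts plus summaries would blow past the prompt budget,
    signalling it is time to summarize the summaries."""
    facts_text = " ".join(f"{k}: {v}" for k, v in memory.get("facts", {}).items())
    summaries_text = " ".join(memory.get("session_summaries", []))
    return estimate_tokens(facts_text) + estimate_tokens(summaries_text) > budget_tokens
```

Checking this once per session in `end_session` is enough; the compaction itself is just another cheap-model call over the old summaries.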
