How Agents Remember: A Survey of AI Memory Systems
A comprehensive survey of how AI agents handle memory and continuity across sessions, from production frameworks to cutting-edge research.
Introduction
One of the most fundamental challenges in building long-running AI agents is memory—how do systems that are inherently stateless maintain context, learn from past interactions, and build persistent relationships with users? While Large Language Models (LLMs) excel at processing information within their context window, they wake up fresh each session, forgetting everything that came before.
This challenge has sparked an explosion of innovation in agent memory systems. From production frameworks handling billions of conversations to academic research exploring novel memory architectures, the field is rapidly evolving. This survey examines the current landscape of AI agent memory systems, focusing on what's actually working in production versus theoretical approaches.
The Memory Challenge
Traditional LLMs are stateless—each interaction is essentially a fresh start. They have parametric knowledge baked into their weights from training, but no ability to learn or remember new information from conversations. This creates several critical problems:
- Session discontinuity: Agents can't reference previous conversations
- Learning inability: No way to improve from user feedback over time
- Context loss: Important information gets pushed out of the context window
- Personalization barriers: Cannot adapt to individual user preferences
The solution requires building memory systems that can persist information across sessions while remaining fast, accurate, and cost-effective.
Memory Architecture Taxonomy
Recent academic work, particularly the comprehensive survey "Memory in the Age of AI Agents" (Hu et al., 2026), provides a unified framework for understanding agent memory through three lenses:
Forms of Memory
Token-level Memory: Information stored as raw tokens that can be directly fed into the LLM context window. Examples include conversation histories, markdown files, and text summaries.
Parametric Memory: Information encoded in model weights through fine-tuning or parameter updates. Less common in production due to computational costs.
Latent Memory: Structured representations like embeddings, knowledge graphs, or learned states that require processing before use.
Functions of Memory
Working Memory: Contents of the current context window—the immediate conversation state.
Factual Memory: Static facts and knowledge that don't change frequently (semantic memory in cognitive science terms).
Experiential Memory: Records of past interactions, decisions, and outcomes (episodic memory in cognitive science terms).
Memory Dynamics
Formation: How memories are created from interactions.
Evolution: How memories are updated, merged, or refined over time.
Retrieval: How relevant memories are found and incorporated into responses.
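These dynamics can be made concrete as a minimal interface. The sketch below is purely illustrative; the class and method names are hypothetical and not drawn from any framework surveyed here.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MemoryRecord:
    """A single stored memory with bookkeeping metadata."""
    content: str
    created_at: datetime = field(default_factory=datetime.utcnow)
    importance: float = 0.5  # rough 0-1 weight, used during retention/retrieval


class MemoryStore:
    """Hypothetical interface mapping onto the three memory dynamics."""

    def __init__(self) -> None:
        self._records: list[MemoryRecord] = []

    def form(self, interaction: str) -> MemoryRecord:
        """Formation: distill an interaction into a stored record."""
        record = MemoryRecord(content=interaction)
        self._records.append(record)
        return record

    def evolve(self) -> None:
        """Evolution: merge duplicates, update stale facts, prune low-value records."""
        # Strategy-specific; e.g. periodic LLM-driven consolidation.

    def retrieve(self, query: str, k: int = 5) -> list[MemoryRecord]:
        """Retrieval: return the k records most relevant to a query."""
        q = query.lower()
        return [r for r in self._records if q in r.content.lower()][:k]
```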
Production Agent Frameworks
Letta (formerly MemGPT)
Architecture: Letta implements a hierarchical memory system inspired by operating system design:
- Message Buffer: Recent conversation history (working memory)
- Core Memory: Agent-managed key facts kept in context
- Recall Memory: Searchable conversation history
- Archival Memory: Long-term knowledge storage
Key Innovation: The agent actively manages what stays in its limited context window versus what gets stored externally. Through function calls, the agent can edit its own core memory, search past conversations, and retrieve relevant archival information.
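A minimal sketch of this pattern appears below. The tool names follow those described in the MemGPT paper, but the bodies are simplified stand-ins, not Letta's actual implementation (which backs archival memory with vector search rather than substring matching).

```python
# Illustrative sketch of MemGPT-style self-managed memory; tool names follow
# the MemGPT paper, but these bodies are simplified stand-ins, not Letta's code.

core_memory: dict[str, str] = {"persona": "", "user": ""}  # pinned in context
archival: list[str] = []                                    # external storage


def core_memory_replace(section: str, old: str, new: str) -> str:
    """Tool the agent calls to edit the memory block pinned in its context."""
    core_memory[section] = core_memory[section].replace(old, new)
    return f"Updated core memory section '{section}'."


def archival_memory_insert(text: str) -> str:
    """Tool the agent calls to push information out to long-term storage."""
    archival.append(text)
    return "Stored in archival memory."


def archival_memory_search(query: str, k: int = 3) -> list[str]:
    """Naive substring search; real systems use embeddings for this step."""
    return [t for t in archival if query.lower() in t.lower()][:k]
```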
Trade-offs:
- ✅ Unlimited memory capacity within fixed context windows
- ✅ Agent has full control over memory management
- ❌ Complex implementation requiring careful prompt engineering
- ❌ Risk of memory management overhead affecting performance
Benchmark Results: Letta agents running on GPT-4o-mini achieved 74% accuracy on the LoCoMo benchmark by simply storing conversation histories in files—outperforming specialized memory tools.
LangChain/LangGraph
Architecture: Modular approach with pluggable memory components:
- Conversation Buffer Memory: Simple list-based chat history
- Conversation Summary Memory: Periodic summarization of old messages
- Vector Store Memory: Semantic search over past interactions
- Entity Memory: Tracking specific entities mentioned across conversations
Key Innovation: Flexibility to mix and match memory strategies based on use case requirements.
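Because LangChain's memory interfaces have shifted across versions, the sketch below uses a generic protocol to illustrate the pluggable pattern rather than LangChain's exact API; all names here are hypothetical.

```python
from typing import Protocol


class ChatMemory(Protocol):
    """Pluggable memory interface (illustrative; not LangChain's exact API)."""

    def save_context(self, user_msg: str, ai_msg: str) -> None: ...
    def load_context(self, query: str) -> str: ...


class BufferMemory:
    """Buffer strategy: keep the full chat history verbatim."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def save_context(self, user_msg: str, ai_msg: str) -> None:
        self.turns.append((user_msg, ai_msg))

    def load_context(self, query: str) -> str:
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)


# Summary, vector-store, or entity memories would implement the same protocol,
# letting an application swap strategies without touching the calling code.
```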
Trade-offs:
- ✅ Highly modular and customizable
- ✅ Strong ecosystem integration
- ❌ Complexity in choosing the right memory strategy
- ❌ No built-in memory optimization
CrewAI
Architecture: Multi-agent focused with shared memory pools:
- Individual Agent Memory: Each agent maintains its own memory
- Shared Team Memory: Cross-agent information sharing
- Task Memory: Context specific to particular workflows
Key Innovation: Memory designed for multi-agent collaboration scenarios.
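The shared-pool idea can be illustrated with a small sketch of per-agent, team, and task memory. This is a generic illustration, not CrewAI's actual implementation.

```python
from collections import defaultdict


class TeamMemory:
    """Illustrative shared memory pool for multi-agent collaboration."""

    def __init__(self) -> None:
        self.agent_memory: dict[str, list[str]] = defaultdict(list)  # per-agent
        self.shared: list[str] = []                                  # team-wide
        self.task_memory: dict[str, list[str]] = defaultdict(list)   # per-task

    def remember(self, agent_id: str, fact: str, share: bool = False) -> None:
        """Store a fact for one agent, optionally broadcasting it to the team."""
        self.agent_memory[agent_id].append(fact)
        if share:
            self.shared.append(f"[{agent_id}] {fact}")

    def context_for(self, agent_id: str, task_id: str) -> list[str]:
        """Everything one agent should see: its own, shared, and task memory."""
        return self.agent_memory[agent_id] + self.shared + self.task_memory[task_id]
```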
Trade-offs:
- ✅ Natural multi-agent memory sharing
- ✅ Task-oriented memory organization
- ❌ Limited single-agent memory sophistication
- ❌ Potential memory conflicts between agents
Real-World Deployments
ChatGPT Memory
Architecture: OpenAI's approach combines automatic and explicit memory:
- Automatic Learning: System extracts key facts from conversations
- User Control: Users can explicitly ask ChatGPT to remember or forget information
- Session Persistence: Memories persist across all future conversations
- Privacy Controls: Memory can be disabled or cleared entirely
Key Innovation: Balances automatic memory formation with user control and privacy.
Trade-offs:
- ✅ Seamless user experience
- ✅ Strong privacy controls
- ❌ Limited transparency into what gets remembered
- ❌ Developers get no access to comparable memory capabilities via the API
Character.ai
Architecture: Character-specific memory designed for consistent personas:
- Character Consistency: Maintains stable personality and background
- Relationship Memory: Tracks evolving relationship with each user
- Episodic Continuity: References past conversations naturally
Key Innovation: Memory optimized for character roleplay and relationship building.
Trade-offs:
- ✅ Excellent character consistency
- ✅ Natural conversation flow
- ❌ Proprietary system with limited transparency
- ❌ Focused on entertainment rather than productivity
Open Source Memory Tools
Mem0
Architecture: Universal memory layer with multiple storage backends:
- Graph Memory: Knowledge graph for relationship tracking
- Vector Memory: Semantic similarity search
- Fact Extraction: LLM-powered information distillation
- Multi-modal Support: Text, image, and other data types
Funding/Traction: Raised $24M Series A in 2025, 41K GitHub stars
Key Innovation: Pluggable memory that works across different applications and agents.
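A minimal usage sketch based on the quick-start in Mem0's public README (assuming the mem0ai package is installed and an LLM API key is configured; treat exact defaults and signatures as assumptions):

```python
from mem0 import Memory  # assumes: pip install mem0ai, plus an LLM API key

m = Memory()  # default local vector store; other backends are configurable

# Adding a memory triggers LLM-powered fact extraction before storage.
m.add("I'm vegetarian and allergic to nuts.", user_id="alice")

# Retrieval is a semantic search scoped to the same user.
results = m.search("What can Alice eat?", user_id="alice")
```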
Trade-offs:
- ✅ Universal compatibility
- ✅ Multiple storage backend options
- ❌ Additional infrastructure complexity
- ❌ Potential vendor lock-in concerns
Zep
Architecture: Temporal knowledge graph approach:
- Conversational Memory: Session-based interaction storage
- Knowledge Graph: Entity and relationship tracking over time
- Temporal Awareness: Understanding how information changes
- Sub-200ms Latency: Optimized for production performance
Key Innovation: Time-aware memory that tracks how facts evolve.
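The core idea can be sketched as validity intervals attached to facts. This is a generic illustration of time-aware memory, not Zep's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TemporalFact:
    """A knowledge-graph edge carrying a validity interval."""
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: datetime | None = None  # None means "still believed true"


def facts_at(facts: list[TemporalFact], moment: datetime) -> list[TemporalFact]:
    """Return only the facts believed true at a given moment."""
    return [f for f in facts
            if f.valid_from <= moment and (f.valid_to is None or moment < f.valid_to)]


def supersede(old: TemporalFact, new: TemporalFact) -> None:
    """Close the old fact's interval instead of deleting it, preserving history."""
    old.valid_to = new.valid_from
```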
Trade-offs:
- ✅ Temporal relationship tracking
- ✅ Production-grade performance
- ❌ Complex graph management
- ❌ Requires specialized knowledge graph expertise
Motorhead
Architecture: Simple Redis-based conversation memory:
- Session Storage: Redis-backed conversation persistence
- Automatic Summarization: Background conversation compression
- REST API: Easy integration with existing systems
Key Innovation: Simplicity and reliability over sophistication.
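A conversation store in this style takes only a few lines with the redis-py client. The sketch below assumes a local Redis instance and omits Motorhead's background summarization.

```python
import json

import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def append_message(session_id: str, role: str, content: str) -> None:
    """Persist one chat turn at the end of the session's Redis list."""
    r.rpush(f"session:{session_id}", json.dumps({"role": role, "content": content}))


def load_history(session_id: str, last_n: int = 50) -> list[dict]:
    """Load the most recent turns for a session, oldest first."""
    raw = r.lrange(f"session:{session_id}", -last_n, -1)
    return [json.loads(m) for m in raw]
```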
Trade-offs:
- ✅ Simple to deploy and maintain
- ✅ Battle-tested Redis backend
- ❌ Limited memory sophistication
- ❌ Basic retrieval capabilities
Novel Approaches and Research Frontiers
Memory Graphs and Knowledge Representation
Recent research explores representing memory as dynamic knowledge graphs rather than static retrievals:
AriGraph (arxiv:2407.04363): Combines semantic and episodic memories in a graph structure that agents update as they explore environments.
CAST (arxiv:2602.06051): Character-and-Scene Episodic Memory preserves full episodic context without the fragility of traditional knowledge graph extraction.
Self-Reflection and Memory Consolidation
A-Mem (arxiv:2502.12110): Implements agentic memory where the system reflects on experiences and actively decides what to remember, update, or forget.
Hindsight Memory (arxiv:2512.12818): Focuses on building memory that retains, recalls, and reflects on past decisions to improve future performance.
Forgetting Mechanisms
A critical but underexplored area is how agents should forget outdated information. Common mechanisms include the following; a scoring sketch that combines several of them appears after the list:
- Time-based Decay: Automatically aging out old memories
- Importance Scoring: Weighting memories by relevance/frequency
- Conflict Resolution: Handling contradictory information
- User-Directed Forgetting: Explicit deletion controls
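A minimal way to combine time-based decay and importance scoring is a single retention/retrieval score with exponential decay. The 30-day half-life and the multiplicative form below are illustrative assumptions, not a standard.

```python
from datetime import datetime, timedelta

HALF_LIFE = timedelta(days=30)  # assumption: a memory's recency halves monthly


def memory_score(importance: float, relevance: float,
                 last_accessed: datetime, now: datetime) -> float:
    """Blend exponential time decay, importance, and query relevance.

    importance and relevance are assumed to lie in [0, 1]. Memories scoring
    below some threshold become candidates for archival or deletion.
    """
    recency = 0.5 ** ((now - last_accessed) / HALF_LIFE)  # 1.0 fresh -> 0.0 stale
    return recency * importance * relevance
```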
Memory Importance Scoring
Memoria (arxiv:2512.12686): Introduces importance weighting for conversational memories, ensuring the most relevant information gets prioritized during retrieval.
Multiple Memory Systems (arxiv:2508.15294): Explores hierarchical importance across different memory types, with automatic promotion/demotion between memory layers.
File-Based Memory Systems
Given Bramble's file-based approach (daily notes + MEMORY.md), several insights emerge from the research:
Advantages of File-Based Memory
1. Simplicity: Plain text files are universally readable and debuggable
2. Transparency: Users can directly inspect and edit memory contents
3. Version Control: Git integration for memory change tracking
4. Tool Compatibility: Works with existing filesystem tools
5. Low Latency: No database queries or vector searches required
Letta's Filesystem Findings
Letta's recent research shows that simple filesystem operations can outperform sophisticated memory tools. Their filesystem-attached agents achieved 74% accuracy on memory benchmarks using just:
- grep for text matching
- search_files for semantic search
- open/close for file operations
This suggests that agent capability matters more than memory sophistication. Agents are highly effective with filesystem tools due to extensive training on coding tasks.
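In the same spirit, a bare-bones grep-style retrieval over markdown memory files needs only the standard library. The directory layout and function name below are assumptions for illustration.

```python
import re
from pathlib import Path

MEMORY_DIR = Path("memory")  # assumption: memory notes live in memory/*.md


def search_memory(pattern: str, context_lines: int = 1) -> list[str]:
    """grep-style search: return matching lines with a little surrounding context."""
    hits = []
    for path in sorted(MEMORY_DIR.glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for i, line in enumerate(lines):
            if re.search(pattern, line, re.IGNORECASE):
                start = max(0, i - context_lines)
                snippet = "\n".join(lines[start:i + context_lines + 1])
                hits.append(f"{path.name}:{i + 1}\n{snippet}")
    return hits
```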
Best Practices for File-Based Systems
Based on the research, optimal file-based memory systems should:
1. Separate Concerns:
- Daily logs for raw chronological data
- MEMORY.md for curated long-term insights
- Separate files for different memory types (facts vs. procedures vs. preferences)
2. Support Retrieval Patterns:
- Consistent markdown structure for easy parsing
- Searchable metadata (dates, tags, importance scores)
- Clear section headers for semantic chunks
3. Enable Memory Management:
- Regular consolidation from daily files to long-term memory (a sketch follows this list)
- Conflict detection between old and new information
- Importance-based retention policies
4. Maintain Performance:
- Reasonable file sizes to avoid context window bloat
- Efficient search patterns (grep-friendly formatting)
- Lazy loading of memory content as needed
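A minimal consolidation pass over such a layout might look like the sketch below. The file locations follow the conventions above, and the importance heuristic is a deliberately crude placeholder for LLM-driven curation.

```python
from datetime import date
from pathlib import Path

DAILY_DIR = Path("memory/daily")      # assumption: one markdown file per day
LONG_TERM = Path("memory/MEMORY.md")  # curated long-term store


def looks_important(line: str) -> bool:
    """Crude placeholder heuristic; a real pass would ask an LLM to curate."""
    return line.startswith("- !") or "#important" in line


def consolidate(day: date) -> None:
    """Promote flagged lines from one daily note into long-term memory."""
    daily = DAILY_DIR / f"{day.isoformat()}.md"
    if not daily.exists():
        return
    kept = [line for line in daily.read_text(encoding="utf-8").splitlines()
            if looks_important(line)]
    if kept:
        with LONG_TERM.open("a", encoding="utf-8") as f:
            f.write(f"\n## Consolidated from {day.isoformat()}\n")
            f.write("\n".join(kept) + "\n")
```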
Trade-offs and Production Considerations
Latency vs. Accuracy
- Vector Search: High accuracy, ~100-500ms retrieval latency
- Text Search: Lower accuracy, ~1-10ms retrieval latency
- File-based: Variable accuracy, ~1-50ms depending on file size
- Graph Queries: High relationship accuracy, ~50-200ms latency
Cost vs. Performance
- Embedding Storage: $0.10-1.00 per million tokens stored
- Database Operations: $0.001-0.01 per query
- File Operations: Essentially free after storage
- LLM Memory Processing: $0.001-0.10 per memory update
Privacy vs. Functionality
- Local Storage: Maximum privacy, limited sharing/sync
- Cloud Vector DBs: Rich functionality, privacy concerns
- Hybrid Approaches: Encrypted cloud storage with local processing
Accuracy vs. Forgetting
- Perfect Retention: Maximum accuracy, storage/relevance issues
- Aggressive Forgetting: Clean memory, risk of losing important information
- Smart Consolidation: Balanced approach, complex implementation
Benchmarks and Evaluation
Current Benchmarks
- LoCoMo: 500 multi-session conversations testing factual recall across long dialogues
- LongMemEval: Tests information extraction and temporal reasoning up to 1.5M tokens
- MemoryBench: Focuses on continual learning from user feedback
- Deep Memory Retrieval: 500 conversations with 5 sessions each, up to 12 messages per session
Benchmark Limitations
Current benchmarks primarily test retrieval accuracy rather than memory utility. They don't capture:
- Memory management overhead
- Real-world conversation dynamics
- User satisfaction with memory behavior
- Long-term memory evolution
Better Evaluation Approaches
- Task-based Evaluation: How well do memory-enabled agents perform on complex, multi-session tasks?
- User Studies: Do users prefer agents with different memory characteristics?
- Production Metrics: Conversion rates, session lengths, and user retention with memory-enabled systems
Future Research Directions
Multimodal Memory
As agents handle images, audio, and video, memory systems need to store and retrieve across modalities:
- Visual memory for images and scenes
- Audio memory for conversations and sounds
- Cross-modal retrieval (describe an image from text memory)
Multi-Agent Memory
How do multiple agents share and coordinate memory?
- Shared knowledge bases
- Memory conflict resolution
- Privacy boundaries between agents
- Collaborative memory formation
Automated Memory Architecture
Instead of manually designing memory systems, can we learn optimal memory architectures?
- Reinforcement learning for memory management policies
- Neural architecture search for memory structures
- Automatic memory type discovery
Trustworthy Memory
As memory becomes more important, ensuring its reliability becomes critical:
- Memory hallucination detection
- Source attribution for remembered facts
- Confidence scoring for memory retrieval
- Adversarial robustness of memory systems
Implications for OpenClaw/Bramble
Based on this survey, several recommendations emerge for Bramble's file-based memory approach:
Strengths to Leverage
1. File-based systems are competitive: Letta's research shows filesystem approaches can outperform specialized memory tools
2. Simplicity aids reliability: Plain text is debuggable and transparent
3. Agent capability matters most: Focus on how Bramble uses memory tools rather than on the storage mechanism
Areas for Enhancement
1. Memory Consolidation: Implement regular review/synthesis of daily notes into MEMORY.md
2. Search Capabilities: Add semantic search over markdown files using embeddings
3. Memory Management: Build explicit ADD/UPDATE/DELETE/NOOP operations for memories (see the sketch below)
4. Importance Scoring: Weight memories by frequency, recency, and user feedback
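A sketch of what those explicit operations could look like, assuming an upstream classifier (typically an LLM comparing a candidate fact against retrieved neighbors) has already decided which operation applies:

```python
from enum import Enum


class MemoryOp(str, Enum):
    ADD = "ADD"        # new fact, nothing similar stored
    UPDATE = "UPDATE"  # refines or corrects an existing memory
    DELETE = "DELETE"  # contradicts an existing memory outright
    NOOP = "NOOP"      # already known; do nothing


def apply_op(store: dict[str, str], op: MemoryOp, key: str, text: str = "") -> None:
    """Apply a classified memory operation to a simple key-value store.

    The choice of operation is assumed to be made upstream; this function
    only carries out the bookkeeping.
    """
    if op is MemoryOp.ADD or op is MemoryOp.UPDATE:
        store[key] = text
    elif op is MemoryOp.DELETE:
        store.pop(key, None)
    # NOOP: leave the store untouched
```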
Architectural Recommendations
1. Hybrid Approach: Keep files as the primary storage but add search indices
2. Memory Types: Separate factual (semantic), experiential (episodic), and procedural memories into different files or sections
3. Temporal Organization: Use timestamps and version control to track memory evolution
4. Agent Control: Let Bramble actively manage what gets remembered vs. forgotten
Conclusion
The landscape of AI agent memory is rapidly evolving, with production systems settling on pragmatic solutions while research explores more sophisticated approaches. The key insight from current work is that memory utility depends more on how agents use memory tools than on the sophistication of the underlying storage.
File-based systems like Bramble's are not just viable—they may actually be preferable for many use cases due to their simplicity, transparency, and compatibility with existing tooling. The challenge is not in building the most sophisticated memory architecture, but in designing memory systems that agents can effectively use to provide better experiences for users.
As the field continues to mature, we expect to see convergence on hybrid approaches that combine the simplicity of file-based storage with the power of semantic search and graph relationships. The agents that succeed will be those that remember not just more information, but the right information at the right time.
This survey was compiled from academic papers, production system documentation, and open source implementations as of February 2026. The field is rapidly evolving—expect significant developments in multimodal memory, automated architecture design, and trustworthy memory systems in the coming months.
References
- Hu, Y. et al. (2026). Memory in the Age of AI Agents. arXiv:2512.13564
- Letta. (2026). Benchmarking AI Agent Memory: Is a Filesystem All You Need?
- Chhikara, P. et al. (2025). Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory. arXiv:2504.19413
- Various arXiv papers on episodic memory, knowledge graphs, and memory architectures (2024-2026)
- Production system documentation from OpenAI, Character.ai, Letta, Zep, and Mem0
- GitHub repositories and open source implementations of agent memory systems