How AI Memory & Context Works in MemoryX — Save, Extract, Recall
A question we hear a lot: how does MemoryX actually implement cross-platform AI memory? How are memories extracted from conversations? And how are they automatically injected the next time you ask a question?
This post is not a theoretical discussion. It is a technical deep-dive based on our actual codebase. Every detail described here has a corresponding implementation. (Read the backstory: Why I Built MemoryX.)
Architecture Overview: Capture → Structure → Recall
The MemoryX memory system works in three steps:
- Capture: Automatically save your conversations on ChatGPT, Claude, and Gemini
- Structure: Use an LLM to extract structured memories from conversations, deduplicate, and store them
- Recall: When you ask your next question, automatically search for relevant memories and inject them into the conversation context
These three steps form a complete closed loop. Something you discussed in Claude can be automatically recalled the next time you ask a question in ChatGPT.
Step 1: Save Conversations
The MemoryX Chrome extension automatically monitors your conversations on AI platforms, checking for content changes every few seconds and saving them as complete conversation records.
There are two saving modes:
- Auto-save: Triggers every 1.5-5 seconds, incrementally saving whenever new messages are detected
- Manual save: The user clicks the save button explicitly
What gets saved is the full conversation in Markdown, including both user messages and AI responses. This is the foundation for everything that follows.
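The change-detection part of this loop can be sketched as follows. This is a minimal illustration, not MemoryX's actual code: `AutoSaver`, `readFn`, and `save` are hypothetical names, and the real extension reads the conversation DOM and persists through the service worker.

```typescript
// Illustrative sketch of the auto-save loop: poll the serialized
// conversation and save only when it has changed since the last save.
type SaveFn = (markdown: string) => void;

class AutoSaver {
  private lastSaved = "";

  constructor(
    private readFn: () => string, // reads the conversation as Markdown
    private save: SaveFn          // persists a full conversation record
  ) {}

  // Called on each poll tick (every 1.5-5 s in the extension).
  // Returns true if a save was triggered.
  tick(): boolean {
    const current = this.readFn();
    if (current === this.lastSaved) return false; // nothing new
    this.save(current);
    this.lastSaved = current;
    return true;
  }
}
```

Comparing against the last saved snapshot keeps the loop cheap: polling is frequent, but writes happen only when a message actually arrives.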
Step 2: Extract Memories from Conversations
After saving a conversation, MemoryX uses an LLM to extract structured memories from it. This is the most critical part of the entire system.
What to Extract, What to Skip
We defined 7 extraction categories:
- Personal preferences: Likes, habits, communication style
- Important personal information: Name, location, key dates
- Plans and intentions: Goals, projects, upcoming events
- Activity and service preferences: Dining, travel, shopping habits
- Health information: Dietary restrictions, fitness routines
- Career information: Job title, tech stack, work style
- Other: Interests, opinions, attitudes
Equally important is what not to extract. This is where we spent a significant amount of time fine-tuning. It comes down to two core criteria:
Distinguish lasting preferences from one-off instructions. Only extract information that reflects who the user is. For example, "prefers minimalist design" is a lasting preference and should be extracted. But "translate this paragraph for me" is a one-time request and should not. A simple test: if this piece of information would not help another AI understand the user better, it should not be extracted.
Distinguish the user's own information from referenced content. When the user shares a link, quotes an article, or pastes example code, the referenced content should not be extracted as facts about the user.
Extraction Output Format
Each extracted memory is structured, containing type, text, tags, bilingual keywords, and a confidence score:
{
"type": "preference",
"text": "Prefers coding in TypeScript",
"tags": ["TypeScript", "programming language"],
"keywords": ["TypeScript", "TS", "programming", "编程"],
"confidence": 0.9
}

A few key design decisions:
- type is one of four categories: skill, preference, objective, or context
- Keywords are bilingual (both English and Chinese), so relevant memories can be found regardless of what language the user types in
- confidence scores indicate certainty -- explicitly stated information gets high confidence, reasonable inferences get moderate confidence
- Memory text matches the user's language: if the user chats in English, the extracted memory will be in English
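The record shape above can be expressed as a small type with a validator. The field names mirror the JSON example; the guard itself is an illustrative sketch, not MemoryX's actual validation code.

```typescript
// The four memory categories described in the post.
const MEMORY_TYPES = ["skill", "preference", "objective", "context"] as const;
type MemoryType = (typeof MEMORY_TYPES)[number];

interface ExtractedMemory {
  type: MemoryType;
  text: string;        // written in the user's own language
  tags: string[];
  keywords: string[];  // bilingual: English + Chinese variants
  confidence: number;  // 0..1; explicit statements score higher than inferences
}

// Hypothetical guard for LLM output: the model returns JSON, so every
// field must be checked before the memory is stored.
function isValidMemory(m: unknown): m is ExtractedMemory {
  const x = m as ExtractedMemory;
  return (
    !!x &&
    (MEMORY_TYPES as readonly string[]).includes(x.type) &&
    typeof x.text === "string" && x.text.length > 0 &&
    Array.isArray(x.tags) &&
    Array.isArray(x.keywords) &&
    typeof x.confidence === "number" &&
    x.confidence >= 0 && x.confidence <= 1
  );
}
```

Validating at the boundary matters because extraction output comes from an LLM: a malformed type or an out-of-range confidence should be rejected rather than stored.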
Reasonable Inference
Our system allows reasonable inference. If a user is asking about React hooks, we infer they use React and extract that at moderate confidence. This is inspired by mem0's design philosophy -- extract broadly, but tag confidence levels, so the recall stage can sort and filter accordingly.
Merge, Don't Fragment
We explicitly instruct the LLM to merge related information into a single, complete memory rather than splitting it into fragments. For example, if a user discusses logo design and mentions style, colors, letters, and references, that should be one memory, not four.
The goal is 3-5 high-quality memories per conversation, not 10+ fragments.
Incremental Extraction
If the same conversation continues, we do not re-extract from the beginning every time. The system tracks "last extracted up to message N" and only processes new messages:
Already-processed messages are passed to the LLM as background context, but the LLM only extracts from new messages, avoiding duplicates.
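The bookkeeping is simple: remember how many messages have been processed and split the conversation at that index. A minimal sketch, with illustrative names:

```typescript
interface Message {
  role: "user" | "assistant";
  text: string;
}

// Split a conversation for incremental extraction: the already-processed
// prefix is sent to the LLM only as background context, while extraction
// runs on the new tail. After a successful run, the caller advances
// lastExtractedCount to messages.length.
function splitForExtraction(
  messages: Message[],
  lastExtractedCount: number
): { context: Message[]; toExtract: Message[] } {
  return {
    context: messages.slice(0, lastExtractedCount),
    toExtract: messages.slice(lastExtractedCount),
  };
}
```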
Auto vs. Manual Extraction
Auto-extraction uses standard confidence thresholds. When the user manually clicks the "Extract Memory" button, the system becomes more aggressive -- lowering thresholds, broadening the extraction scope, and mining as much valuable information as possible from the conversation. A manual trigger signals that the user considers this conversation valuable, and the system should honor that intent.
30-Second Stability Window
Extraction does not trigger in real-time. After the last new message, the system waits 30 seconds. If another message arrives within those 30 seconds, the timer resets. This prevents frequent extraction while a conversation is actively ongoing:
T=0s New message → Schedule extraction in 30s
T=3s New message → Cancel previous, reschedule in 30s
T=8s New message → Cancel previous, reschedule in 30s
T=38s 30s of silence → Extract all new messages from T=0~8s

Step 3: Semantic Deduplication
After memories are extracted, there is one more gate before storage: semantic deduplication. This prevents duplicate memories from being saved.
Candidate Search
First, BM25 keyword search finds existing memories that might be duplicates, narrowing down the candidate set.
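For readers unfamiliar with BM25: it is a standard keyword-relevance formula that rewards term frequency while damping it by document length. The sketch below is a compact Okapi BM25 scorer with naive whitespace tokenization, purely to illustrate the candidate-search step; it is not MemoryX's actual index.

```typescript
// Score each doc against the query with Okapi BM25 (simplified:
// whitespace tokenization, in-memory corpus). Higher = more relevant.
function bm25Scores(
  query: string,
  docs: string[],
  k1 = 1.5, // term-frequency saturation
  b = 0.75  // length normalization strength
): number[] {
  const tok = (s: string) => s.toLowerCase().split(/\s+/).filter(Boolean);
  const docsTok = docs.map(tok);
  const N = docs.length;
  const avgLen = docsTok.reduce((s, d) => s + d.length, 0) / (N || 1);
  const qTerms = Array.from(new Set(tok(query)));

  return docsTok.map((d) => {
    let score = 0;
    for (const t of qTerms) {
      const tf = d.filter((w) => w === t).length;
      if (tf === 0) continue;
      const df = docsTok.filter((dd) => dd.includes(t)).length;
      const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
      score +=
        (idf * tf * (k1 + 1)) /
        (tf + k1 * (1 - b + (b * d.length) / avgLen));
    }
    return score;
  });
}
```

Only the top-scoring existing memories are then handed to the LLM for the duplicate decision, which keeps that call small and cheap.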
LLM Decision
Then the LLM compares each new memory against the candidates and makes one of four decisions: ADD, UPDATE, DELETE, or NOOP.
To ensure consistent judgments, the LLM uses a very low temperature parameter in this step, minimizing randomness.
Safety Policy
One important design principle: AI cannot automatically delete a user's memories. In v1.0, even if the LLM determines a DELETE is appropriate, we opt for a more conservative approach. UPDATE operations preserve the old text (old_text), so changes can be reverted if something goes wrong.
| Action | Behavior | Safety |
|---|---|---|
| ADD | Save directly | No risk |
| UPDATE | Update content + preserve old version | Reversible |
| DELETE | No auto-deletion in v1.0 | Zero risk |
| NOOP | No action taken | No risk |
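The table above can be sketched as a dispatcher. This is an illustrative reduction of the policy, with hypothetical names (`applyDecision`, `StoredMemory`): UPDATE keeps the old text for rollback, and DELETE is deliberately downgraded to a no-op in v1.0.

```typescript
type DedupAction = "ADD" | "UPDATE" | "DELETE" | "NOOP";

interface StoredMemory {
  id: string;
  text: string;
  old_text?: string; // previous version, kept so UPDATEs are reversible
}

// Apply an LLM dedup decision under the conservative v1.0 safety policy.
function applyDecision(
  store: Map<string, StoredMemory>,
  action: DedupAction,
  mem: StoredMemory
): void {
  switch (action) {
    case "ADD":
      store.set(mem.id, mem);
      break;
    case "UPDATE": {
      const prev = store.get(mem.id);
      // Preserve the old text so the change can be reverted.
      store.set(mem.id, { ...mem, old_text: prev?.text });
      break;
    }
    case "DELETE": // v1.0: the AI never auto-deletes user memories
    case "NOOP":
      break; // no state change
  }
}
```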
Step 4: Auto-Recall
Once saving and extraction are done, memories sit quietly in storage. What truly makes them valuable is recall -- automatically injecting relevant memories into the conversation context the next time you ask a question.
Embedding Search + BM25 Fallback
When the user types a message and hits send on an AI platform, MemoryX intercepts the send action and performs a memory search first.
The core search logic is embedding vector search first, BM25 keyword search as fallback.
Embedding search converts both the user's input and all stored memories into high-dimensional vectors, then ranks them by cosine similarity to find the most semantically relevant memories. We set a similarity threshold so only truly relevant results are returned.
The advantage of embedding search is semantic understanding:
- BM25: "likes minimalist design" vs. "pursues clean aesthetics" → No match (no shared keywords)
- Embedding: "likes minimalist design" vs. "pursues clean aesthetics" → Match (semantically close)
Search results also go through a round of result deduplication -- if two results are highly similar semantically, only the higher-scoring one is kept, preventing redundant information from being injected.
BM25 search serves as the fallback. When the Embedding API is unavailable (network errors, etc.), the system automatically switches to local keyword search. While less precise, it ensures the system remains functional under offline or error conditions.
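The "embedding first, BM25 fallback" pattern can be condensed into one function. Here `embedFn` stands in for the Embedding API and `bm25Search` for the local keyword index; both names, and the 0.6 threshold, are illustrative assumptions rather than MemoryX's real values.

```typescript
// Standard cosine similarity between two equal-length vectors.
function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Recall: rank stored memories by semantic similarity to the query;
// if the embedding call fails, fall back to local keyword search.
async function recall(
  query: string,
  memories: { text: string; vec: number[] }[],
  embedFn: (t: string) => Promise<number[]>,
  bm25Search: (q: string) => { text: string }[],
  threshold = 0.6 // illustrative cutoff; the real one is tuned
): Promise<string[]> {
  try {
    const qv = await embedFn(query);
    return memories
      .map((m) => ({ text: m.text, score: cosineSim(qv, m.vec) }))
      .filter((m) => m.score >= threshold) // keep only relevant results
      .sort((x, y) => y.score - x.score)
      .map((m) => m.text);
  } catch {
    // Embedding API unavailable (network error, offline): degrade
    // gracefully to the local BM25 index instead of failing the send.
    return bm25Search(query).map((m) => m.text);
  }
}
```

Keeping the fallback inside the same function means the caller never has to know which search path produced the results.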
Intercept and Inject
After the memory search completes, MemoryX appends the results to the end of the user's message. This process is transparent to the user.
Technically, we intercept the user's send event during the DOM capture phase -- this ensures MemoryX can intervene before frameworks like React process the event.
The interception flow:
- User presses Enter or clicks the send button
- MemoryX blocks the original send event
- Reads the user's input → sends it to the service worker for memory search
- Relevant memories found → appended to the end of the user's message
- Re-triggers the send (preferring the send button click, which is more reliable)
The injected format looks like this:
---
[MemoryX Context · 3 memories]
- [preference·Mar] Prefers coding in TypeScript
- [skill·Feb] Uses Claude Code and Cursor as development tools
- [objective·Mar] Currently developing a Chrome extension project
(Above is user background info. Answer the question directly; refer to context only when relevant.)
---

Each memory includes a type label and a time annotation. Type labels automatically switch between English and Chinese based on the user's input language. To avoid overly long context, the system caps the number of injected memories, keeping only the most relevant ones.
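A formatter producing this block is straightforward. The sketch below reproduces the layout shown above; `formatContext`, `RecalledMemory`, and the cap of 5 are illustrative assumptions.

```typescript
interface RecalledMemory {
  type: string;  // e.g. "preference", "skill", "objective"
  month: string; // e.g. "Mar" -- the time annotation
  text: string;
}

// Render recalled memories into the injected context block, capping the
// count so the appended context stays short.
function formatContext(memories: RecalledMemory[], maxCount = 5): string {
  const top = memories.slice(0, maxCount);
  const lines = top.map((m) => `- [${m.type}·${m.month}] ${m.text}`);
  return [
    "---",
    `[MemoryX Context · ${top.length} memories]`,
    ...lines,
    "(Above is user background info. Answer the question directly; refer to context only when relevant.)",
    "---",
  ].join("\n");
}
```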
Usage Statistics
Every time a memory is recalled and used, the system asynchronously updates usage statistics (usageCount and lastUsedAt) in the background, without blocking the user's send flow. This data can be used in the future to optimize recall ranking and identify memories that are no longer relevant.
Step 5: Conversation Summary
Beyond extracting structured memories, MemoryX also supports generating conversation summaries. The purpose of summaries is to help users quickly resume a prior conversation in a different AI tool.
The summary system automatically detects the primary language of the conversation and generates the summary in the matching language -- Chinese conversations produce Chinese summaries, English conversations produce English summaries.
The summary structure is standardized, designed for handoff:
- User / Background: Inferred user identity and current situation
- Project / Topic: One-line summary of the core issue
- Current Status: Where the conversation left off and what conclusions were reached
- Key Information: Facts, decisions, and plans organized by topic
- Open Items: Unresolved questions
- Continuation Point: Where the next AI can pick up
This is not a generic "conversation summary." It is a handoff document -- when you switch from ChatGPT to Claude, Claude can understand the full context in seconds.
Key Differences from Alternatives
After researching mem0 (the 24K-star open-source benchmark), Letta/MemGPT, Zep / Graphiti, and others, we made several differentiated choices:
| Dimension | mem0 | MemoryX |
|---|---|---|
| Extraction timing | Immediate, per message | Batch extraction after 30s of stability |
| LLM calls | 1 extraction + 1 check per fact | Extraction + check combined into 1 call |
| Auto-delete | Supported | Disabled (conservative v1.0 policy) |
| Update protection | None | Preserves old_text for rollback |
| Embedding failure | No fallback | Automatic BM25 fallback |
The guiding principle: be conservative with memory safety, aggressive with extraction quality, and always have a fallback for reliability.
Conclusion
The MemoryX memory system is not a simple "save and search." It is a complete pipeline:
Capture conversation → 30s stability window → Incremental extraction → Semantic dedup → Vector + keyword search → Auto-inject
Every step has a clear design rationale, and every step has a corresponding fallback strategy. We do not aim to be optimal at each individual step, but we ensure the entire pipeline runs reliably under all edge cases.
This is how we actually built it -- not theory.
Try It Out
If you switch between multiple AI platforms and want all of them to "remember you," give MemoryX a try.
- Install the Chrome extension: Chrome Web Store
- Join our Discord community to share your use cases and feedback: discord.gg/5Bfd7ryKFM
Ready to try MemoryX?
Stop repeating yourself across AI tools. Let your context follow you.
Install from Chrome Web Store