- Overview
- System Architecture
- Components
- Installation & Deployment
- Configuration
- Usage
- API Reference
- Security Considerations
- Monitoring & Logging
- Troubleshooting
- Development Guide
- Maintenance & Operations
Memory-MCP is a MongoDB-based Model Context Protocol (MCP) server that provides intelligent memory management, semantic caching, and hybrid search capabilities for AI applications. The system integrates with AWS Bedrock for embedding generation, MongoDB for data storage, and Tavily for web search functionality.
- Hybrid Search: Combines vector similarity and keyword matching for comprehensive search results
- AI Memory Management: Stores and retrieves conversational context with intelligent summarization
- Semantic Caching: Caches AI responses for similar queries to improve performance
- Web Search Integration: External web search capabilities through Tavily API
- Multi-Service Architecture: Modular design with separate services for memory, cache, and logging
- Containerized Deployment: Docker-based deployment with orchestration support
- AI application developers requiring persistent memory capabilities
- Organizations building conversational AI systems
- Developers implementing semantic search and caching solutions
- Teams working with MongoDB and AWS Bedrock integration
The Memory-MCP system follows a microservices architecture with the main MCP server coordinating multiple specialized services:
graph TB
%% Client Layer
Client[MCP Client Application]
%% Main Server
MCP[Memory-MCP Server<br/>:8080<br/>FastMCP + HTTP Transport]
%% Microservices Layer
subgraph "Containerized Services Network"
direction TB
MEM[AI Memory Service<br/>:8182<br/>Conversation Management]
CACHE[Semantic Cache Service<br/>:8183<br/>Intelligent Caching]
LOG[Event Logger Service<br/>:8181<br/>Centralized Logging]
end
%% Core Tools Layer
subgraph "MCP Tools"
direction LR
T1[Memory Tools<br/>store_memory<br/>retrieve_memory]
T2[Cache Tools<br/>semantic_cache_response<br/>check_semantic_cache]
T3[Search Tools<br/>hybrid_search<br/>search_web]
end
%% Data Layer
subgraph "MongoDB Atlas"
direction TB
DB1[Memory Collection<br/>Conversations & Context]
DB2[Documents Collection<br/>Vector + Full-text Search]
DB3[Cache Collection<br/>Semantic Query Cache]
end
%% External Services
subgraph "AWS Bedrock"
direction TB
EMB[Titan Embed Text v1<br/>Vector Embeddings]
LLM[Claude Sonnet<br/>Text Generation]
end
subgraph "External APIs"
TAVILY[Tavily Search API<br/>Web Content Retrieval]
end
%% Storage & Logging
subgraph "Local Storage"
LOGS[Log Files<br/>logs/*.log]
RESULTS[Test Results<br/>test_results/*.json]
end
%% Connections with labels
Client -.->|MCP Protocol<br/>HTTP Transport| MCP
MCP --> T1
MCP --> T2
MCP --> T3
T1 -.->|HTTP API| MEM
T2 -.->|HTTP API| CACHE
T3 -.->|Direct Connection| DB2
T3 -.->|Embedding Generation| EMB
T3 -.->|Web Search| TAVILY
MEM -.->|Store/Retrieve<br/>Conversations| DB1
MEM -.->|Context Summarization| LLM
MEM -.->|Similarity Search| EMB
CACHE -.->|Cache Operations| DB3
CACHE -.->|Query Embeddings| EMB
LOG -.->|Write Logs| LOGS
%% All services log to central logger
MCP -.->|Application Events| LOG
MEM -.->|Memory Events| LOG
CACHE -.->|Cache Events| LOG
%% Testing connections
MCP -.->|Test Results| RESULTS
%% Styling
classDef client fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef server fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
classDef service fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px
classDef tool fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef database fill:#fce4ec,stroke:#880e4f,stroke-width:2px
classDef external fill:#f1f8e9,stroke:#33691e,stroke-width:2px
classDef storage fill:#fff8e1,stroke:#f57f17,stroke-width:2px
class Client client
class MCP server
class MEM,CACHE,LOG service
class T1,T2,T3 tool
class DB1,DB2,DB3 database
class EMB,LLM,TAVILY external
class LOGS,RESULTS storage
The architecture supports horizontal scaling through Docker containers and network isolation through a dedicated bridge network. The system processes requests through the FastMCP framework, which handles tool registration and HTTP transport.
Purpose: Central orchestration server that exposes MCP tools and coordinates with other services.
Technologies Used:
- FastMCP framework for MCP protocol implementation
- Docker containerization with Python 3.12
- uv for dependency management
- httpx for asynchronous HTTP client operations
Core Functionality:
- Tool registration and management
- Request routing to appropriate services
- Response aggregation and formatting
- Error handling and logging coordination
Interactions: Communicates with all other components through HTTP APIs and database connections.
Purpose: Manages conversational memory storage, retrieval, and context summarization.
Key Features:
- Conversation history management
- Semantic similarity search for relevant memories
- Context summarization using AI models
- User-specific memory isolation
Integration Points:
- MongoDB for persistent storage
- AWS Bedrock for embedding generation
- Event logging for audit trails
Purpose: Implements intelligent caching based on semantic similarity rather than exact matches.
Key Features:
- Query similarity detection
- Response caching with TTL management
- Cache hit optimization
- User-specific cache namespacing
Performance Benefits:
- Reduced API calls to expensive AI services
- Improved response times for similar queries
- Cost optimization through intelligent caching
Purpose: Centralized logging and monitoring for all system events.
Capabilities:
- Structured logging with metadata
- Async and sync logging support
- Log retention management
- Service health monitoring
Purpose: Provides database connectivity and operations for all data persistence needs.
Features:
- Connection pooling and management
- Aggregation pipeline support
- Vector search capabilities
- Full-text search integration
Purpose: Handles AI model interactions for embeddings and text generation.
Models Supported:
- Amazon Titan Embed Text v1 for embeddings
- Anthropic Claude for text generation
- Configurable model selection (see the embedding sketch below)
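As a rough sketch of what the embedding integration might look like with boto3, assuming the standard Titan Embed Text v1 request/response contract; the actual wrapper in the project's Bedrock service may differ:

```python
import json
import boto3

# Sketch only: region and model ID come from the configuration section below.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed_text(text: str) -> list[float]:
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]  # 1536-dimensional vector for Titan Embed Text v1
```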
Tavily Web Search: Provides external web search capabilities with content extraction and summarization.
Software Requirements:
- Docker and Docker Compose
- Python 3.12+ (for local development)
- MongoDB Atlas account or local MongoDB instance
- AWS account with Bedrock access
- Tavily API key for web search
Hardware Requirements:
- Minimum 2GB RAM
- 1GB available disk space
- Network connectivity for external services
Create the required environment files:
Copy each .env.<service_name>.example file to .env.<service_name>.
Using Docker Compose (Recommended):
# Clone and navigate to project directory
git clone <repository-url>
cd memory-mcp
# Build and start all services
docker-compose up -d
# Verify deployment
docker-compose ps
docker-compose logs memory-mcp
Local Development:
# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
# Sync dependencies
uv sync
# Run development server
uv run src/server.py
Core Application Settings:
APP_NAME=memory-mcp # Application identifier
DEBUG=False # Debug mode toggle
PORT=8080 # Server port
Service URLs:
LOGGER_SERVICE_URL=http://event-logger:8181 # Event logging service
AI_MEMORY_SERVICE_URL=http://ai-memory:8182 # Memory management service
SEMANTIC_CACHE_SERVICE_URL=http://semantic-cache:8183 # Semantic caching service
AWS Configuration:
AWS_ACCESS_KEY_ID=your_access_key # AWS access credentials
AWS_SECRET_ACCESS_KEY=your_secret_key # AWS secret key
AWS_REGION=us-east-1 # AWS region
EMBEDDING_MODEL_ID=amazon.titan-embed-text-v1 # Bedrock embedding model
LLM_MODEL_ID=us.anthropic.claude-sonnet-4-20250514-v1:0 # Language model
VECTOR_DIMENSION=1536 # Embedding vector dimension
External APIs:
TAVILY_API_KEY=your_tavily_api_key # Web search API key
Connection String Format:
mongodb+srv://username:password@cluster.mongodb.net/database_name
Required Collections:
- documents: For hybrid search functionality
- memory: For conversational memory storage
- cache: For semantic caching
Index Requirements (see the sketch below):
- Vector search index: documents_vector_search_index on the embedding field
- Text search index on the text field for full-text search capabilities
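Search indexes are usually created through the MongoDB Atlas UI or CLI; a programmatic sketch with pymongo (4.7+) might look like the following. The cosine similarity metric and the user_id filter field are illustrative assumptions, not requirements stated elsewhere in this document.

```python
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("mongodb+srv://username:password@cluster.mongodb.net/")
collection = client["your_database"]["documents"]

# Vector search index over the embedding field (1536 dims to match Titan Embed Text v1),
# with user_id exposed as a filter field for per-user isolation.
vector_index = SearchIndexModel(
    name="documents_vector_search_index",
    type="vectorSearch",
    definition={
        "fields": [
            {"type": "vector", "path": "embedding", "numDimensions": 1536, "similarity": "cosine"},
            {"type": "filter", "path": "user_id"},
        ]
    },
)
collection.create_search_index(model=vector_index)
```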
The docker-compose.yml file orchestrates multiple services:
services:
memory-mcp:
build: .
ports:
- "8080:8080"
depends_on:
- event-logger
- ai-memory
- semantic-cache
networks:
- mcp-network
networks:
mcp-network:
driver: bridge
Start the MCP server using Docker Compose:
docker-compose up -d
The server will be available at http://localhost:8080/mcp for MCP protocol connections.
Connect to the MCP server using the FastMCP client:
from fastmcp import Client
import asyncio
async def connect_to_server():
    async with Client("http://localhost:8080/mcp") as client:
        # List available tools
        tools = await client.list_tools()
        print(f"Available tools: {[tool.name for tool in tools]}")
        # Use a tool
        result = await client.call_tool("hybrid_search", {
            "connection_string": "mongodb+srv://...",
            "database_name": "your_database",
            "collection_name": "documents",
            "query": "your search query",
            "user_id": "user@example.com"
        })

asyncio.run(connect_to_server())
Memory Management Workflow (a sketch follows the list):
- Store conversation messages using store_memory
- Retrieve relevant context using retrieve_memory
- Cache AI responses using semantic_cache_response
- Check cache before generating new responses using check_semantic_cache
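A minimal sketch of that workflow, assuming a connected FastMCP client as shown above. generate_reply is a hypothetical stand-in for your own LLM call, and reading .data from tool results assumes a recent fastmcp client:

```python
async def generate_reply(prompt: str, context) -> str:
    # Hypothetical placeholder for your own LLM call (e.g., via Bedrock).
    return f"(generated answer for: {prompt})"

async def chat_turn(client, user_id: str, conversation_id: str, user_text: str) -> str:
    # 1. Reuse a cached answer if a semantically similar query was already served.
    cached = await client.call_tool("check_semantic_cache",
                                    {"user_id": user_id, "query": user_text})
    if cached.data:
        return str(cached.data)

    # 2. Pull relevant context and a running summary for the prompt.
    memory = await client.call_tool("retrieve_memory",
                                    {"user_id": user_id, "text": user_text})

    # 3. Generate a reply using the retrieved context.
    reply = await generate_reply(user_text, context=memory.data)

    # 4. Persist both sides of the exchange, then cache the response for reuse.
    for text, mtype in [(user_text, "human"), (reply, "ai")]:
        await client.call_tool("store_memory", {
            "conversation_id": conversation_id,
            "text": text,
            "message_type": mtype,
            "user_id": user_id,
        })
    await client.call_tool("semantic_cache_response",
                           {"user_id": user_id, "query": user_text, "response": reply})
    return reply
```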
Search Workflow:
- Use hybrid_search for database queries combining semantic and keyword search
- Use search_web for external information gathering
- Weight search results based on use case requirements
Tool: store_memory
Description: Store a message in the AI memory system.
Parameters:
- conversation_id (string): Unique conversation identifier
- text (string): Message content to store
- message_type (string): Type of message ("human" or "ai")
- user_id (string): User identifier for isolation
- timestamp (string, optional): ISO format timestamp
Returns: Dictionary with storage confirmation and metadata.
Example:
result = await client.call_tool("store_memory", {
    "conversation_id": "conv_123",
    "text": "I'm interested in F1 racing at Marina Bay.",
    "message_type": "human",
    "user_id": "user@example.com",
    "timestamp": "2025-01-09T10:30:00Z"
})
Tool: retrieve_memory
Description: Get relevant AI memory with context and summary.
Parameters:
- user_id (string): User identifier
- text (string): Query text for memory retrieval
Returns: Dictionary containing:
- related_conversation: Relevant conversation history
- conversation_summary: AI-generated summary of context
- similar_memories: Related memories from the user's history
Example:
result = await client.call_tool("retrieve_memory", {
    "user_id": "user@example.com",
    "text": "Tell me about F1 racing experiences"
})
Tool: semantic_cache_response
Description: Cache an AI response for similar queries.
Parameters:
- user_id (string): User identifier
- query (string): Original query text
- response (string): AI response to cache
- timestamp (string, optional): ISO format timestamp
Returns: Dictionary with cache storage confirmation.
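Example (illustrative placeholder values, following the same client pattern as the other tools):

```python
result = await client.call_tool("semantic_cache_response", {
    "user_id": "user@example.com",
    "query": "What are the best F1 viewing spots at Marina Bay?",
    "response": "Popular options include the Bay Grandstand and the Paddock Club.",
    "timestamp": "2025-01-09T10:31:00Z"
})
```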
Tool: check_semantic_cache
Description: Get a cached response for a similar query.
Parameters:
- user_id (string): User identifier
- query (string): Query text to search in cache
Returns: Dictionary with cached response if found, empty if no match.
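Example (illustrative placeholder values):

```python
result = await client.call_tool("check_semantic_cache", {
    "user_id": "user@example.com",
    "query": "Where should I sit for the F1 race at Marina Bay?"
})
```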
Tool: hybrid_search
Description: Advanced hybrid search combining vector similarity and keyword matching.
Parameters:
- connection_string (string): MongoDB connection string
- database_name (string): Database name
- collection_name (string): Collection name
- fulltext_search_field (string): Field for text search
- vector_search_index_name (string): Vector search index name
- vector_search_field (string): Field containing embeddings
- query (string): Search query
- user_id (string): User identifier for filtering
- weight (float, default: 0.5): Balance between vector (1.0) and text (0.0) search
- limit (int, default: 10): Maximum number of results
Returns: Dictionary with a results array containing scored documents.
Example:
result = await client.call_tool("hybrid_search", {
    "connection_string": "mongodb+srv://...",
    "database_name": "my_database",
    "collection_name": "documents",
    "fulltext_search_field": "text",
    "vector_search_index_name": "documents_vector_search_index",
    "vector_search_field": "embedding",
    "query": "Marina Bay F1 racing experience",
    "user_id": "user@example.com",
    "weight": 0.7,
    "limit": 5
})
Tool: search_web
Description: Search the web using the Tavily API.
Parameters:
- query (string): Web search query
Returns: List of strings containing web search results.
Example:
result = await client.call_tool("search_web", {
    "query": "Marina Bay Sands F1 circuit Singapore"
})
User Isolation: The system implements user-based isolation through user_id parameters, ensuring users can only access their own data; a conceptual sketch follows.
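Conceptually, every read the services perform is constrained by that identifier. The helper below is a simplified illustration only (the field name is taken from the API reference; the function itself is hypothetical, not part of the codebase):

```python
# Every query is scoped to the caller's user_id, so cross-user reads are not possible.
def find_user_documents(collection, user_id: str, extra_filter: dict | None = None):
    query = {"user_id": user_id}
    if extra_filter:
        query.update(extra_filter)
    return list(collection.find(query))
```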
API Key Management:
- Store AWS credentials securely using environment variables
- Rotate API keys regularly
- Use IAM roles for production deployments when possible
Network Security:
- Use Docker networks to isolate services
- Implement HTTPS for production deployments
- Restrict database access to specific IP ranges
At Rest:
- MongoDB Atlas provides encryption at rest
- Use encrypted storage volumes for Docker containers
In Transit:
- Use TLS for all external API communications
- Implement HTTPS for the MCP server endpoint
The system implements comprehensive input validation:
- User ID format validation
- Message type validation (human/ai only)
- Query parameter sanitization
- MongoDB injection prevention (see the validator sketch below)
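A hypothetical sketch of such validators; the real utils/validators.py implementation may differ:

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ALLOWED_MESSAGE_TYPES = {"human", "ai"}

def validate_user_id(user_id: str) -> str:
    # Reject empty or malformed identifiers before they reach the database layer.
    if not user_id or not EMAIL_RE.match(user_id):
        raise ValueError("user_id must be a non-empty email address")
    return user_id

def validate_message_type(message_type: str) -> str:
    # Only the two message types accepted by store_memory are allowed.
    if message_type not in ALLOWED_MESSAGE_TYPES:
        raise ValueError('message_type must be "human" or "ai"')
    return message_type
```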
Dependency Management:
- Regular updates of Python dependencies
- Vulnerability scanning with pip-audit
- Use of pinned dependency versions in requirements.txt
Database Security:
- Use of prepared statements and parameterized queries
- MongoDB connection string encryption
- Regular security updates for MongoDB driver
Performance Metrics:
- Tool execution time
- Database query response time
- API call success rates
- Memory usage and cache hit rates
Business Metrics:
- Number of conversations stored
- Search query volume
- Cache effectiveness
- User activity patterns
The integrated event logger provides:
- Structured logging with metadata
- Real-time log streaming
- Log retention management
- Service health monitoring
Log Levels:
- DEBUG: Detailed execution information
- INFO: General operational events
- WARNING: Potential issues
- ERROR: Error conditions requiring attention
Local Logging:
- File-based logging in the logs/ directory
- Automatic log rotation and cleanup
- Console output for development
Remote Logging:
- Centralized logging through event logger service
- Structured JSON format for log analysis
- Async logging to prevent performance impact (see the sketch after this list)
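A rough sketch of fire-and-forget structured logging over HTTP with httpx; the /log path and payload shape are illustrative assumptions, not the event logger's documented API:

```python
import httpx

LOGGER_SERVICE_URL = "http://event-logger:8181"

async def log_event(level: str, message: str, **metadata) -> None:
    # Send a structured log record to the central event logger.
    payload = {"level": level, "message": message, "metadata": metadata}
    try:
        async with httpx.AsyncClient(timeout=2.0) as client:
            await client.post(f"{LOGGER_SERVICE_URL}/log", json=payload)
    except httpx.HTTPError:
        # Logging must never take down the request path; swallow transport errors.
        pass
```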
Log Analysis:
# View recent logs
docker-compose logs -f memory-mcp
# Check specific service logs
docker-compose logs ai-memory
# Monitor real-time events
tail -f logs/memory-mcp.log
Connection Errors:
Issue: Failed to connect to MongoDB
Solution:
- Verify MongoDB connection string format
- Check network connectivity
- Ensure client IP is whitelisted in MongoDB Atlas
- Validate credentials
Issue: Service unavailable errors for dependent services
Solution:
- Check Docker container status: docker-compose ps
- Restart services: docker-compose restart
- Verify network configuration
- Check service health endpoints
Tool Execution Errors:
Issue: Invalid user_id error
Solution:
- Ensure user_id is provided and non-null
- Use valid email format for user identification
- Check input validation in client code
Issue: Invalid message_type error
Solution:
- Use only "human" or "ai" for message_type parameter
- Check case sensitivity in parameter values
- Validate input before tool calls
1. Check Service Health:
# Verify all containers are running
docker-compose ps
# Check service logs
docker-compose logs memory-mcp
docker-compose logs ai-memory
docker-compose logs semantic-cache
docker-compose logs event-logger
2. Test Individual Components:
# Test MCP server endpoint
curl http://localhost:8080/health
# Test database connectivity
python -c "import pymongo; client = pymongo.MongoClient('your_connection_string'); print(client.admin.command('ping'))"
# Test AWS Bedrock access
aws bedrock-runtime invoke-model --region us-east-1 --model-id amazon.titan-embed-text-v1 --body '{"inputText":"test"}' response.json
3. Validate Configuration:
# Check environment variables
docker-compose exec memory-mcp env | grep -E "(AWS|TAVILY|MONGO)"
# Verify network connectivity
docker-compose exec memory-mcp ping ai-memory
docker-compose exec memory-mcp ping semantic-cache
Slow Query Performance:
- Check MongoDB indexes are properly created
- Monitor query execution plans
- Optimize search parameters and limits
- Consider database scaling options
High Memory Usage:
- Monitor container resource usage
- Implement result pagination
- Optimize embedding vector dimensions
- Check for memory leaks in long-running processes
| Error | Cause | Solution |
|---|---|---|
| ConnectionFailure | MongoDB connection issues | Check connection string, network, credentials |
| ToolError | Invalid tool parameters | Validate input parameters against API reference |
| HTTPStatusError | Service communication failure | Check service health, restart if needed |
| ClientError | AWS Bedrock issues | Verify AWS credentials and region settings |
| TimeoutError | Service response timeout | Increase timeout values, check service performance |
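A sketch of catching the error classes from the table above around any call into the system; the import paths assume current pymongo, httpx, botocore, and fastmcp layouts, and the reporting is deliberately simplistic:

```python
import httpx
from botocore.exceptions import ClientError
from fastmcp.exceptions import ToolError
from pymongo.errors import ConnectionFailure

async def safe_execute(operation):
    # Run an awaitable (tool call, DB query, Bedrock request) and report failures
    # using the error classes listed in the reference table.
    try:
        return await operation
    except ToolError as exc:
        print(f"Invalid tool parameters: {exc}")
    except ConnectionFailure as exc:
        print(f"MongoDB connection issue: {exc}")
    except httpx.HTTPStatusError as exc:
        print(f"Dependent service returned HTTP {exc.response.status_code}")
    except ClientError as exc:
        print(f"AWS Bedrock error: {exc}")
    except TimeoutError as exc:
        print(f"Operation timed out: {exc}")
    return None

# Usage: result = await safe_execute(client.call_tool("search_web", {"query": "..."}))
```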
memory-mcp/
├── src/
│ ├── server.py # Main MCP server entry point
│ └── core/
│ ├── config.py # Configuration management
│ └── logger.py # Logging utilities
├── tools/ # MCP tool implementations
│ ├── search_tools.py # Search-related tools
│ ├── memory_tools.py # Memory management tools
│ └── cache_tools.py # Caching tools
├── services/ # External service integrations
│ ├── mongodb_service.py # MongoDB operations
│ ├── bedrock_service.py # AWS Bedrock integration
│ └── external/
│ └── tavily_service.py # Tavily web search
├── utils/ # Utility functions
│ ├── validators.py # Input validation
│ └── serializers.py # Data serialization
├── tests/ # Test suite
│ └── test_memory_mcp.py # Comprehensive tests
├── client/ # Client examples
│ └── memory-mcp-client.py # Test client
└── docker-compose.yml # Service orchestration
Prerequisites:
- Python 3.12+
- uv package manager
- Docker for integration testing
Setup Steps:
# Clone repository
git clone <repository-url>
cd memory-mcp
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies
uv sync
# Set up pre-commit hooks (optional)
pre-commit install
# Run development server
uv run src/server.py
Comprehensive Test Suite:
The project includes a comprehensive test suite (tests/test_memory_mcp.py) that covers:
- Tool functionality testing
- Error handling validation
- Performance and concurrency testing
- Integration testing with external services
Running Tests:
# Run full test suite
pytest tests/ -v
# Run specific test
pytest tests/test_memory_mcp.py::test_hybrid_search -v
# Run with coverage
pytest tests/ --cov=src --cov-report=html
# Run performance tests
pytest tests/ -m performance
Test Results: The test suite automatically generates detailed reports:
- JSON results file: test_results/mcp_test_results_TIMESTAMP.json
- Summary report: test_results/mcp_test_summary_TIMESTAMP.txt
Example Test Execution:
# Start services for testing
docker-compose up -d
# Wait for services to be ready
sleep 30
# Run tests
python tests/test_memory_mcp.py
# Or use pytest
pytest tests/ -v -s
Tool Registration Process:
- Create Tool Function: Implement the tool in an appropriate module (e.g., tools/new_tools.py)
from typing import Any, Dict

from fastmcp import FastMCP

def register_new_tools(mcp: FastMCP):
    @mcp.tool(name="new_tool", description="Tool description")
    async def new_tool(param1: str, param2: int) -> Dict[str, Any]:
        # Implementation
        return {"result": "success"}
- Register in Server: Add registration to src/server.py
from tools.new_tools import register_new_tools
register_new_tools(mcp)
- Add Tests: Create test cases in tests/test_memory_mcp.py
import pytest

@pytest.mark.asyncio
async def test_new_tool(mcp_client, test_config):
    result = await mcp_client.call_tool("new_tool", {
        "param1": "test_value",
        "param2": 42
    })
    # Assertions
    assert result is not None
- Update Documentation: Add tool documentation to API Reference section
Code Formatting:
- Use Black for code formatting:
black src/ tools/ services/
- Follow PEP 8 style guidelines
- Use type hints for all function parameters and return values
Error Handling:
- Use try/except blocks for external service calls
- Log errors with appropriate context
- Return structured error responses
Async Programming:
- Use async/await for I/O operations
- Avoid blocking operations in async functions
- Use asyncio.gather() for concurrent operations (example below)
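For example, independent I/O-bound tool calls can be issued concurrently instead of awaited one by one (sketch, assuming a connected FastMCP client):

```python
import asyncio

async def gather_context(client, user_id: str, query: str):
    # Both calls are independent, so run them concurrently and await the pair.
    memory_task = client.call_tool("retrieve_memory", {"user_id": user_id, "text": query})
    web_task = client.call_tool("search_web", {"query": query})
    memory, web_results = await asyncio.gather(memory_task, web_task)
    return memory, web_results
```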
Daily Tasks:
- Monitor service health and logs
- Check error rates and performance metrics
- Verify backup completion status
Weekly Tasks:
- Review and rotate log files
- Update dependencies with security patches
- Performance optimization review
Monthly Tasks:
- Full system backup verification
- Security audit and vulnerability scanning
- Capacity planning and scaling decisions
MongoDB Backup:
# Create database backup
mongodump --uri="mongodb+srv://..." --out=backup_$(date +%Y%m%d)
# Restore from backup
mongorestore --uri="mongodb+srv://..." backup_YYYYMMDD/
Configuration Backup:
# Backup environment configuration
tar -czf config_backup_$(date +%Y%m%d).tar.gz environment/ .env docker-compose.yml
# Backup application code
git archive --format=tar.gz HEAD > app_backup_$(date +%Y%m%d).tar.gz
Dependency Updates:
# Check for updates
uv sync --upgrade
# Update specific package
uv add "package@latest"
# Security updates
uv audit
Container Updates:
# Pull latest base images
docker-compose pull
# Rebuild with updates
docker-compose build --no-cache
# Rolling update
docker-compose up -d --no-deps memory-mcp
Database Optimization:
- Monitor index usage and performance
- Optimize aggregation pipelines
- Implement connection pooling (see the sketch after this list)
- Regular database maintenance tasks
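For connection pooling with pymongo, tuning might look like the sketch below; the numbers are illustrative placeholders, not recommendations:

```python
from pymongo import MongoClient

client = MongoClient(
    "mongodb+srv://username:password@cluster.mongodb.net/",
    maxPoolSize=50,               # cap concurrent connections per host
    minPoolSize=5,                # keep a few warm connections available
    maxIdleTimeMS=60_000,         # recycle connections idle for more than a minute
    serverSelectionTimeoutMS=5_000,
)
```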
Application Optimization:
- Profile memory usage and optimize
- Cache frequently accessed data
- Implement request rate limiting
- Monitor and tune timeout values
Infrastructure Scaling:
- Horizontal scaling through container replication
- Load balancing for high availability
- Resource monitoring and auto-scaling
- Database sharding for large datasets