
Self-Hosting

Momo is a self-hostable AI memory system designed to run as a single binary with environment-variable-based configuration.


The architecture at a glance (Mermaid source):

graph TD
    subgraph MomoServer["Momo Server"]
        subgraph API["API Layer"]
            REST["REST API (v1)<br/>Documents · Memories · Search"]
            Admin["Admin API<br/>(authed)"]
        end
        subgraph Services["Services Layer"]
            SearchSvc["SearchService"]
            MemorySvc["MemoryService"]
            Pipeline["ProcessingPipeline"]
            Forgetting["ForgettingManager"]
            Decay["EpisodeDecay"]
            Profile["ProfileRefresh"]
        end
        subgraph Core["Core Modules"]
            Intelligence["Intelligence<br/>Inference · Contradict · Filter"]
            Processing["Processing<br/>Extract · Chunk · Embed"]
            Embeddings["Embeddings<br/>FastEmbed or API · Reranker"]
        end
        subgraph DB["Database Layer (LibSQL / Turso)"]
            Documents["Documents"]
            Chunks["Chunks"]
            Memories["Memories"]
            Vectors["Vectors"]
            Relationships["Relationships"]
        end
        subgraph Providers["External Providers"]
            OCR["OCR<br/>Tesseract or API"]
            Transcription["Transcription<br/>Whisper or API"]
            LLM["LLM Provider<br/>OpenAI / Ollama /<br/>OpenRouter / Local"]
        end
        REST & Admin --> Services
        Services --> Intelligence & Processing & Embeddings
        Processing --> DB
    end

Prerequisites:

  • Rust 1.75+ (if building from source)
  • Tesseract OCR (optional): For text extraction from images and PDFs.
    • macOS: brew install tesseract
    • Ubuntu/Debian: sudo apt-get install tesseract-ocr tesseract-ocr-eng
  • LLM API Key (optional): Required for advanced features like contradiction detection, query rewriting, and memory inference.

Building from source:

git clone https://github.com/momomemory/momo.git
cd momo
cargo build --release

The compiled binary will be located at ./target/release/momo.

Momo is available as a pre-built image on GitHub Container Registry (GHCR).

# One-command setup (recommended)
docker run --name momo -d --restart unless-stopped \
  -p 3000:3000 \
  -e MOMO_API_KEYS=dev-key \
  -v momo-data:/data \
  ghcr.io/momomemory/momo:latest

# Follow logs
docker logs -f momo

# Stop and remove the container (data remains in the momo-data volume)
docker stop momo && docker rm momo

Note: The /data volume stores the database. Using the named volume momo-data keeps data across container restarts/redeploys.
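
For repeatable deployments, the same settings can be captured in a compose file. This is a sketch using only the image, port, environment variable, and volume from the docker run command above:

```yaml
# docker-compose.yml
services:
  momo:
    image: ghcr.io/momomemory/momo:latest
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      MOMO_API_KEYS: dev-key   # replace with your own key(s)
    volumes:
      - momo-data:/data        # database location (see note above)

volumes:
  momo-data:
```

Start it with docker compose up -d.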

The monorepo includes a justfile for common tasks:

# Development server (backend + frontend with hot reload)
just dev
# Backend only (requires cargo-watch)
just dev-backend
# Frontend only (requires Bun)
just dev-frontend
# Debug/trace logging
just dev-debug # RUST_LOG=momo=debug
just dev-trace # RUST_LOG=momo=trace
# Build release binary
just build-release
# Run tests
just test
# Lint and format
just fmt
just lint
# Full CI check
just ci

Momo is configured entirely via environment variables. When running, three interfaces are available:

Interface | Path | Description
--- | --- | ---
Web Console | / | Built-in Preact frontend for browsing memories and documents
REST API | /api/v1 | Full REST API for integration
MCP | /mcp (configurable) | Model Context Protocol endpoint
# Running with default settings (creates momo.db in current directory)
./target/release/momo
# Running with custom configuration
DATABASE_URL=file:my-memory.db MOMO_PORT=8080 ./target/release/momo

After starting, open http://localhost:3000 to access the web console.
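
Once the server is up, a quick smoke test from the shell looks like this. The bearer key comes from MOMO_API_KEYS; the exact routes under /api/v1 are assumptions for illustration, so confirm them against the API Reference:

```shell
MOMO_URL="http://localhost:3000"
MOMO_KEY="dev-key"

# Store a memory (hypothetical route and payload shape)
curl -s -X POST "$MOMO_URL/api/v1/memories" \
  -H "Authorization: Bearer $MOMO_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "The staging database lives on db-2"}'

# Search for it (hypothetical route)
curl -s "$MOMO_URL/api/v1/search?q=staging+database" \
  -H "Authorization: Bearer $MOMO_KEY"
```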


Momo also ships an embedded C FFI in momo/ffi for applications that want to link the engine directly instead of talking to the HTTP server.

Build it from momo/:

cargo build -p momo-ffi

This produces:

  • target/debug/libmomo_ffi.dylib (the shared-library suffix varies by platform: .dylib on macOS, .so on Linux)
  • target/debug/libmomo_ffi.a
  • ffi/include/momo.h

The included C example lives at momo/ffi/examples/c and shows the minimal lifecycle:

  1. Create an engine with momo_engine_new
  2. Call JSON APIs such as momo_engine_create_memory_json
  3. Free returned strings with momo_string_free
  4. Free the engine with momo_engine_free

If you are using the FFI instead of REST, see Embedded C FFI Reference for exported functions, JSON request/response shapes, worker behavior, and loader-path notes.
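
In C-style pseudocode, the lifecycle above looks roughly like this. The function names come from the list; the type names, parameters, and JSON shapes here are assumptions, so check ffi/include/momo.h for the real signatures:

```c
#include "momo.h"  /* generated header in ffi/include */

/* 1. Create an engine; a NULL config_json falls back to environment config. */
MomoEngine *engine = momo_engine_new(NULL);

/* 2. Call JSON APIs; request and response bodies are JSON strings. */
char *response = momo_engine_create_memory_json(
    engine, "{\"content\": \"hello from C\"}");

/* 3. Strings returned by the engine must be released by the caller. */
momo_string_free(response);

/* 4. Free the engine last. */
momo_engine_free(engine);
```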


Momo follows a provider/model string format for external services (Embeddings, LLM, OCR, Transcription).
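
In practice that means one string per service, e.g. (models taken from elsewhere on this page; key values are placeholders):

```shell
export EMBEDDING_MODEL=openai/text-embedding-3-small   # external embedding API
export LLM_MODEL=openai/gpt-4o-mini                    # LLM provider/model
export OCR_MODEL=local/tesseract                       # "local" selects the built-in engine
export TRANSCRIPTION_MODEL=local/whisper-small         # local Whisper
```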

For a complete reference of all environment variables organized by concern, see Configuration Reference.

If you are embedding Momo through the C FFI, the engine uses this same configuration when momo_engine_new is called with config_json = NULL.

Variable | Description | Default
--- | --- | ---
MOMO_HOST | Bind address | 0.0.0.0
MOMO_PORT | Listen port | 3000
MOMO_API_KEYS | Comma-separated API keys for authentication (required for protected API routes) | (None)
Variable | Description | Default
--- | --- | ---
MOMO_MCP_ENABLED | Enable the built-in MCP server routes | true
MOMO_MCP_PATH | Path for streamable HTTP MCP endpoint | /mcp
MOMO_MCP_REQUIRE_AUTH | Require Bearer auth for MCP requests | true
MOMO_MCP_DEFAULT_CONTAINER_TAG | Fallback project/container tag when none provided | default
MOMO_MCP_PROJECT_HEADER | Header used for project scoping (Supermemory-compatible) | x-sm-project
MOMO_MCP_PUBLIC_URL | Optional public base URL used in OAuth discovery responses | (None)
MOMO_MCP_AUTHORIZATION_SERVER | Optional OAuth issuer URL for discovery responses | (None)

Notes:

  • MCP auth keys come from MOMO_API_KEYS.
  • When MOMO_MCP_REQUIRE_AUTH=true and no API keys are configured, MCP requests return 401 Unauthorized.
  • Full protocol usage and manual examples are documented in MCP Guide.
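
A quick way to verify the auth behavior described above (requires a running server and curl; 401 is the expected status for the first request when MOMO_MCP_REQUIRE_AUTH=true):

```shell
MCP_URL="http://localhost:3000/mcp"

# Without a bearer token: should be rejected
curl -s -o /dev/null -w "%{http_code}\n" "$MCP_URL"

# With a key from MOMO_API_KEYS: should be accepted
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer dev-key" "$MCP_URL"
```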
Variable | Description | Default
--- | --- | ---
DATABASE_URL | SQLite/LibSQL path or Turso URL | file:momo.db
DATABASE_AUTH_TOKEN | Auth token for Turso cloud DB | (None)
DATABASE_LOCAL_PATH | Local replica path for remote DB | (None)

Local (FastEmbed):

  • EMBEDDING_MODEL: Model name (default: BAAI/bge-small-en-v1.5)
  • EMBEDDING_DIMENSIONS: Vector dimensions (default: 384)
  • EMBEDDING_BATCH_SIZE: Batch size (default: 256)

External API:

  • EMBEDDING_MODEL: Use provider/model (e.g., openai/text-embedding-3-small)
  • EMBEDDING_API_KEY: API key for the provider
  • EMBEDDING_BASE_URL: Custom base URL
  • EMBEDDING_TIMEOUT: Request timeout in seconds (default: 30)
  • EMBEDDING_MAX_RETRIES: Max retry attempts (default: 3)
  • EMBEDDING_RATE_LIMIT: Requests per second (optional)
Variable | Description | Default
--- | --- | ---
CHUNK_SIZE | Chunk size in tokens | 512
CHUNK_OVERLAP | Overlap between chunks | 50
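
With the defaults above, consecutive chunks start CHUNK_SIZE - CHUNK_OVERLAP = 462 tokens apart. The window layout can be sketched like this (a plain sliding-window illustration, not Momo's exact tokenizer behavior):

```shell
CHUNK_SIZE=512; CHUNK_OVERLAP=50; TOTAL=1200   # a 1200-token document
STRIDE=$((CHUNK_SIZE - CHUNK_OVERLAP))
start=0
while [ "$start" -lt "$TOTAL" ]; do
  end=$((start + CHUNK_SIZE))
  if [ "$end" -gt "$TOTAL" ]; then end=$TOTAL; fi
  echo "chunk: tokens $start-$end"
  LAST="$start-$end"
  start=$((start + STRIDE))
done
```

This yields three chunks (0-512, 462-974, 924-1200), each sharing 50 tokens with its neighbor.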

Note: File uploads are limited to 25MB (hardcoded).

Variable | Description | Default
--- | --- | ---
TRANSCRIPTION_MODEL | Model (e.g., local/whisper-small or openai/whisper-1) | local/whisper-small
TRANSCRIPTION_API_KEY | API key for cloud providers | (None)
TRANSCRIPTION_BASE_URL | Custom base URL | (None)
TRANSCRIPTION_TIMEOUT | Timeout in seconds | 300
TRANSCRIPTION_MAX_FILE_SIZE | Max file size in bytes | 104857600 (100MB)
TRANSCRIPTION_MAX_DURATION | Max duration in seconds | 7200 (2h)
Variable | Description | Default
--- | --- | ---
EPISODE_DECAY_DAYS | Half-life for episode decay | 30.0
EPISODE_DECAY_FACTOR | Decay multiplier per period | 0.9
EPISODE_DECAY_THRESHOLD | Below this, candidates for forgetting | 0.3 (0.0-1.0)
EPISODE_FORGET_GRACE_DAYS | Grace period before permanent forget | 7
FORGETTING_CHECK_INTERVAL | Interval in seconds | 3600
ENABLE_INFERENCES | Enable background inference engine | false
INFERENCE_INTERVAL_SECS | Inference run interval | 86400 (24h)
INFERENCE_CONFIDENCE_THRESHOLD | Min confidence for inferred memories | 0.7
INFERENCE_MAX_PER_RUN | Max inferences per cycle | 50
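
How the decay knobs interact can be sketched numerically. Assuming the score is multiplied by EPISODE_DECAY_FACTOR once per EPISODE_DECAY_DAYS period (an illustrative reading of the defaults, not necessarily the exact internal formula):

```shell
# Score over time: 0.9 ^ (age_days / 30)
DECAY_TABLE=$(awk 'BEGIN {
  for (d = 0; d <= 120; d += 30)
    printf "day %3d: %.3f\n", d, 0.9 ^ (d / 30)
}')
echo "$DECAY_TABLE"
```

Under this reading the score only approaches EPISODE_DECAY_THRESHOLD=0.3 after many months, so shrink EPISODE_DECAY_DAYS or EPISODE_DECAY_FACTOR for more aggressive forgetting.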
Reranking:

  • RERANK_ENABLED: Enable reranking (opt-in; default: false)
  • RERANK_MODEL: Reranker model (default: bge-reranker-base)

LLM:

  • LLM_MODEL: Model (format: provider/model, e.g., openai/gpt-4o-mini)
  • LLM_API_KEY: API key
  • ENABLE_CONTRADICTION_DETECTION: Enable contradiction logic (default: false)
  • ENABLE_QUERY_REWRITE: Enable query expansion (default: false)
  • ENABLE_AUTO_RELATIONS: Auto-detect relationships (default: true)

OCR:

  • OCR_MODEL: OCR provider (default: local/tesseract)
  • OCR_LANGUAGES: Comma-separated language codes (default: eng)
  • OCR_MAX_DIMENSION: Max image dimension (default: 4096)

Logging:

  • RUST_LOG: Logging level (default: momo=info,tower_http=debug)

Local (FastEmbed) models:

Model | Dimensions | Quality | Speed
--- | --- | --- | ---
BAAI/bge-small-en-v1.5 (default) | 384 | Good | Fast
BAAI/bge-base-en-v1.5 | 768 | Better | Medium
BAAI/bge-large-en-v1.5 | 1024 | Best | Slower
all-MiniLM-L6-v2 | 384 | Good | Fast
nomic-embed-text-v1.5 | 768 | Better | Medium
External embedding providers:

Provider | Example Model | Default Base URL
--- | --- | ---
OpenAI | openai/text-embedding-3-small | https://api.openai.com/v1
OpenRouter | openrouter/openai/text-embedding-3-small | https://openrouter.ai/api/v1
Ollama | ollama/nomic-embed-text | http://localhost:11434/v1
LM Studio | lmstudio/bge-small-en-v1.5 | http://localhost:1234/v1
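
Switching providers is just a matter of rewriting these variables; for example, a fully local setup via Ollama (model and dimensions from the tables above; the base URL matches Ollama's default and is set explicitly here for clarity):

```shell
export EMBEDDING_MODEL=ollama/nomic-embed-text
export EMBEDDING_BASE_URL=http://localhost:11434/v1
export EMBEDDING_DIMENSIONS=768   # must match the model's output size
```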
OCR models:

  • local/tesseract: Local Tesseract (default)
  • mistral/pixtral-12b: Mistral OCR API
  • deepseek/deepseek-vl: DeepSeek OCR API
  • openai/gpt-4o: OpenAI Vision API

Transcription models:

  • local/whisper-small: Local Whisper (default)
  • openai/whisper-1: OpenAI Whisper API

Momo automatically detects and processes:

  • Text: Plain text, Markdown, HTML.
  • Documents: PDF, DOCX, XLSX, CSV.
  • Web: URLs (scrapes page content).
  • Images: JPEG, PNG, WebP, TIFF, BMP (via OCR).
  • Media: Audio (MP3, WAV, M4A) and Video (MP4, WebM, AVI, MKV) via Transcription.

If you change your embedding model, Momo will detect a dimension mismatch at startup.

  • Without flags: Startup will fail with an error if dimensions don’t match.
  • With --rebuild-embeddings flag: Documents are queued for reprocessing with the new model. Migration runs in the background; search continues to function with partial results.
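
A model switch therefore looks like this sketch (model and dimension values taken from the embedding tables above):

```shell
# Point Momo at a larger model; the dimension must match the model's output
export EMBEDDING_MODEL=BAAI/bge-base-en-v1.5
export EMBEDDING_DIMENSIONS=768
```

Then restart with ./target/release/momo --rebuild-embeddings to queue the background migration instead of failing at startup.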

Momo can detect when new information contradicts existing memories.

  • Heuristic: Immediate detection via negation and value changes (<1ms).
  • LLM Confirmation: Optional refinement (~200-500ms).
  • Resolution: Old memories are marked as “not latest” and linked to the new entry.
  • Required: Set ENABLE_CONTRADICTION_DETECTION=true.
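
Enabling the full pipeline is a configuration change; the heuristic pass needs no LLM, while the confirmation pass uses the configured LLM (the key value here is a placeholder):

```shell
export ENABLE_CONTRADICTION_DETECTION=true
export LLM_MODEL=openai/gpt-4o-mini   # used for the optional LLM confirmation
export LLM_API_KEY=your-api-key       # placeholder
```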

Momo is designed to be functional even without external dependencies:

  • No LLM: Search and storage work, but advanced features (inference, rewrites) are disabled.
  • No Tesseract: Image processing will fail, but text documents work.
  • No Whisper: Audio/Video processing will fail, but other ingestion works.

For detailed API information, see API Reference. For MCP integration, see MCP Guide.