Give your LLM a memory it never forgets
Drop-in REST API that adds persistent cross-session memory to any LLM. ~2-3ms retrieval. Zero extra infrastructure.
Memory Pipeline
Proven results — not just promises
Benchmarked against a baseline LLM with no memory layer
Benchmark results — Exp 44 (gpt-4o-mini · text-embedding-3-small)
| Mode | Judge score (0–3) | Keyword hit |
|---|---|---|
| ★ NGT Memory (emb+graph) | 2.44 / 3 | 44% |
| ★ NGT Memory (emb only) | 2.44 / 3 | 44% |
| No memory baseline | 1.22 / 3 | 27% |
💡 In a realistic A/B profile test, memory won 5 out of 6 scenarios against the same model without memory.
LLMs without memory are broken by design
Every session starts fresh. Your users have to repeat themselves. Your AI gives dangerously generic advice. NGT Memory fixes this.
Real example — Restaurant recommendation in Kyoto
Without memory: “Ippudo is great for ramen lovers” — recommends meat to a vegetarian
With NGT Memory: “Shigetsu at Tenryu-ji serves shojin ryori (Buddhist vegan cuisine)” — personalized because it remembers you’re vegetarian
How NGT Memory works
A simple pipeline that injects relevant memories into every LLM prompt
Request Pipeline
Cosine Similarity
Semantically close facts retrieved via vector similarity search
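The retrieval step above can be sketched in a few lines. This is an illustrative toy, not the actual NGT internals — the real service embeds text with text-embedding-3-small, while the 3-dimensional vectors and the `retrieve` helper here are invented for the example:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, memories, top_k=2):
    """Return the top_k memory texts ranked by similarity to the query."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return [m["text"] for m in ranked[:top_k]]

# Toy memory store with hand-made 3-d embeddings.
memories = [
    {"text": "user is vegetarian",       "vec": [0.9, 0.1, 0.0]},
    {"text": "user lives in Berlin",     "vec": [0.0, 0.9, 0.1]},
    {"text": "user trains for marathon", "vec": [0.1, 0.0, 0.9]},
]
print(retrieve([0.8, 0.2, 0.0], memories, top_k=1))  # → ['user is vegetarian']
```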
Hebbian Graph
Associative links between concepts, like the human brain
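A Hebbian update can be sketched as: concepts retrieved together get a stronger link, with a saturating weight so repetition has diminishing returns. Class and method names here are invented for illustration, not the actual NGT data structures:

```python
from collections import defaultdict

class HebbianGraph:
    """Toy associative graph: concepts that co-occur get stronger links."""

    def __init__(self, rate=0.1):
        self.w = defaultdict(float)  # (concept_a, concept_b) -> link strength
        self.rate = rate

    def co_activate(self, concepts):
        # Strengthen every pair of concepts retrieved together;
        # the (1 - w) factor saturates weights below 1.0.
        for i, a in enumerate(concepts):
            for b in concepts[i + 1:]:
                key = tuple(sorted((a, b)))
                self.w[key] += self.rate * (1.0 - self.w[key])

    def neighbors(self, concept, min_weight=0.05):
        """Concepts associatively linked to `concept`, strongest first."""
        hits = [(b if a == concept else a, wt)
                for (a, b), wt in self.w.items()
                if concept in (a, b) and wt >= min_weight]
        return sorted(hits, key=lambda x: -x[1])

g = HebbianGraph()
g.co_activate(["kyoto", "vegetarian", "shojin ryori"])
g.co_activate(["kyoto", "vegetarian"])  # repeated pair → stronger link
print(g.neighbors("vegetarian"))
```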
Hierarchical Consolidation
Important facts promoted to long-term memory automatically
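The promotion rule can be pictured as a threshold on how often a fact is retrieved. The `promote_after` cutoff and the hit-count field are assumptions for the example; the service's actual consolidation heuristic is not documented here:

```python
def consolidate(session_facts, promote_after=3):
    """Split facts into long-term (frequently retrieved) and short-term."""
    long_term = [f for f in session_facts if f["hits"] >= promote_after]
    short_term = [f for f in session_facts if f["hits"] < promote_after]
    return long_term, short_term

facts = [
    {"text": "allergic to penicillin", "hits": 5},  # promoted
    {"text": "asked about weather",    "hits": 1},  # stays short-term
]
lt, st = consolidate(facts)
print([f["text"] for f in lt])  # → ['allergic to penicillin']
```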
Default Docker deployment runs a single API worker to keep in-memory sessions consistent. Multi-worker mode requires sticky routing or a shared session backend.
Up and running in 5 minutes
Drop-in REST API — no new infrastructure, no vector database, no vendor lock-in
# 1. Clone the repository
git clone https://github.com/ngt-memory/ngt-memory.git
cd ngt-memory
# 2. Configure environment
cp .env.example .env
# Set OPENAI_API_KEY in .env
# 3. Start the service (single worker, sessions in-memory)
docker-compose up -d
# ✓ NGT Memory is running at http://localhost:9190
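The integration pattern the pipeline describes — retrieve facts for the session, inject them into the prompt — looks roughly like this. The prompt template below is illustrative only, not the service's actual format:

```python
def build_prompt(user_message, memories):
    """Prepend retrieved memory facts to the user's message as context."""
    if not memories:
        return user_message
    context = "\n".join(f"- {m}" for m in memories)
    return f"Known facts about this user:\n{context}\n\nUser: {user_message}"

prompt = build_prompt("Recommend a restaurant in Kyoto", ["user is vegetarian"])
print(prompt)
```

With the vegetarian fact injected, the same model that suggested ramen with pork broth can now answer with shojin ryori instead.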
Everything you need
Production-ready memory layer with all the features your LLM app needs
Persistent Memory
Stores facts between sessions — users never repeat themselves
Ultra-fast Retrieval
~2-3ms average, graph + cosine search, no external database
Drop-in REST API
Integrates into any LLM app in under 5 minutes
Isolated Sessions
Memory is scoped per user — one session never sees another's data
Simple Docker Deploy
One command deployment — docker-compose up -d
Local-first
Runs entirely on your infrastructure — no cloud dependency
Hebbian Graph
Associative links between concepts, like the human brain
Built-in Analytics
Memory metrics, session stats, retrieval performance
API Key Auth
Optional endpoint protection with configurable API keys
How we compare
NGT Memory is the only solution that requires no external vector database and delivers sub-5ms retrieval
| Feature | ★ NGT Memory | Mem0 | Zep | LangChain Memory |
|---|---|---|---|---|
| Self-hosted | ✓ | | | |
| No vector DB required | ✓ | | | |
| Hebbian graph | ✓ | | | |
| Retrieval latency | ~2-3ms | ~50ms | ~100ms | ~30ms |
| Open source | ✓ | | | |
| REST API | ✓ | | | |
Simple, transparent pricing
Start free. Scale as you grow. No hidden fees.
- Requests/day: 100
- Sessions: 1
- Model: gpt-4.1-nano
- Support: GitHub Issues
- Analytics:

- Requests/day: 10,000
- Sessions: 100
- Model: gpt-4.1-nano
- Support: Email support
- Analytics:

- Requests/day: Unlimited
- Sessions: Unlimited
- Model: gpt-4.1-nano
- Support: Priority + SLA
- Analytics:
All plans include: persistent memory, REST API, Docker deployment, Apache 2.0 license
💳 Payment via YooKassa — cards, SBP, YooMoney
💡 Self-hosted? It's free forever — just clone the repo.
Built for real-world AI applications
From healthcare to consumer apps — NGT Memory makes every LLM application smarter with context
Medical AI Assistant
Remembers allergies, medications, and prior reactions across sessions. Never gives advice that conflicts with known conditions.
💡 Patient mentioned penicillin allergy 3 sessions ago → avoided in all subsequent recommendations
Personal AI Companion
Keeps track of preferences, travel constraints, and personal plans. Grows smarter with every conversation.
💡 Knows you're vegetarian, live in Berlin, and training for a marathon
Customer Support Bot
Recalls prior issues, refund preferences, and user-specific constraints. No more asking customers to repeat themselves.
💡 Customer contacted support 3 times about billing → context injected automatically