NGT Memory

Give your LLM a memory it never forgets

Drop-in REST API that adds persistent cross-session memory to any LLM. ~2-3ms retrieval. Zero extra infrastructure.

v0.23.0 · Apache 2.0 · Python 3.10+ · Star on GitHub

Memory Pipeline

User message → NGT Memory → LLM (gpt-4.1-nano) → Response

Cosine + graph retrieval · ~2-3ms latency · Persistent across sessions

Proven results — not just promises

Benchmarked against baseline LLM with no memory layer

• 5/6: realistic-scenario wins vs the no-memory baseline
• ~2-3ms: average memory retrieval latency
• +100%: quality vs the no-memory baseline

Benchmark results — Exp 44 (gpt-4o-mini · text-embedding-3-small)

| Mode | Judge score (0–3) | Keyword hit |
| --- | --- | --- |
| NGT Memory (emb+graph) | 2.44 / 3 | 44% |
| NGT Memory (emb only) | 2.44 / 3 | 44% |
| No-memory baseline | 1.22 / 3 | 27% |

💡 In a realistic A/B profile test, memory won 5 out of 6 scenarios against the same model without memory.

0.917 with memory · 0.083 without memory

LLMs without memory are broken by design

Every session starts fresh. Your users have to repeat themselves. Your AI gives dangerous generic advice. NGT Memory fixes this.

| Without memory | With NGT Memory |
| --- | --- |
| Recommends meat to a vegetarian | Remembers user dietary preferences |
| Asks users to repeat context every session | Persists facts between sessions |
| Generic advice ignoring user history | Personalized responses every time |
| Dangerous advice in medical/finance contexts | Respects allergies, medications, restrictions |

Real example — Restaurant recommendation in Kyoto

No memory

“Ippudo is great for ramen lovers” — recommends meat to a vegetarian

With NGT Memory

“Shigetsu at Tenryu-ji serves shojin ryori (Buddhist vegan cuisine)” — personalized because it remembers you’re vegetarian

How NGT Memory works

A simple pipeline that injects relevant memories into every LLM prompt

Request Pipeline

1. POST /chat: HTTP request received
2. Embed: text-embedding-3-small (~700ms)
3. NGT Retrieve: cosine + graph (~2ms)
4. LLM Prompt: [MEMORY CONTEXT] injected; gpt-4.1-nano generates the response (~1.9s)
5. Store: new facts written to NGT Memory (~1ms)
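The stages above can be sketched end to end in plain Python. This is a minimal illustration, not NGT's actual implementation: the embedder is a stub standing in for text-embedding-3-small, and all function and field names are assumptions.

```python
import math

def embed(text: str) -> list[float]:
    # Stub embedder: a character-frequency vector. The real pipeline calls
    # text-embedding-3-small; this stand-in just keeps the sketch runnable.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

memory: list[tuple[str, list[float]]] = []  # (fact, embedding)

def store(fact: str) -> None:
    memory.append((fact, embed(fact)))

def build_prompt(user_msg: str, k: int = 2) -> str:
    # Retrieve the k most similar stored facts and inject them as context.
    q = embed(user_msg)
    ranked = sorted(memory, key=lambda m: cosine(q, m[1]), reverse=True)
    context = "\n".join(f"- {fact}" for fact, _ in ranked[:k])
    return f"[MEMORY CONTEXT]\n{context}\n[USER]\n{user_msg}"

store("User is vegetarian")
store("User lives in Berlin")
prompt = build_prompt("Recommend a restaurant in Kyoto")
```

The resulting prompt carries the stored facts into the model call, which is what lets the LLM answer with the user's history in view.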

Cosine Similarity

Semantically close facts retrieved via vector similarity search

Hebbian Graph

Associative links between concepts, like the human brain
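The Hebbian idea ("concepts that fire together wire together") can be illustrated with a tiny co-activation update. The learning and decay rates here are illustrative, not NGT's actual parameters:

```python
from collections import defaultdict
from itertools import combinations

# Edge weights between concept pairs (undirected graph).
weights: dict[frozenset, float] = defaultdict(float)

LEARN, DECAY = 0.1, 0.02  # assumed rates, for illustration only

def observe(concepts: set[str]) -> None:
    """Strengthen links between concepts mentioned together; decay the rest."""
    active = {frozenset(pair) for pair in combinations(sorted(concepts), 2)}
    for edge in list(weights):
        if edge not in active:
            weights[edge] = max(0.0, weights[edge] - DECAY)
    for edge in active:
        weights[edge] += LEARN * (1.0 - weights[edge])  # saturating update

observe({"vegetarian", "restaurant", "Kyoto"})
observe({"vegetarian", "restaurant"})
# The repeatedly co-activated pair ends up with the strongest link,
# so a query about restaurants pulls in the dietary preference.
```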

Hierarchical Consolidation

Important facts promoted to long-term memory automatically
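Consolidation can be sketched as promoting frequently retrieved facts from a short-term to a long-term store. The counter and threshold below are assumptions for illustration, not NGT's real promotion criteria:

```python
short_term: dict[str, int] = {}  # fact -> retrieval count
long_term: set[str] = set()

PROMOTE_AFTER = 3  # illustrative importance threshold

def touch(fact: str) -> None:
    """Record a retrieval; promote the fact once it proves important."""
    short_term[fact] = short_term.get(fact, 0) + 1
    if short_term[fact] >= PROMOTE_AFTER:
        long_term.add(fact)
        del short_term[fact]

for _ in range(3):
    touch("User is allergic to penicillin")
```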

ℹ️ Default Docker deployment runs a single API worker to keep in-memory sessions consistent. Multi-worker mode requires sticky routing or a shared session backend.

Up and running in 5 minutes

Drop-in REST API — no new infrastructure, no vector database, no vendor lock-in

```bash
# 1. Clone the repository
git clone https://github.com/ngt-memory/ngt-memory.git
cd ngt-memory

# 2. Configure environment
cp .env.example .env
# Set OPENAI_API_KEY in .env

# 3. Start the service (single worker, sessions in-memory)
docker-compose up -d

# ✓ NGT Memory is running at http://localhost:9190
```
ℹ️ Default Docker deployment uses 1 worker because session state is currently stored in memory.
REST API · OpenAPI spec included
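Assuming a JSON /chat endpoint as shown in the pipeline above, a minimal client might look like the sketch below. The request field names are assumptions; consult the bundled OpenAPI spec for the real schema.

```python
import json
from urllib import request

NGT_URL = "http://localhost:9190/chat"  # default port from docker-compose

def build_payload(session_id: str, message: str) -> bytes:
    # Field names are illustrative, not confirmed by the API spec.
    return json.dumps({"session_id": session_id, "message": message}).encode()

def chat(session_id: str, message: str) -> dict:
    """Send one chat turn to a running NGT Memory service."""
    req = request.Request(
        NGT_URL,
        data=build_payload(session_id, message),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:  # requires the service to be up
        return json.load(resp)
```

Because memory is keyed to the session, repeated calls with the same session id let the service accumulate and reuse facts across turns.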

Try it now (no signup required)


Everything you need

Production-ready memory layer with all the features your LLM app needs

Persistent Memory

Stores facts between sessions — users never repeat themselves

Ultra-fast Retrieval

~2-3ms average, graph + cosine search, no external database

Drop-in REST API

Integrates into any LLM app in under 5 minutes

Isolated Sessions

Isolated memory per user — each session is fully independent
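Per-session isolation amounts to keying every store and lookup by a session id, so one user's facts can never appear in another user's retrieval. A minimal sketch (not NGT's storage layer):

```python
from collections import defaultdict

# Each session id maps to its own private list of facts.
sessions: dict[str, list[str]] = defaultdict(list)

def remember(session_id: str, fact: str) -> None:
    sessions[session_id].append(fact)

def recall(session_id: str) -> list[str]:
    # Only this session's facts are ever visible.
    return list(sessions[session_id])

remember("alice", "vegetarian")
remember("bob", "lives in Kyoto")
```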

Simple Docker Deploy

One command deployment — docker-compose up -d

Local-first

Runs entirely on your infrastructure — no cloud dependency

Hebbian Graph

Associative links between concepts, like the human brain

Built-in Analytics

Memory metrics, session stats, retrieval performance

API Key Auth

Optional endpoint protection with configurable API keys

How we compare

NGT Memory is the only solution that requires no external vector database and delivers sub-5ms retrieval

Compared features: NGT Memory vs Mem0, Zep, and LangChain Memory

- Self-hosted
- No vector DB required
- Hebbian graph
- Retrieval latency: ~2-3ms (NGT Memory) vs ~50ms (Mem0), ~100ms (Zep), ~30ms (LangChain Memory)
- Open source
- REST API

Simple, transparent pricing

Start free. Scale as you grow. No hidden fees.

Free: 0 ₽/mo
  • Requests: 100/day
  • Sessions: 1
  • Model: gpt-4.1-nano
  • Support: GitHub Issues
  • Analytics:

Pro (Popular): 1 990 ₽/mo
  • Requests: 10,000/day
  • Sessions: 100
  • Model: gpt-4.1-nano
  • Support: Email support
  • Analytics:

Enterprise: 5 990 ₽/mo
  • Requests: Unlimited
  • Sessions: Unlimited
  • Model: gpt-4.1-nano
  • Support: Priority + SLA
  • Analytics:

All plans include: persistent memory, REST API, Docker deployment, Apache 2.0 license

💳 Payment via YooKassa — cards, SBP, YooMoney

💡 Self-hosted? It's free forever — just clone the repo.

Built for real-world AI applications

From healthcare to consumer apps — NGT Memory makes every LLM application smarter with context

Healthcare

Medical AI Assistant

Remembers allergies, medications, and prior reactions across sessions. Never gives advice that conflicts with known conditions.

💡 Patient mentioned penicillin allergy 3 sessions ago → avoided in all subsequent recommendations

Consumer

Personal AI Companion

Keeps track of preferences, travel constraints, and personal plans. Grows smarter with every conversation.

💡 Knows you're vegetarian, live in Berlin, and training for a marathon

Enterprise

Customer Support Bot

Recalls prior issues, refund preferences, and user-specific constraints. No more asking customers to repeat themselves.

💡 Customer contacted support 3 times about billing → context injected automatically

Ready to give your LLM a memory?

Join developers building smarter AI applications with persistent memory. Open source. Self-hosted. Production-ready.

Apache 2.0
Free forever
Self-hosted
Your data, your control
~2-3ms retrieval
Production-ready speed
