Updated May 2026 · Comparison · 26 min read

Best Vector Database for AI 2026: Pinecone, Qdrant, Chroma

The best vector database for AI in 2026 depends on whether you want to prototype, scale, or never leave Postgres. Chroma is the Python-first prototype tool. Pinecone is hosted SaaS at scale. Qdrant and Weaviate are Apache-2.0 servers you self-host or rent in the cloud. pgvector isn’t a vector database at all — it’s a Postgres extension. We pulled latency claims, index types, and pricing from each project’s own docs, and tell you exactly which one to start with based on workload shape.

On this page · 16 sections
  1. TL;DR + decision tree
  2. What a vector DB MCP does
  3. Architectural decision tree
  4. Side-by-side matrix
  5. Chroma — install + recipe
  6. Pinecone — install + recipe
  7. Qdrant — install + recipe
  8. Weaviate — install + recipe
  9. pgvector — install + recipe
  10. Indexing & query performance
  11. Pricing shape
  12. Common pitfalls
  13. Migration paths
  14. Community signal
  15. FAQ
  16. Sources

TL;DR + decision tree

  • Want the simplest local prototype with no separate database process? Chroma in embedded mode — one pip install chromadb, one PersistentClient(path="./db"), done. Apache-2.0, swap in a server when you outgrow it.
  • Want zero operational overhead with a real free tier? Pinecone. Serverless, hosted, 2GB free forever, the SDKs handle batching and retries. Vendor lock-in is the only cost, and the lock-in is shallower than people remember because the data shape is portable.
  • Want the best balance of performance, filtering, and OSS licensing? Qdrant. Apache-2.0 Rust server with a filter API so good it’s the reason most teams pick it, plus a generous free Cloud tier and self-host that runs in one binary.
  • Want batteries-included vectorization and hybrid search with multi-tenancy as a first-class concept? Weaviate. BSD-3 Go server with vectorizer modules that auto-embed on insert and a hybrid query that fuses BM25 with vector similarity in one call.
  • Already running Postgres and want to add vectors without adding a second piece of infrastructure? pgvector. Six distance operators, HNSW or IVFFlat indexes, your existing backups and replication. The cap is 2,000 dimensions on standard-precision indexed vectors.

These five are not substitutes — they sit at different points on a spectrum from “prototype on a laptop” to “production multi-tenant SaaS.” The matrix below makes the trade-offs explicit, then the per-tool sections cover install, code recipe, and skip conditions.

What a vector DB MCP actually does

A vector database is the storage layer for embeddings. An embedding is a list of floating-point numbers (typically 384, 768, 1536, or 3072 dimensions) produced by a model like text-embedding-3-small or BGE — two pieces of text that mean similar things end up with nearby vectors in the embedding space. The job of a vector DB is to answer one query fast: given this query vector, return the k nearest stored vectors. That’s similarity search, and it’s the foundation under retrieval-augmented generation, semantic search, memory systems, recommendation, and most agent-tooling that isn’t pure SQL.
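
At toy scale the whole idea fits in a few lines of brute-force math: a sketch with hypothetical 4-dimensional vectors (real embeddings have hundreds or thousands of dimensions), and exactly the linear scan that the ANN indexes covered later exist to avoid.

# pip install numpy
import numpy as np

# Four stored "embeddings" -- hypothetical 4-d vectors standing in for real ones
stored = np.array([
    [0.9, 0.1, 0.0, 0.0],   # "how to deploy to production"
    [0.8, 0.2, 0.1, 0.0],   # "deployment checklist"
    [0.0, 0.1, 0.9, 0.2],   # "pasta recipes"
    [0.1, 0.0, 0.8, 0.3],   # "sourdough starter tips"
], dtype=np.float32)
query = np.array([0.85, 0.15, 0.05, 0.0], dtype=np.float32)  # "deploy the app"

def normalise(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Cosine similarity = dot product of L2-normalised vectors; argsort = exact top-k
scores = normalise(stored) @ normalise(query)
top_k = np.argsort(-scores)[:2]
print(top_k, scores[top_k])   # the two deployment-related vectors win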

The agent workflow has two phases that map cleanly onto MCP tools:

  1. Embed and store (write path). A batch process — or a long-running agent — chunks documents, computes embeddings with a model, and writes (id, vector, metadata) tuples into the database. This is the expensive part: compute, network, and storage all scale linearly with corpus size. You typically do it once per document, then rebuild only when the embedding model changes.
  2. Query at runtime (read path). When the agent has a question, it embeds the question with the same model, then asks the vector DB for the top-k nearest stored vectors. The DB returns ids + metadata + (optionally) the raw vector. The agent feeds the retrieved text into the LLM as context. This needs to be fast — typically sub-100ms p99 for a good RAG experience.

A vector DB MCP server wraps these two phases as MCP tools an agent can call. The good ones expose four primitives: create_collection, upsert (write vectors with payload), query (top-k similarity search with filters), and delete. The bad ones expose too many tools — one per knob — which blows the model’s context budget on tool descriptions before the first query even runs. We cover the description-bloat trap in the MCP context bloat fix deep-dive.

A pattern worth naming: most teams write directly to the DB from their application code, then expose only the query path through MCP. That keeps the agent read-only against the corpus and removes a whole class of “the model accidentally deleted my embeddings” bugs. The five MCP servers below all default to giving the agent both paths; restricting to read-only is a config choice on your side, not a per-server feature.
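
What that looks like in practice: a minimal sketch of a read-only MCP server, assuming the official Python MCP SDK's FastMCP helper and an embedded Chroma store populated elsewhere (server and collection names are illustrative).

# pip install "mcp[cli]" chromadb
import chromadb
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("corpus-search")                        # illustrative server name
store = chromadb.PersistentClient(path="./chroma_db")
collection = store.get_or_create_collection("docs")   # populated by your app code

@mcp.tool()
def query(text: str, top_k: int = 5) -> list[dict]:
    """Return the top_k most similar documents for a natural-language query."""
    # Uses the collection's configured embedding function to embed the query text
    hits = collection.query(query_texts=[text], n_results=top_k)
    return [
        {"id": i, "document": d, "distance": s}
        for i, d, s in zip(hits["ids"][0], hits["documents"][0], hits["distances"][0])
    ]

if __name__ == "__main__":
    mcp.run()   # stdio transport; no upsert or delete tools exposed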

Architectural decision tree

Before the matrix, four binary choices that funnel you toward one or two candidates. Walk these in order.

1. Self-host or SaaS? Pinecone is SaaS only — there is no self-hosted Pinecone, and there will not be one. If your data is bound by a contract or regulation that requires it to stay in your VPC or your on-prem cluster, Pinecone is disqualified. Everyone else in this comparison supports self-host: Chroma as embedded or server, Qdrant as a single binary or Docker, Weaviate as Docker or Kubernetes, pgvector wherever your Postgres runs. Qdrant Cloud, Weaviate Cloud, and Chroma Cloud exist for teams that want managed without losing the self-host option later. In enterprise conversations this first question alone often settles the shortlist before any other axis gets discussed.

2. Embedded or server? An embedded vector DB runs in the same process as your application — no separate service, no network hop. Chroma in PersistentClient mode is the canonical example; SQLite-style file storage, zero ops. pgvector running on your existing app database is morally embedded (same process boundary as your other DB queries). Server-mode vector DBs (Pinecone, Qdrant, Weaviate, Chroma in server mode) are network services with their own auth and scaling. Embedded is the right answer for prototypes, single-machine agents, and desktop apps; server is the right answer for anything with multiple readers or that needs to survive a process restart. Switching from embedded to server is usually a one-line change in client construction (the client-swap sketch after this list shows exactly that).

3. Vector-only, or hybrid (vector + scalar filters + keyword)? A pure vector query asks “return the k nearest neighbours.” A real production query usually asks “return the k nearest neighbours where the document is from this tenant and the published_date is after 2025-01-01 and the language is ‘en’.” That requires either metadata filtering (all five support this to varying depths) or full hybrid search with BM25 keyword scoring fused into the rank (Weaviate, Qdrant, Pinecone). Chroma’s filter language is functional but shallow. pgvector’s hybrid is whatever you can write in SQL — which is the most powerful and the most manual. If your queries are always vector-only, pick on other axes. If they aren’t, this is where the comparison gets real.

4. Which index type — HNSW or IVFFlat? HNSW (Hierarchical Navigable Small World) is the default in Chroma, Qdrant, Weaviate, and Pinecone's serverless tier. It gives sub-millisecond query latency at the cost of build time and memory. IVFFlat is the alternative most people first meet through pgvector — fast to build, memory-frugal, recall depends on probe count. The choice matters most for pgvector, where you pick the index type at creation time; the dedicated servers default to HNSW and that's fine for the vast majority of workloads. We cover the trade-offs in more detail in the indexing section below.
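
The client swap mentioned in question 2, using Chroma as the example; the rest of the code is identical either way.

import chromadb

# Embedded: storage is a local folder, no separate process to run
client = chromadb.PersistentClient(path="./chroma_db")

# Server: same API, but the data now lives behind a network service
# client = chromadb.HttpClient(host="localhost", port=8000)

collection = client.get_or_create_collection("docs")   # unchanged from here on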

Side-by-side matrix

Every cell sourced from the project’s own README/docs/pricing page as of 2026-05-11. Volatile pricing rows verified against pinecone.io/pricing, qdrant.tech/pricing, and weaviate.io/pricing the same week. MCP support rows verified against catalog entries in this directory.

| Dimension | Chroma | Pinecone | Qdrant | Weaviate | pgvector |
| --- | --- | --- | --- | --- | --- |
| Shape | Library (embedded) + server | Hosted SaaS only | Server (Rust binary) | Server (Go binary) | Postgres extension |
| License | Apache 2.0 | Proprietary SaaS | Apache 2.0 | BSD-3-Clause | PostgreSQL Licence |
| Language under the hood | Python + Rust | Closed | Rust | Go | C (in Postgres) |
| Self-host | Yes | No | Yes (single binary) | Yes (Docker/k8s) | Yes (wherever Postgres runs) |
| Managed offering | Chroma Cloud (serverless) | Pinecone (only mode) | Qdrant Cloud | Weaviate Cloud | Supabase, Neon, Crunchy, RDS |
| Free tier (managed) | Chroma Cloud free tier | Starter: 2GB / 1M reads/mo | 0.5 vCPU / 1GB / 4GB cluster | 14-day trial then PAYG | Free on Supabase / Neon free plans |
| Default index | HNSW | Proprietary (managed) | HNSW (configurable) | HNSW | Choose: HNSW or IVFFlat |
| Hybrid search | Metadata filter only | Sparse-dense index | Sparse vectors + fusion | BM25 + vector hybrid (built-in) | SQL JOIN with tsvector |
| Auto-embedding | No (compute upstream) | Pinecone Inference (hosted) | No (compute upstream) | Yes (text2vec-* modules) | No (compute upstream) |
| Multi-tenancy | Collection-per-tenant | Namespaces per index | Multi-collection or payload filter | First-class multi-tenant | Schema-per-tenant or row filter |
| MCP server in catalog | /servers/chroma-working-memory | /servers/pinecone-vector-db | /servers/qdrant | /servers/weaviate | Use /servers/postgres |
| Best for | Local prototype, ephemeral memory | Zero-ops production SaaS | OSS production w/ filtering | Hybrid search + multi-tenant | Vector + relational in one DB |

Four takeaways. Only Pinecone is SaaS-only, an immediate knock-out for compliance-bound teams. Only Weaviate computes vectors inside the database at insert time via its vectorizer modules; Pinecone offers hosted embedding as an adjacent Inference service, and the other three expect you to embed upstream. Qdrant and Weaviate are the only two with truly first-class hybrid search — Chroma and Pinecone offer narrower filter shapes, and pgvector hybrid is whatever you can express in SQL. pgvector is the only one that piggy-backs on infrastructure you already operate — for teams already running Postgres at scale, this is often the deciding factor.

Chroma — install + recipe

What it does best

Chroma wins on time-to-first-query. pip install chromadb, two lines of Python, and you have a working vector store on disk. There is no separate database process to start, no auth to wire, no schema to design — collections are created on demand, the default distance is cosine, and the persistence directory is a folder you can tar and ship. The MCP catalog entry, chroma-working-memory, wraps this surface as an agent-memory tool — perfect for a Claude Code session that needs to remember decisions across turns without standing up real infrastructure. Apache-2.0 licence, no usage limits, no vendor lock-in.

Pick this if you...

  • Are prototyping a RAG pipeline and want zero ops overhead while you experiment with chunking and embedding strategies.
  • Need an embedded vector store inside a Python application — desktop tool, CLI, single-machine agent — without standing up a network service.
  • Want ephemeral or per-session memory for an LLM agent (the chroma-working-memory MCP is built exactly for this pattern).
  • Plan to graduate to a server-mode vector DB later and want to start with the option that has the cleanest migration path.

Recipe: working-memory for a Claude Code session

Here is the minimal embed-and-query pattern in pure Python — useful for tests, CLI tools, or as a reference for what the MCP server wraps under the hood:

# pip install chromadb sentence-transformers
import chromadb
from sentence_transformers import SentenceTransformer

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(
    name="agent_memory",
    metadata={"hnsw:space": "cosine"},
)

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")

# Write side
notes = [
    "User prefers TypeScript over Python for frontend work.",
    "Project deploys to Vercel, not Cloudflare.",
    "Database is Postgres on Neon, not Supabase.",
]
collection.upsert(
    ids=[f"note_{i}" for i in range(len(notes))],
    documents=notes,
    embeddings=embedder.encode(notes).tolist(),
)

# Read side
query = "what stack does this project use"
hits = collection.query(
    query_embeddings=embedder.encode([query]).tolist(),
    n_results=3,
)
for doc, score in zip(hits["documents"][0], hits["distances"][0]):
    print(f"{score:.3f}  {doc}")

Wire the chroma-working-memory MCP server into your Claude Code config and the agent performs these same operations through tool calls instead of Python. The same ./chroma_db folder survives between Claude Code sessions.
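
The config entry follows the standard mcpServers shape. The launch command depends on how the server is packaged, so treat the command and args below as placeholders to fill in from the server's own install instructions.

{
  "mcpServers": {
    "chroma-memory": {
      "command": "<launch command from the server's install docs>",
      "args": ["<args from the server's install docs>"],
      "env": {}
    }
  }
}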

Skip it if...

You need production multi-tenancy, hybrid BM25-plus-vector search, or sub-100ms p99 at hundreds of QPS — Chroma was designed for the prototype-and-single-machine end of the spectrum, and operators report mixed experiences scaling it to hundreds of millions of vectors. Move to Qdrant or Pinecone when you outgrow it; the migration is mostly plumbing.

Pinecone — install + recipe

What it does best

Pinecone is the zero-operations choice. The serverless tier auto-scales reads and writes, the SDK handles retries and batching, and the only thing you operate is an index name and a region. The free Starter plan gives 2GB of storage and 1M monthly read units forever — enough to host a real RAG prototype indefinitely. Pinecone Inference (the hosted embedding service) removes the second moving part for teams that don’t already run their own embedding pipeline. The MCP catalog entry, pinecone-vector-db, wraps the query and upsert paths for agent use. The trade-off, of course, is vendor lock-in and SaaS pricing — and there is no self-host option, today or ever.

Pick this if you...

  • Want a vector database that someone else operates, monitors, scales, and patches, with a real $0 free tier you can ship to.
  • Have bursty traffic — pay-per-read-unit beats over-provisioning a Qdrant VM when QPS varies by 20× across the day.
  • Want one vendor for embeddings + storage + retrieval — Pinecone Inference brings the embedding step in-house.
  • Are okay budgeting against Pinecone’s usage meters and confident your traffic pattern fits the serverless cost model.

Recipe: upsert and query a Pinecone serverless index

# pip install pinecone openai
from pinecone import Pinecone, ServerlessSpec
from openai import OpenAI

pc = Pinecone(api_key="...")
oai = OpenAI(api_key="...")

# One-time: create the index
if "docs" not in [i.name for i in pc.list_indexes()]:
    pc.create_index(
        name="docs",
        dimension=1536,        # text-embedding-3-small
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index("docs")

def embed(texts):
    out = oai.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in out.data]

# Write side
docs = [
    {"id": "d1", "text": "Pinecone Starter is free up to 2GB."},
    {"id": "d2", "text": "Qdrant Cloud free tier is 0.5 vCPU 1GB."},
]
vectors = [
    {"id": d["id"], "values": v, "metadata": {"text": d["text"]}}
    for d, v in zip(docs, embed([d["text"] for d in docs]))
]
index.upsert(vectors=vectors)

# Read side
q_vec = embed(["what's free with pinecone"])[0]
hits = index.query(vector=q_vec, top_k=3, include_metadata=True)
for m in hits.matches:
    print(f"{m.score:.3f}  {m.metadata['text']}")

Skip it if...

Compliance bans your data from leaving your VPC, or you need self-host for cost reasons at the hundreds-of-millions-of-vectors tier. Pinecone is SaaS-only — no on-prem option exists. Qdrant self-hosted is the closest functional substitute when you need to leave.

Qdrant — install + recipe

What it does best

Qdrant’s filter API is the single most-cited reason teams pick it. The query language treats metadata filters as first-class — you can express must/should/must_not conditions over scalar fields, ranges, geo points, and even nested payloads, then fuse those with the vector similarity in a single call. Combined with the Apache-2.0 licence, a Rust binary that runs in a single container, and a Cloud free tier (0.5 vCPU / 1GB RAM / 4GB disk, forever), Qdrant covers the broadest sweet spot of any vector DB in this comparison. The official MCP server at /servers/qdrant exposes upsert and similarity-search as agent tools with payload filtering intact.

Pick this if you...

  • Need rich metadata filtering — multi-tenant isolation, time-range queries, hierarchical categories — alongside vector similarity.
  • Want self-host that runs as one binary or one container, with the option to graduate to managed Cloud without re-architecting.
  • Prefer Apache-2.0 source you can audit, fork, or embed in a commercial product without legal review.
  • Care about sparse-vector hybrid search and don’t want to bolt on a separate keyword index.

Recipe: upsert with payload, then filter-and-similarity query

# pip install qdrant-client sentence-transformers
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue,
)
from sentence_transformers import SentenceTransformer

client = QdrantClient(url="http://localhost:6333")
embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")

client.recreate_collection(
    collection_name="papers",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

docs = [
    ("p1", "PagedAttention enables KV-cache sharing across requests.", "vllm", 2023),
    ("p2", "HNSW graphs give logarithmic search at high recall.",       "hnsw", 2018),
    ("p3", "Hybrid search fuses BM25 and dense vectors.",                "hybrid", 2024),
]
client.upsert(
    collection_name="papers",
    points=[
        PointStruct(
            id=i,
            vector=embedder.encode(text).tolist(),
            payload={"topic": topic, "year": year, "text": text},
        )
        for i, (pid, text, topic, year) in enumerate(docs)
    ],
)

# Filter + similarity in one call
q = embedder.encode("how does the KV cache work").tolist()
hits = client.search(
    collection_name="papers",
    query_vector=q,
    query_filter=Filter(
        must=[FieldCondition(key="year", match=MatchValue(value=2023))]
    ),
    limit=3,
)
for h in hits:
    print(f"{h.score:.3f}  {h.payload['text']}")

Skip it if...

You want the most batteries-included path — Qdrant asks you to bring your own embedding pipeline (no auto-embed on insert), so if you’d rather the database call OpenAI or Cohere on your behalf, Weaviate or Pinecone Inference will feel friendlier. Skip it too if you’d rather your vectors live inside your existing Postgres than a second piece of infrastructure — pgvector wins on operational surface area in that case.

Weaviate — install + recipe

What it does best

Weaviate is the batteries-included end of the spectrum. The vectorizer modules (text2vec-openai, text2vec-cohere, text2vec-huggingface, text2vec-jinaai) auto-embed text on insert and on query — you write text, you query text, Weaviate handles the vector math. Hybrid search (BM25 keyword + vector similarity fused with an alpha parameter) is a single method call, not two indexes glued together. Multi-tenancy is first-class: one collection can hold thousands of isolated tenants without separate schemas. BSD-3 licence keeps it commercial-friendly. The official MCP server at /servers/weaviate exposes Weaviate as either a knowledge base or a semantic memory store for an AI agent.

Pick this if you...

  • Want hybrid keyword-plus-vector search out of the box, with a single tunable alpha parameter controlling the fusion.
  • Don’t want to run an embeddings pipeline yourself — Weaviate’s modules call the embedding provider on your behalf at insert and query time.
  • Are building multi-tenant SaaS and want tenants as a first-class concept rather than a row-filter convention.
  • Care about generative search — Weaviate can chain retrieval into an LLM call inside the database, useful for some agent patterns.

Recipe: hybrid search across a tenant

# pip install weaviate-client
import weaviate
from weaviate.classes.config import Configure, Property, DataType
from weaviate.classes.query import HybridFusion, MetadataQuery

client = weaviate.connect_to_local()  # or connect_to_weaviate_cloud(...)

# Schema with auto-embed via text2vec-openai
client.collections.create(
    name="Article",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
)

articles = client.collections.get("Article")
articles.tenants.create(["tenant-A", "tenant-B"])

t_a = articles.with_tenant("tenant-A")
t_a.data.insert({"title": "HNSW deep dive", "body": "..."})
t_a.data.insert({"title": "IVFFlat tradeoffs", "body": "..."})

# Hybrid query: 50/50 fusion of BM25 and vector
hits = t_a.query.hybrid(
    query="how does HNSW differ from IVFFlat",
    alpha=0.5,                       # 0 = keyword only, 1 = vector only
    fusion_type=HybridFusion.RELATIVE_SCORE,
    return_metadata=MetadataQuery(score=True),   # scores aren't returned unless requested
    limit=3,
)
for h in hits.objects:
    print(h.properties["title"], h.metadata.score)

client.close()

Skip it if...

You want the embedding step under your direct control — Weaviate’s vectorizer modules pin you to a module config at collection-creation time and changing embedding providers later is friction. Skip it too if you want the smallest possible operational surface; Qdrant is a tighter binary and pgvector is no extra binary at all.

pgvector — install + recipe

pgvector · Postgres extension · PostgreSQL licence · adds a vector type and indexes to Postgres · github.com/pgvector/pgvector

What it does best

pgvector turns your existing Postgres into a vector store. Once installed, every database in the cluster gets a vector data type, six distance operators (cosine <=>, L2 <->, negative inner product <#>, L1 <+>, Hamming <~> for binary vectors, and Jaccard <%> for binary vectors), and the choice of HNSW or IVFFlat indexes. The pitch is operational: you don’t add a second piece of infrastructure, you don’t add a second backup story, you don’t add a second authentication boundary. Standard vectors support up to 16,000 dimensions in storage but indexed vectors cap at 2,000 dimensions for the standard precision (half-precision and binary quantization push those limits further). There’s no pgvector-specific MCP — anything that speaks SQL works, including the /servers/postgres MCP.

Pick this if you...

  • Already run Postgres for the rest of your application and want vectors to inherit your backups, replication, and auth model.
  • Want to JOIN vector similarity with relational data in the same query — “k nearest documents where the user is in this org” is a single SQL statement.
  • Operate at a scale (under a few hundred million vectors, under 2,000 dimensions) where a tuned Postgres HNSW index keeps up comfortably.
  • Prefer the PostgreSQL Licence (BSD-ish, permissive) and the operational maturity of a 30-year-old database for your vector workload.

Recipe: pgvector with HNSW + a SQL JOIN to relational rows

-- One-time: install the extension and create the table
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id           BIGSERIAL PRIMARY KEY,
    org_id       BIGINT NOT NULL,
    title        TEXT NOT NULL,
    body         TEXT NOT NULL,
    embedding    VECTOR(1536),     -- text-embedding-3-small
    created_at   TIMESTAMPTZ DEFAULT now()
);

-- HNSW index (better query latency, slower build)
CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Write side (from Python, with embeddings computed upstream)
-- INSERT INTO documents (org_id, title, body, embedding)
--    VALUES ($1, $2, $3, $4);

-- Read side: top-3 nearest WHERE org matches, cosine distance
SELECT id, title, 1 - (embedding <=> $1) AS similarity
  FROM documents
 WHERE org_id = $2
 ORDER BY embedding <=> $1
 LIMIT 3;

Notice that the multi-tenant isolation (WHERE org_id = ...) is plain SQL alongside the similarity ranking — no separate filter syntax to learn, no second piece of infrastructure to operate. The Python side computes embeddings the same way it does for any other vector DB; the storage is just a Postgres table.
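
From application code the write and read sides are ordinary Postgres calls. A sketch assuming psycopg 3 plus the pgvector Python adapter and the documents table above; the connection string and org id are placeholders.

# pip install "psycopg[binary]" pgvector numpy openai
import numpy as np
import psycopg
from pgvector.psycopg import register_vector
from openai import OpenAI

oai = OpenAI()

def embed(text: str) -> np.ndarray:
    out = oai.embeddings.create(model="text-embedding-3-small", input=[text])
    return np.array(out.data[0].embedding, dtype=np.float32)

with psycopg.connect("postgresql://localhost/appdb") as conn:
    register_vector(conn)   # adapts numpy arrays to and from the vector type

    # Write side: embedding computed upstream, stored like any other column
    body = "HNSW gives sub-millisecond queries at high recall."
    conn.execute(
        "INSERT INTO documents (org_id, title, body, embedding) VALUES (%s, %s, %s, %s)",
        (42, "HNSW notes", body, embed(body)),
    )

    # Read side: tenant filter + similarity ranking in one statement
    q = embed("how fast is HNSW")
    rows = conn.execute(
        """SELECT id, title, 1 - (embedding <=> %s) AS similarity
             FROM documents
            WHERE org_id = %s
            ORDER BY embedding <=> %s
            LIMIT 3""",
        (q, 42, q),
    ).fetchall()
    for row in rows:
        print(row)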

Skip it if...

You need vectors above 2,000 dimensions with HNSW indexing (some newer multilingual models hit 4,096 dimensions), or you operate at billions of vectors where a dedicated vector DB’s scaling story beats single-Postgres scaling. Skip it too if your Postgres is already busy — sharing a database between high-QPS OLTP and vector workloads can contend on the same buffer pool, and a separate Qdrant or Pinecone removes the contention.

Indexing & query performance

Vector search at any meaningful scale is approximate. Exact nearest-neighbour search means computing the distance from your query vector to every stored vector and sorting — fine for ten thousand vectors, intractable for ten million. Approximate algorithms trade a small drop in recall (you might miss the absolute top neighbour some fraction of the time) for orders-of-magnitude speedup. Two algorithms dominate the consumer side: HNSW and IVFFlat.

HNSW (Hierarchical Navigable Small World) builds a layered graph. Each vector is a node; higher layers have fewer nodes and longer connections; lower layers have all nodes with shorter connections. Search descends the layers greedily — find the rough neighbourhood at the top, refine at the bottom. The practical knobs are M (number of connections per node, typically 16) and ef_construction (search width during build, typically 64-200). HNSW gives sub-millisecond queries at high recall, at the cost of build time (slower than IVFFlat) and memory (the graph itself takes space). It’s the default in Chroma, Qdrant, Weaviate, and Pinecone’s serverless tier for a reason: for read-heavy RAG workloads, the trade-off lands in HNSW’s favour the vast majority of the time.

IVFFlat (Inverted File Flat) partitions the vector space into clusters (Voronoi cells) using k-means at index build time, then at query time scans only the clusters nearest to the query. The knob is probes — how many clusters to inspect. More probes means higher recall but slower queries; fewer probes means faster queries but more chance of missing the right answer. IVFFlat builds much faster than HNSW (the centroids are fixed, no graph to construct) and uses less memory. It’s the pragmatic choice when your workload is write-heavy, the index needs to rebuild frequently after large inserts, or you’re memory-constrained. pgvector exposes both and is the main place you’ll make this choice; the dedicated servers in this comparison build on HNSW.
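
For pgvector the whole IVFFlat trade-off fits in two statements. A sketch against the documents table from the pgvector recipe above; the lists and probes values are illustrative starting points, not tuned numbers.

-- IVFFlat: k-means picks `lists` centroids at build time (fast build, modest memory)
CREATE INDEX ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

-- At query time, recall is governed by how many lists get scanned
SET ivfflat.probes = 10;   -- higher = better recall, slower queries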

What the published benchmarks actually say. Qdrant’s public benchmarks page (qdrant.tech/benchmarks) is the only first-party comparison published by any of these vendors. The headline framing — “Qdrant achieves highest RPS and lowest latencies in almost all the scenarios” — is genuine for the datasets and configurations they tested, but the exact numbers depend heavily on dataset (1M GIST-1M is very different from 100M deep-image), hardware, and the specific build of the competitor. The page also notes Elasticsearch “can be 10x slower when storing 10M+ vectors,” Milvus is fastest at indexing time but trails on query latency at higher dimensions, and Weaviate “improved the least since our last run.” Read it as informed directional signal from one of the players, not an impartial leaderboard. The right benchmark for your team is the one you run on your own corpus, on your own hardware, with your own embedding model — anything else is marketing dressed up as data.

Recall vs latency is a real trade-off. Bumping HNSW’s ef_search from 64 to 256 typically doubles latency and adds a few points of recall. Whether those few points matter depends on your task: for RAG over a small curated corpus, recall@10 of 0.95 may be plenty because the LLM tolerates noisy retrieval; for legal discovery or medical retrieval where missing a relevant document is costly, you want 0.99+. Tune ef_search per workload rather than accepting the default — this is the single biggest knob nobody documents adequately.
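
How the knob is spelled varies by store: in pgvector it is SET hnsw.ef_search = 256; in Qdrant it is a per-query search parameter. A sketch for the Qdrant client, assuming the papers collection from the recipe above.

from qdrant_client import QdrantClient
from qdrant_client.models import SearchParams
from sentence_transformers import SentenceTransformer

client = QdrantClient(url="http://localhost:6333")
embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")

# Per-query override of the HNSW search width: wider beam = higher recall, slower query
hits = client.search(
    collection_name="papers",
    query_vector=embedder.encode("how does the KV cache work").tolist(),
    search_params=SearchParams(hnsw_ef=256),
    limit=10,
)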

Quantization changes the math again. Once your corpus crosses tens of millions of vectors, storing them at full float32 precision becomes the dominant cost. Qdrant, Weaviate, and Pinecone all support scalar quantization (compress each dimension to int8 — 4x storage saving, modest recall loss) and product quantization (compress groups of dimensions jointly — 16-32x saving, more recall loss). pgvector supports half-precision and binary quantization on the column type itself. The right knob to reach for at scale is rarely “a bigger machine” — it’s “quantize the vectors and measure the recall delta.” A production system running 200M vectors at int8 with 95% recall almost always beats the same system running 50M vectors at float32 with 97% recall, because the LLM downstream can absorb the noise but the budget cannot absorb the storage cost.
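
What turning the knob looks like in Qdrant, as one example. A sketch assuming the same 384-dimension embeddings as earlier; int8 and always_ram are illustrative choices, and the recall delta is something to measure on your own queries.

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams,
    ScalarQuantization, ScalarQuantizationConfig, ScalarType,
)

client = QdrantClient(url="http://localhost:6333")

# int8 scalar quantization: roughly 4x smaller in RAM, at a small recall cost
client.create_collection(
    collection_name="papers_int8",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(type=ScalarType.INT8, always_ram=True),
    ),
)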

Dimension matters more than people expect. Higher-dimensional vectors aren’t strictly better — they cost more to store, more to compare, and the curse of dimensionality eventually makes distances less discriminative. The current sweet spot for general-purpose English RAG sits between 384 dimensions (BGE-small, GTE-small) and 1,536 dimensions (text-embedding-3-small). Below 384, retrieval quality drops sharply; above 1,536, you’re paying more for marginal gains and bumping into pgvector’s 2,000-dimension index cap. Pick your embedding model based on retrieval quality on your evaluation set, not on dimension count.

Pricing shape

The libraries and binaries are all free. The managed offerings are not. Numbers below are pulled from each vendor’s public pricing page on 2026-05-11 — treat as a snapshot.

| Tier | Chroma | Pinecone | Qdrant | Weaviate | pgvector |
| --- | --- | --- | --- | --- | --- |
| Self-host | Free (Apache 2.0) | Not available | Free (Apache 2.0) | Free (BSD-3) | Free (PG licence) |
| Free managed | Chroma Cloud free tier | Starter: 2GB, 1M reads/mo | 0.5 vCPU / 1GB / 4GB | 14-day trial then PAYG | Free on Supabase/Neon |
| Entry paid | Chroma Cloud PAYG | Builder: $20/mo flat | Standard: usage-based | Flex: from $45/mo | Supabase Pro: $25/mo |
| Production | Chroma Cloud | Standard: from $50/mo + usage | Standard: usage-based | Premium: from $400/mo | Crunchy / RDS / Aurora |
| Enterprise | Chroma Enterprise | Enterprise: from $500/mo + 99.95% SLA | Premium: enterprise spend | Premium: 99.95% uptime | Self-managed Postgres clusters |
| Pricing pivots on | Cloud storage + reads | Read/write units + storage | Compute + memory + storage | Cluster size + features | Postgres host pricing |

Treat dollar amounts above as approximate — pricing pages update frequently and serverless meters change independent of headline tiers. Always confirm before you commit: pinecone.io/pricing, qdrant.tech/pricing, weaviate.io/pricing. Chroma Cloud’s pricing is documented at docs.trychroma.com; pgvector inherits its hosting cost from whichever Postgres provider you use.

Common pitfalls

pgvector IVFFlat needs rebuilding after large inserts

IVFFlat picks cluster centroids at index build time. If you insert another 5M vectors after the initial build, the centroids no longer represent the data and your agent’s recall will quietly degrade — top-k queries start missing relevant neighbours and the only signal is “the agent feels worse this month.” Either rebuild the index periodically (REINDEX) or use HNSW, which doesn’t have this failure mode.

Pinecone serverless read units add up faster than expected

The Starter tier’s 1M monthly read units feel abundant until an agent loop runs queries on every turn. However Pinecone meters a given query, the query count compounds fast: retrieval with top_k=50 on 10 turns/conversation across 1,000 conversations/day is 300,000 queries a month from one product feature, each burning read units. Cap top_k early; cache aggressively.

Chroma in embedded mode is single-process

PersistentClient works great when one process is writing. Two processes writing to the same SQLite-backed Chroma directory will corrupt state — Chroma is not designed for concurrent writers in embedded mode. The moment you need multiple processes, switch to HttpClient against Chroma server. Migration is one line of client construction.

Weaviate vectorizer lock-in by collection

The text2vec-openai module you choose at collection-creation time is sticky — switching embedding providers means creating a new collection and reindexing. This is a real cost at millions of vectors. Default to letting Weaviate store pre-computed vectors (skip the vectorizer) unless you’re certain the module choice will outlive your indexing strategy.

Qdrant payload indexes are not automatic

Qdrant’s filter API is fast — when the relevant payload fields have payload indexes. If they don’t, filtering becomes a scan and latency degrades sharply at scale. Create payload indexes for every field you filter on with client.create_payload_index(); the docs make this look optional and it isn’t.

pgvector dimension cap on indexed columns

Standard-precision indexed vectors cap at 2,000 dimensions with HNSW or IVFFlat. Cohere embed-english-v3.0 at 1024 dimensions is fine; OpenAI text-embedding-3-small at 1536 is fine; OpenAI text-embedding-3-large at 3072 is not, and you’ll need half-precision or binary quantization (which pgvector supports) to index it. Plan around this before you commit to a model.

Re-embedding when the model changes is a multi-day job

The single biggest hidden cost across all five. When you upgrade from text-embedding-3-small to text-embedding-3-large, every vector in the database is obsolete — wrong dimension, wrong embedding space — and must be regenerated. Budget for the embedding cost, the storage rewrite, and the index rebuild. This isn’t a vector-DB problem so much as an embedding-pipeline problem, but it’s where most teams discover their data lineage isn’t as good as they thought.

Migration paths

One advantage of the vector DB space: data shapes are similar enough that switching is mostly plumbing. A vector plus a payload plus an id maps cleanly between every system in this comparison; the migration cost is code, not modelling. Three patterns recur.

Chroma → Qdrant or Pinecone. Almost always because the team has graduated past prototype into something with multiple readers, multi-tenant isolation requirements, or scale expectations. The migration is an export loop from Chroma’s persistent client (collection.get(include=["embeddings", "metadatas", "documents"])) into a bulk upsert against the target. Check that the distance metric matches: Chroma collections use L2 unless you set hnsw:space to cosine, and Qdrant makes you choose explicitly, so pass Distance.COSINE only if that is what the source collection used. Allow a couple of hours of upsert time per million vectors; rebuild indexes after the bulk load.
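
A minimal sketch of that loop, assuming the agent_memory collection from the Chroma recipe and a local Qdrant; batching and error handling are left out.

import chromadb
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

src = chromadb.PersistentClient(path="./chroma_db").get_collection("agent_memory")
dst = QdrantClient(url="http://localhost:6333")

# Export everything (use limit/offset batches for collections that don't fit in memory)
dump = src.get(include=["embeddings", "metadatas", "documents"])

dst.recreate_collection(
    collection_name="agent_memory",
    vectors_config=VectorParams(size=len(dump["embeddings"][0]), distance=Distance.COSINE),
)
dst.upsert(
    collection_name="agent_memory",
    points=[
        PointStruct(
            id=i,   # Qdrant ids are ints or UUIDs; keep the original Chroma id in the payload
            vector=[float(x) for x in emb],
            payload={**(meta or {}), "text": doc, "chroma_id": cid},
        )
        for i, (cid, emb, meta, doc) in enumerate(
            zip(dump["ids"], dump["embeddings"], dump["metadatas"], dump["documents"])
        )
    ],
)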

Pinecone → self-hosted (usually Qdrant). Almost always because the bill grew faster than expected or compliance changed. The escape path is straightforward — Pinecone’s fetch and list APIs let you read every vector and namespace, and Qdrant’s upsert takes the same data shape. The hidden friction is operational: you now own a database, with backups, monitoring, and patching. Most teams that make this switch also acquire a platform engineer in the same quarter. Run both side-by-side for a few weeks before flipping traffic.
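
The same loop in the other direction, assuming a serverless index named docs (where list() pagination is available) and a Qdrant collection created ahead of time; namespaces, batching, and retries are omitted.

from pinecone import Pinecone
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

src = Pinecone(api_key="...").Index("docs")
dst = QdrantClient(url="http://localhost:6333")

point_id = 0
for id_page in src.list():                     # yields pages of vector ids
    fetched = src.fetch(ids=list(id_page))
    points = []
    for pine_id, vec in fetched.vectors.items():
        points.append(PointStruct(
            id=point_id,
            vector=vec.values,
            payload={**(vec.metadata or {}), "pinecone_id": pine_id},
        ))
        point_id += 1
    dst.upsert(collection_name="docs", points=points)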

pgvector → dedicated vector DB. The trigger is usually one of three: vectors above 2,000 dimensions outgrowing indexed-column limits, contention between OLTP and vector workloads in the shared buffer pool, or operational pressure to separate concerns. The migration is a sequential read from Postgres (SELECT id, embedding, metadata FROM documents) into a bulk upsert against the target. Dual-write during the cutover so reads can fail back, then verify recall on a holdout set before flipping the read path. The relational data stays in Postgres; only the vector column moves.

Notice what isn’t a common migration: Weaviate → anywhere. Weaviate’s schema-and-vectorizer model captures more state than the other systems, so migrations out of Weaviate usually require recomputing embeddings against a different model rather than copying vectors verbatim. If you start on Weaviate, plan to stay on Weaviate; the lock-in is real but well-priced for what you get.

The most underrated migration is the simplest: staying put and tuning. Teams routinely blame the database when the real culprit is an unindexed payload field, an undertuned ef_search, or an embedding model swapped without rebuilding the index. Before migrating, run an evaluation set against your current store with the knobs explicit (M, ef_construction, ef_search for HNSW; lists and probes for IVFFlat) and measure recall and latency. About half the time, the migration cancels itself because the existing system was fine once configured properly.
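
A store-agnostic sketch of that measurement: recall@k of whatever your current store returns, scored against a brute-force ground truth. It assumes the corpus vectors fit in memory and that ann_search returns the same integer ids used as row indices; adapt both to your setup.

import numpy as np

def recall_at_k(query_vecs, corpus_vecs, ann_search, k=10):
    """Fraction of brute-force top-k neighbours the approximate index also returns."""
    corpus = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    hits = 0
    for q in query_vecs:
        qn = q / np.linalg.norm(q)
        exact = set(np.argsort(-(corpus @ qn))[:k])   # ground-truth top-k row indices
        approx = set(ann_search(q, k))                # ids returned by your store
        hits += len(exact & approx)
    return hits / (k * len(query_vecs))

# recall = recall_at_k(eval_queries, all_vectors, ann_search=query_current_store, k=10)
# print(f"recall@10 = {recall:.3f}")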

Community signal

Three voices that capture why developers reach for different parts of this matrix. Verbatim from the cited sources.

Up to 2 GB storage and Up to 1M/mo read units. Access to dense, sparse, and full-text indexes (up to 5).

Pinecone · Blog

Pinecone's own Starter-tier description. The free tier is the most generous in the category for hosted-only vector DBs — large enough to run a real RAG prototype indefinitely.

Qdrant achieves highest RPS and lowest latencies in almost all the scenarios.

Qdrant · Blog

From Qdrant's first-party benchmarks page. The framing is informed but partisan — the only public side-by-side any of these vendors publishes, useful as directional signal rather than an impartial leaderboard.

Supports HNSW and IVFFlat indexes. Standard vectors can have up to 16,000 dimensions; for indexed vectors specifically, standard vectors are limited to 2,000 dimensions with HNSW/IVFFlat indexes.

pgvector · README

From pgvector's README. The 2,000-dimension cap on indexed standard-precision vectors is the single most consequential pgvector fact for anyone choosing an embedding model — picks above 2,000 dimensions need half-precision or binary quantization to index.


Frequently asked questions

What's the difference between Chroma, Pinecone, Qdrant, Weaviate, and pgvector at a glance?

Five different shapes of vector store. Chroma is the Apache-2.0 Python-first option built for the simplest possible local prototype — embedded mode runs in your process, server mode is one Docker container. Pinecone is hosted SaaS only, with the cleanest production operations story and a real $0 Starter tier (2GB storage, 1M reads/mo). Qdrant is Apache-2.0 Rust with the best filtering API and a free 0.5 vCPU / 1GB Cloud cluster. Weaviate is BSD-3 Go with built-in vectorizer modules so insert-with-auto-embed actually works without a separate embeddings pipeline. pgvector is a PostgreSQL extension that turns your existing Postgres into a vector store — vector and relational in one box, with HNSW or IVFFlat indexes and six distance operators including cosine, L2, inner product, L1, Hamming, and Jaccard.

Which vector database has the best MCP server support?

Four of the five have a first-party or well-maintained MCP server in this directory: Chroma via the chroma-working-memory MCP, Pinecone via pinecone-vector-db, Qdrant via the official /servers/qdrant build, and Weaviate via the official /servers/weaviate package. pgvector is the outlier — there is no dedicated pgvector MCP because anything that speaks Postgres works (the /servers/postgres MCP gives an agent SQL access including pgvector columns, but it doesn't ship vector-specific tool descriptions). If you need an agent to do similarity search end-to-end, Qdrant and Weaviate are the smoothest paths because their MCP servers expose the right level of abstraction; Pinecone is good when you already have a Pinecone index in production; Chroma is good for ephemeral memory in a Claude-Code-like loop.

Is pgvector fast enough for production?

For the workloads most teams actually run, yes. pgvector with an HNSW index scales to tens of millions of vectors on a properly tuned Postgres instance. The two things that make people doubt it are dimension limits (indexed vectors max at 2,000 dimensions for standard float vectors with HNSW or IVFFlat — though half-precision and binary quantization extend that range) and the fact that you share the database with whatever other workloads run on the same Postgres. If your dimension is under 2,000 and your other Postgres traffic is moderate, pgvector wins on operational simplicity. Above 2,000 dimensions, or at the hundreds-of-millions-of-vectors scale where index build times matter, a dedicated vector DB like Qdrant or Pinecone becomes worth the second piece of infrastructure.

HNSW vs IVFFlat — which index should I pick?

HNSW (Hierarchical Navigable Small World) gives better query latency and recall on most workloads but is slower to build and uses more memory. IVFFlat (Inverted File Flat) builds faster, uses less memory, but its recall depends heavily on the number of probed lists per query — too few and you miss neighbours, too many and you've thrown away the index's speedup. The practical rule: HNSW for read-heavy workloads where the index is built once and queried millions of times (most RAG); IVFFlat for write-heavy workloads where indexes need to rebuild after large inserts or for memory-constrained boxes. Qdrant defaults to HNSW. Weaviate defaults to HNSW. Pinecone abstracts this away. Chroma uses HNSW under the hood. pgvector lets you pick at index creation time.

Pinecone vs Qdrant — which is cheaper at production scale?

Qdrant self-hosted is cheaper than Pinecone at any scale where you have someone willing to operate a database. The cost difference is your engineering time vs Pinecone's serverless usage fees ($0.33/GB/month storage, $16-18 per million read units, $4-4.50 per million write units on Standard). At 10M vectors with bursty traffic, Pinecone's serverless plan often ends up cheaper than running Qdrant on an over-provisioned VM because you only pay for actual reads. At 100M vectors with steady QPS, Qdrant on a single tuned host is usually cheaper because Pinecone's per-read-unit math starts to compound. Qdrant Cloud's free tier (0.5 vCPU / 1GB / 4GB disk) covers prototype budgets at zero cost.

Can I do hybrid search (vector + keyword) on all five?

Yes, but the depth varies. Weaviate has the strongest hybrid implementation — keyword (BM25) and vector are fused with a single hybrid query, no separate index plumbing. Qdrant supports hybrid via its sparse-vector index (BM25-style sparse vectors stored alongside dense), with explicit fusion logic in the query API. Pinecone supports hybrid through its sparse-dense index types. Chroma supports filtering on metadata but its keyword search is shallow. pgvector pairs naturally with Postgres full-text search (tsvector, GIN indexes) — you write a SQL JOIN across the two, which is the most powerful and the most manual. Pick Weaviate or Qdrant if hybrid is a first-class requirement and you don't want to write the fusion yourself.

Should I embed in my application or let the database do it?

Default to embedding in your application. The reason is data lineage: when the embedding model changes (and it will — OpenAI deprecates models, you upgrade to a better one), you control the regeneration. The exception is Weaviate, which makes auto-embedding-on-insert a first-class workflow via its vectorizer modules — if you're going to use one provider's embedding model and never change it, Weaviate's text2vec-openai (or text2vec-cohere, text2vec-huggingface, text2vec-jinaai) module saves you a service. For Chroma, Pinecone, Qdrant, and pgvector, you compute the embedding upstream and pass the vector at insert time. Most production teams converge on the application-embeds-and-database-stores pattern even when the DB offers built-in embedding, because regenerations are then a job you control.

What's the simplest way to migrate from Chroma to Qdrant?

Export from Chroma's persistent client as numpy arrays plus metadata, then bulk-upsert into a fresh Qdrant collection with a matching distance metric and vector size. The data shape is essentially identical (id, vector, metadata payload), so the migration is mostly plumbing rather than transformation. The pitfall is the distance metric: Chroma collections use L2 unless you set hnsw:space to cosine, so verify Qdrant's collection is created with whatever metric the source actually used (Distance.COSINE in the common case) before you upsert. If you're using Chroma's metadata filtering, port those filter dicts to Qdrant's filter API — Chroma's $eq/$in style maps to Qdrant's must/should/must_not structure with a small adapter. Plan a couple of hours per million vectors for the upsert, and re-index after the bulk load lands.

Does Pinecone have a free tier in 2026?

Yes. Pinecone's Starter plan is completely free and includes up to 2GB of storage, up to 1M monthly read units, up to 2M monthly write units, and up to 5 indexes (dense, sparse, or full-text). It's enough to host a real RAG prototype indefinitely as long as the corpus is small and traffic is moderate. Builder at $20/mo flat is the next step up; Standard starts at $50/mo minimum with usage-based pricing; Enterprise starts at $500/mo with the 99.95% uptime SLA, private networking, and customer-managed encryption keys. Always check pinecone.io/pricing before committing — serverless meter rates change.

Which vector DB integrates best with LangChain and LlamaIndex?

All five have first-party integrations in both frameworks. The differences are subtle. Chroma has the shortest 'hello world' in the LangChain docs (Chroma.from_documents in three lines). Pinecone's integration is the most polished operationally — async client, batched upserts, proper error handling. Qdrant's integration exposes the most native features (payload filtering, sparse vectors, named vectors per collection). Weaviate's integration leans hardest on Weaviate-specific concepts (collections, named vectorizers). pgvector's LangChain integration is named PGVector and is essentially a thin wrapper over SQL — the operational story is whatever your Postgres operational story already is. For a clean LangChain experience pick Chroma or Pinecone; for a powerful one pick Qdrant; for a relational-meets-vector hybrid pick pgvector.

Sources

  • Chroma · docs.trychroma.com (docs and Chroma Cloud pricing)
  • Pinecone · pinecone.io/pricing
  • Qdrant · qdrant.tech/pricing and qdrant.tech/benchmarks
  • Weaviate · weaviate.io/pricing
  • pgvector · github.com/pgvector/pgvector (README)

Keep reading

  • Mem0 vs Letta vs Zep vs Cognee (2026) · comparison, coming soon
  • Best knowledge-base MCP servers (2026) · roundup, coming soon
  • LangGraph vs CrewAI vs Letta vs AutoGen (2026) · comparison
Read