
Vector Databases Are Overhyped

Mar 30, 2023 · 5 min read · Utso Sarkar

Every LLM startup in 2023 has a vector database in their architecture diagram. Pinecone logos on pitch decks. Weaviate mentioned in Slack channels. Qdrant discussed like it is infrastructure oxygen. I am going to say the quiet part loud: for most teams, a dedicated vector database is premature optimization dressed up as AI strategy.

This is not anti-vector-search. Embeddings are useful. Similarity retrieval works. The problem is that teams reach for a managed vector DB before they have validated that retrieval is their bottleneck, before they have exhausted simpler options, and before they understand why their RAG pipeline produces garbage answers.

The Hype Cycle in One Sentence

Investors ask “what is your RAG stack?” Founders panic and buy Pinecone. Nobody asks whether the documents are chunked correctly.

pgvector vs Pinecone: An Honest Comparison

pgvector is a PostgreSQL extension. You store embeddings alongside your relational data. One database, one backup strategy, one connection pool, transactions that actually work. Query latency is fine for datasets under a few million vectors, provided you build an approximate index (pgvector supports HNSW and IVFFlat) instead of scanning every row. You probably already have Postgres ops knowledge on your team, because nearly every startup runs Postgres.
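To make "embeddings alongside your relational data" concrete, here is a rough sketch of what the schema and a filtered similarity query could look like. The table names, column names, and the 1536-dimension width are assumptions for illustration, not anything prescribed by pgvector. The SQL is held in Python strings with pyformat-style placeholders (the style psycopg uses), and nothing here connects to a database.

```python
# Sketch only: SQL for pgvector held as Python strings, plus a helper that
# formats a Python list as a pgvector text literal. Table and column names
# are hypothetical; no database connection is made here.

SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    tenant_id bigint NOT NULL,
    title     text   NOT NULL
);

CREATE TABLE chunks (
    id          bigserial PRIMARY KEY,
    document_id bigint NOT NULL REFERENCES documents (id),
    body        text   NOT NULL,
    embedding   vector(1536)  -- assumed embedding width
);

-- Approximate nearest-neighbour index using cosine distance.
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# <=> is pgvector's cosine distance operator; smaller means more similar.
# The join and WHERE clause are ordinary SQL: this is the "metadata join"
# that comes for free when vectors live next to relational data.
SEARCH_SQL = """
SELECT c.id, d.title, c.body
FROM chunks AS c
JOIN documents AS d ON d.id = c.document_id
WHERE d.tenant_id = %(tenant_id)s
ORDER BY c.embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(values: list[float]) -> str:
    """Format floats as the '[x,y,z]' text form that pgvector accepts."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# Parameters you would pass along with SEARCH_SQL to your database driver.
params = {"query": to_vector_literal([0.1, 0.2, 0.3]), "tenant_id": 42, "k": 5}
```

The point of the sketch is the shape, not the details: tenant filtering, joins, and similarity ranking all happen in one query, inside one transaction boundary.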

Pinecone is a managed vector database optimized for similarity search at scale. Sub-50ms queries on billion-vector indexes. Serverless scaling. Zero ops if you trust their SLA. Costs scale with usage in ways that surprise founders who demo’d on the free tier.

The tradeoff is not “pgvector bad, Pinecone good.” The tradeoff is operational complexity vs. query performance at scale vs. cost predictability.

For a B2B SaaS with 10,000 documents and 500 queries per day? pgvector in your existing Postgres instance is the correct answer 90% of the time. For a consumer app doing 10 million similarity searches per hour across 100 million vectors? You need purpose-built infrastructure.

Most startups are in the first category and architect for the second because the second category sounds more like a “real AI company.”
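The gap between those two categories is easier to feel with numbers. A back-of-envelope sketch, assuming 1536-dimension float32 embeddings and about 10 chunks per document (both are my assumptions, not figures from any vendor), and counting only raw vector storage with no index overhead:

```python
# Back-of-envelope sizing for the two scenarios in the text.
# Assumptions (not from any benchmark): 1536-dim float32 embeddings,
# roughly 10 chunks per document, index overhead ignored.

BYTES_PER_FLOAT32 = 4
DIMS = 1536  # assumed embedding width


def raw_vector_bytes(num_vectors: int, dims: int = DIMS) -> int:
    """Raw storage for the embeddings alone, ignoring index overhead."""
    return num_vectors * dims * BYTES_PER_FLOAT32


def queries_per_second(queries: int, period_seconds: int) -> float:
    """Average query rate over a period (peaks will be higher)."""
    return queries / period_seconds


# Scenario A: B2B SaaS, 10,000 documents, 500 queries per day.
saas_vectors = 10_000 * 10  # assumed 10 chunks per document
saas_bytes = raw_vector_bytes(saas_vectors)
saas_qps = queries_per_second(500, 24 * 3600)

# Scenario B: consumer app, 100 million vectors, 10 million queries per hour.
consumer_bytes = raw_vector_bytes(100_000_000)
consumer_qps = queries_per_second(10_000_000, 3600)

print(f"SaaS:     {saas_bytes / 1e6:.1f} MB of vectors, {saas_qps:.4f} QPS")
print(f"Consumer: {consumer_bytes / 1e9:.1f} GB of vectors, {consumer_qps:.0f} QPS")
```

Under these assumptions the SaaS case is about 614 MB of vectors at well under one query per second on average, while the consumer case is about 614 GB at roughly 2,800 queries per second. Those are different engineering problems, and only one of them obviously calls for specialized infrastructure.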

Decision Tree: Vector DB vs Simpler Retrieval

flowchart TD
    Start([Need to retrieve context for LLM]) --> Q1{Corpus size}
    Q1 -->|< 100k chunks| Q2{Already on Postgres?}
    Q1 -->|100k - 10M chunks| Q3{Query latency SLA < 100ms?}
    Q1 -->|> 10M chunks| VDB[Consider dedicated vector DB]

    Q2 -->|Yes| PG[pgvector extension]
    Q2 -->|No| Q4{Need metadata joins?}
    Q4 -->|Yes| PG
    Q4 -->|No| BM25[Try BM25 / keyword search first]

    Q3 -->|No| PG
    Q3 -->|Yes| Q5{Team has vector DB ops experience?}
    Q5 -->|No| PG
    Q5 -->|Yes| VDB

    BM25 --> Eval{Retrieval quality good enough?}
    PG --> Eval
    VDB --> Eval

    Eval -->|No| Fix[Fix chunking and embedding model first]
    Eval -->|Yes| Ship[Ship it]
    Fix --> Start

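Notice where every branch of the decision tree ends up: at an evaluation step. And when retrieval quality falls short, the tree sends you back to fix chunking and embeddings, not to buy different infrastructure. Fix your fundamentals first.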
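If you would rather read the flowchart as code, the storage-choice half of it translates almost line for line into a function. This is a minimal sketch: the `Situation` fields and the returned labels are names invented here for illustration, and the thresholds are simply the ones from the diagram.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    chunks: int                    # corpus size in chunks
    on_postgres: bool              # already running Postgres?
    needs_metadata_joins: bool     # must join or filter on relational data?
    latency_sla_under_100ms: bool  # hard query latency SLA below 100 ms?
    has_vector_db_ops: bool        # team has operated a vector DB before?


def recommend(s: Situation) -> str:
    """Mirror the flowchart's branches up to (not including) the eval step."""
    if s.chunks > 10_000_000:
        return "dedicated vector DB"
    if s.chunks < 100_000:
        if s.on_postgres or s.needs_metadata_joins:
            return "pgvector"
        return "BM25"
    # Between 100k and 10M chunks: only a tight SLA plus real ops
    # experience justifies leaving Postgres.
    if s.latency_sla_under_100ms and s.has_vector_db_ops:
        return "dedicated vector DB"
    return "pgvector"


typical_startup = Situation(
    chunks=50_000,
    on_postgres=True,
    needs_metadata_joins=True,
    latency_sla_under_100ms=False,
    has_vector_db_ops=False,
)
print(recommend(typical_startup))
```

Whatever it returns, the flowchart's next step is the same: evaluate retrieval quality before you ship.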

Why RAG Fails (It Is Usually Not the Database)

I have debugged RAG pipelines that used Pinecone, Weaviate, and pgvector. The vector database was never the problem. The problems were always:

Chunking strategy. Fixed 512-token chunks split mid-sentence, mid-table, mid-code-block. The retriever returns fragments that no LLM can synthesize into a coherent answer. Semantic chunking helps. Structure-aware chunking helps more.
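Here is a small sketch of the simplest structure-aware alternative to fixed-size chunks: pack whole paragraphs into a chunk until a size budget is reached, and fall back to sentence boundaries only when a single paragraph is too long. The word-count budget is a stand-in for real token counting, the sentence splitter is deliberately naive, and the function names are made up for this example.

```python
import re


def split_sentences(paragraph: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def chunk_text(text: str, max_words: int = 200) -> list[str]:
    """Pack whole paragraphs into chunks; never cut inside a sentence."""
    # Units are paragraphs, or sentences when a paragraph exceeds the budget.
    units: list[str] = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph.split()) <= max_words:
            units.append(paragraph)
        else:
            units.extend(split_sentences(paragraph))

    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for unit in units:
        words = len(unit.split())
        # Close the current chunk if adding this unit would blow the budget.
        if current and count + words > max_words:
            chunks.append("\n".join(current))
            current, count = [], 0
        # A single sentence longer than the budget becomes its own chunk.
        current.append(unit)
        count += words
    if current:
        chunks.append("\n".join(current))
    return chunks


sample = "One two three.\n\nFour five.\n\nSix seven eight nine."
print(chunk_text(sample, max_words=5))
```

Real documents also need table-aware and code-block-aware handling, which this sketch does not attempt. The idea carries over, though: choose split points from the document's structure, not from a token counter.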

Embedding model mismatch. You embed with text-embedding-ada-002 but your domain is legal contracts in Hindi. The embedding space does not capture the semantics you care about. Fine-tuned or domain-specific embedders matter more than which vector DB you use.

No reranking. Top-k cosine similarity returns plausible-looking garbage. A cross-encoder reranker on the top 20 candidates before passing to the LLM improves answer quality more than switching from pgvector to Pinecone.
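The two-stage shape is simple enough to sketch without any model at all. In the example below, `score_pair` is where a cross-encoder's scoring call would go; the `token_overlap` function is only a placeholder so the sketch runs on its own, and it is not a substitute for a real reranker. All names are hypothetical.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query_vec: list[float],
    query_text: str,
    corpus: list[tuple[str, list[float]]],
    score_pair: Callable[[str, str], float],
    candidates: int = 20,
    final_k: int = 5,
) -> list[str]:
    # Stage 1: cheap vector similarity over the whole corpus.
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    shortlist = [text for text, _ in ranked[:candidates]]
    # Stage 2: a more expensive scorer reads query and passage together.
    reranked = sorted(shortlist, key=lambda text: score_pair(query_text, text), reverse=True)
    return reranked[:final_k]


def token_overlap(query: str, passage: str) -> float:
    """Placeholder scorer (NOT a cross-encoder): share of query words found."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


# Toy corpus with made-up 2-d vectors, purely for illustration.
corpus = [
    ("reset your password in settings", [0.6, 0.8]),
    ("billing and invoices overview", [0.9, 0.1]),
    ("password policy for admins", [0.8, 0.6]),
]
result = retrieve_then_rerank(
    [1.0, 0.0], "how do I reset my password", corpus, token_overlap,
    candidates=3, final_k=2,
)
print(result)
```

In this toy corpus the vector stage ranks the billing passage first, and the second stage pushes it out of the final results. That reordering is the whole value of a reranker, and it is independent of which database produced the shortlist.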

Stale index. Your product docs update weekly. Your index updates never. Users ask about features that launched last month and get answers from six-month-old documentation. The vector DB works perfectly. The pipeline is broken.
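Keeping an index fresh does not require anything exotic. One common approach, sketched here with hypothetical names, is to store a content hash per document at indexing time and compare it against the current source on a schedule. Only documents whose hash changed get re-embedded, and documents that disappeared get deleted.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's current content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(
    current_docs: dict[str, str],    # doc_id -> current text at the source
    indexed_hashes: dict[str, str],  # doc_id -> hash stored at index time
) -> dict[str, list[str]]:
    """Work out which documents to re-embed and which to drop."""
    upsert = sorted(
        doc_id
        for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)  # new or changed
    )
    delete = sorted(doc_id for doc_id in indexed_hashes if doc_id not in current_docs)
    return {"upsert": upsert, "delete": delete}


indexed = {
    "pricing": content_hash("Plans start at the basic tier."),
    "sso": content_hash("SSO is available on enterprise plans."),
    "legacy-api": content_hash("The v1 API."),
}
current = {
    "pricing": "Plans start at the basic tier.",     # unchanged
    "sso": "SSO is now available on all plans.",     # edited
    "webhooks": "Webhooks launched last month.",     # new
}
print(plan_reindex(current, indexed))
```

Run something like this from a scheduled job or from your docs publishing pipeline, and the "six-month-old answers" failure mode mostly goes away.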

No evaluation harness. Teams ship RAG without measuring retrieval precision and answer faithfulness. They discover problems in production when customers complain. Build the eval set before you build the infra.
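The retrieval half of an eval harness fits in a few lines. A minimal sketch, assuming you have hand-labeled which chunk ids are relevant for each test question; the function names and the shape of the eval set are my own choices. Answer faithfulness needs a separate check on the generated text and is not covered here.

```python
from typing import Callable


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots filled with relevant items."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def evaluate(
    retrieve: Callable[[str], list[str]],    # question -> ranked chunk ids
    eval_set: list[tuple[str, set[str]]],    # (question, relevant chunk ids)
    k: int = 5,
) -> dict[str, float]:
    """Average precision@k and recall@k over a labeled eval set."""
    precisions, recalls = [], []
    for question, relevant in eval_set:
        retrieved = retrieve(question)
        precisions.append(precision_at_k(retrieved, relevant, k))
        recalls.append(recall_at_k(retrieved, relevant, k))
    n = len(eval_set)
    return {"precision@k": sum(precisions) / n, "recall@k": sum(recalls) / n}


# Stand-in retriever backed by canned results, for demonstration only.
canned = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
labels = [("q1", {"a", "c"}), ("q2", {"z"})]
print(evaluate(lambda q: canned[q], labels, k=2))
```

Because `retrieve` is just a callable, the same harness can score BM25, pgvector, and a dedicated vector DB on identical questions, which is the comparison you actually want before switching.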

When a Dedicated Vector DB Actually Makes Sense

I am not saying never use Pinecone. Use it when:

You have validated that retrieval quality is good with simpler tools and latency or scale is the bottleneck
Your query volume exceeds what Postgres can serve without dedicated tuning
You need hybrid search (dense + sparse) with sophisticated filtering at a scale Postgres extensions struggle with
Your team lacks Postgres expertise but has budget for managed services

These are engineering constraints, not pitch deck constraints.

The Cost Surprise

Pinecone’s pricing model punishes the curious founder. You prototype on the free tier, demo to investors, get traction, and suddenly your vector DB bill exceeds your LLM API bill. pgvector costs you whatever you already pay for Postgres compute. For early-stage startups, that difference is runway.

Run the math before you commit. Include embedding API costs, re-indexing costs when you change chunking strategy, and the engineering time to migrate when you outgrow your initial choice.
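"Run the math" can be as simple as a small cost model you fill in with your own quotes. Everything numeric in the sketch below is a placeholder I picked to make it run, not a real price from any vendor; the structure (embedding passes, monthly database bill, migration time) is the part worth copying.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    chunks: int                                # chunks in the corpus
    tokens_per_chunk: int                      # average tokens per chunk
    embedding_price_per_million_tokens: float  # from your provider's price list
    reindexes_per_year: int                    # full re-embeds after chunking changes
    vector_db_monthly: float                   # incremental monthly DB bill
    migration_engineer_days: float             # expected effort to migrate later
    engineer_day_cost: float                   # loaded cost of one engineer-day


def embedding_pass_cost(c: CostInputs) -> float:
    """Cost of embedding the whole corpus once."""
    return c.chunks * c.tokens_per_chunk / 1_000_000 * c.embedding_price_per_million_tokens


def first_year_cost(c: CostInputs) -> float:
    """Initial embed + re-embeds + twelve months of DB + migration effort."""
    embedding = embedding_pass_cost(c) * (1 + c.reindexes_per_year)
    database = 12 * c.vector_db_monthly
    migration = c.migration_engineer_days * c.engineer_day_cost
    return embedding + database + migration


# PLACEHOLDER numbers, chosen only so the example runs. Use your own.
example = CostInputs(
    chunks=100_000,
    tokens_per_chunk=500,
    embedding_price_per_million_tokens=0.10,
    reindexes_per_year=3,
    vector_db_monthly=70.0,
    migration_engineer_days=5,
    engineer_day_cost=800.0,
)
print(f"one embedding pass: {embedding_pass_cost(example):.2f}")
print(f"first-year total:   {first_year_cost(example):.2f}")
```

Plug in real figures for both options and compare the totals. With most plausible inputs the line items that dominate are the recurring database bill and the engineering time, which is exactly why they belong in the calculation.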

Start with BM25 or keyword search as a baseline. Measure answer quality.
Add pgvector if you are on Postgres and semantic search improves metrics.
Invest in chunking, reranking, and evaluation before switching vector DBs.
Move to a dedicated vector DB only when profiling shows Postgres is the bottleneck.
Never let “our RAG stack” become a selling point. Customers buy answers, not architecture.
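A BM25 baseline is small enough to write from scratch, which makes "we did not have time for a baseline" a hard excuse to defend. The sketch below uses the Okapi BM25 scoring formula with the commonly used non-negative IDF variant, whitespace tokenization, and default parameters k1 = 1.5 and b = 0.75. Those choices and the class layout are mine; a production setup would more likely lean on an existing search engine or library.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer; real systems want something better."""
    return text.lower().split()


class BM25:
    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        # Requires a non-empty list of documents.
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_freqs = [Counter(tokens) for tokens in self.doc_tokens]
        self.avg_len = sum(len(t) for t in self.doc_tokens) / len(docs)
        # Document frequency: in how many documents each term appears.
        df: Counter = Counter()
        for tokens in self.doc_tokens:
            df.update(set(tokens))
        n = len(docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, index: int) -> float:
        """BM25 score of one document for the query."""
        freqs = self.doc_freqs[index]
        length_norm = 1 - self.b + self.b * len(self.doc_tokens[index]) / self.avg_len
        total = 0.0
        for term in tokenize(query):
            tf = freqs.get(term, 0)
            if tf:
                total += self.idf[term] * tf * (self.k1 + 1) / (tf + self.k1 * length_norm)
        return total

    def top_k(self, query: str, k: int = 5) -> list[int]:
        """Indexes of the best-scoring documents, dropping zero-score ones."""
        scored = [(self.score(query, i), i) for i in range(len(self.doc_tokens))]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [i for s, i in ranked[:k] if s > 0]


docs = [
    "reset your password from the settings page",
    "invoices are emailed monthly",
    "admins can enforce a password policy",
]
bm25 = BM25(docs)
print(bm25.top_k("reset password"))
```

Measure answer quality with this in place first. If semantic search cannot beat it on your own eval set, you have learned something important before spending anything on infrastructure.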

Closing

Vector databases are infrastructure. Infrastructure should be boring and justified by metrics. The fact that every AI conference sponsor is a vector DB company does not mean you need one on day one.

Build the simplest retrieval pipeline that works. Measure it. Then optimize. The founders who skip this sequence spend months debugging Pinecone configs when their chunking strategy was broken from the start.

The hype will pass. Good retrieval fundamentals will not.
