February 5, 2026

AI Killed the Database Zoo

architecture ai search analytics

You run two databases because each one is locked to a single storage format.

Elasticsearch gives you inverted indexes — great for full-text search, terrible for aggregations at scale. ClickHouse gives you columnar storage — great for analytics, but its “full-text search” is a bloom filter that can’t rank results.

So you run both. You ingest the same data twice, store it twice, pay for it twice. You manage two clusters, two on-call rotations, two sets of expertise. And every time your workload changes, you manually re-tune both.

This is the database zoo. And AI just killed it.

The root cause: one format per database

The reason you need two databases isn’t because search and analytics are fundamentally different problems. It’s because traditional databases are hardwired to a single storage format.

Elasticsearch stores everything in Lucene inverted indexes. That’s the right format when a user searches “error AND service:payments” — term lookups against inverted indexes are fast. But when someone asks “count errors by status code in 5-minute buckets,” an inverted index is the wrong tool. You’re doing a full scan through a structure optimized for term lookups, not columnar aggregation.
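To make that trade-off concrete, here is a toy inverted index in plain Python (an illustration only, not how Lucene is implemented): the search is a couple of dictionary hits and a set intersection, while the aggregation has to walk every document because nothing is laid out by column.

```python
from collections import defaultdict

# Toy document store: three log events.
docs = {
    1: {"text": "error timeout", "service": "payments", "status": 500},
    2: {"text": "error refused", "service": "payments", "status": 502},
    3: {"text": "request ok", "service": "checkout", "status": 200},
}

# Inverted index: term -> set of matching document IDs.
index = defaultdict(set)
for doc_id, doc in docs.items():
    for term in doc["text"].split():
        index[term].add(doc_id)
    index["service:" + doc["service"]].add(doc_id)

# Fast path: the search is two dictionary lookups and a set intersection.
hits = index["error"] & index["service:payments"]

# Slow path: aggregating by status code ignores the index entirely and
# walks every document, because nothing here is organized by column.
errors_by_status = defaultdict(int)
for doc in docs.values():
    errors_by_status[doc["status"]] += 1
```

The fast path stays fast no matter how many terms you index; the slow path gets worse with every document you add, which is exactly the scaling problem the aggregation query runs into.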

ClickHouse stores everything in columnar format. That’s the right format for the aggregation query above — scanning a column of status codes is exactly what columnar storage is built for. But when someone searches for a specific error message across petabytes of logs, ClickHouse falls back to bloom filters. No relevance scoring. No ranking. Seconds of latency instead of milliseconds.
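The same contrast from the columnar side, again as a plain-Python sketch rather than real ClickHouse internals: the 5-minute-bucket aggregation reads only the two columns it needs, while text search degrades to scanning the entire message column.

```python
from collections import Counter

# Columnar layout: one array per field instead of one record per event.
timestamps = [0, 60, 310, 320, 615]   # seconds since an arbitrary epoch
statuses = [500, 502, 500, 200, 503]
messages = ["timeout", "refused", "timeout", "ok", "unavailable"]

# Fast path: the aggregation touches only the two columns it needs.
BUCKET = 300  # 5-minute buckets, in seconds
error_buckets = Counter(
    ts // BUCKET
    for ts, status in zip(timestamps, statuses)
    if status >= 500
)

# Slow path: text search degrades to scanning the whole message column;
# a bloom filter can skip some blocks, but it cannot rank or score matches.
matching_rows = [i for i, msg in enumerate(messages) if "timeout" in msg]
```

Note what the slow path returns: a bag of row numbers with no notion of relevance, which is why "seconds of latency, no ranking" is the best this layout can do for search.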

Each database does one thing well because each is locked to one format. The zoo exists because no one format fits every workload.

The insight: let AI pick the format

The solution isn’t to build a better Elasticsearch or a better ClickHouse. It’s to build a system that picks the right storage format automatically — and tunes the query layer to match.

This is what KalDB does. When data arrives, AI profiles the workload and selects from open-source storage formats: Lucene inverted indexes for full-text search, columnar (Parquet) for analytics, wide-column for high-cardinality lookups, and more. The same data can be stored in the format that’s optimal for how it’s actually queried.
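The post doesn't spell out how the profiler works, so treat the following as a deliberately naive sketch of the idea: classify a tenant by its observed query mix and map the dominant query shape to a format. The `pick_format` helper, the query-shape labels, and the format names are all illustrative assumptions, not KalDB's actual API.

```python
def pick_format(query_mix: dict[str, float]) -> str:
    """Map an observed query mix (fractions by query shape) to a
    storage format. Hypothetical helper -- not KalDB's real profiler."""
    dominant = max(query_mix, key=query_mix.get)
    return {
        "fulltext": "lucene-inverted-index",
        "aggregation": "parquet-columnar",
        "point_lookup": "wide-column",
    }[dominant]

# A search-heavy tenant lands on Lucene; an analytics-heavy one on Parquet.
search_heavy = pick_format(
    {"fulltext": 0.7, "aggregation": 0.2, "point_lookup": 0.1}
)
analytics_heavy = pick_format(
    {"fulltext": 0.1, "aggregation": 0.8, "point_lookup": 0.1}
)
```

A real profiler would weigh latency targets, cardinality, and data volume rather than a single argmax, but the shape of the decision is the same: observed workload in, storage format out.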

The query layer adapts too. Routing, caching, and query planning are tuned based on observed access patterns. A full-text search hits an inverted index. An aggregation hits columnar storage. A hybrid query — filter by user, then aggregate top endpoints — chains both: inverted index for the filter, columnar for the aggregation. One API, the right plan every time.
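The hybrid query in that example can be sketched as two chained steps, again as a toy Python model rather than KalDB's planner: an inverted index narrows the row set cheaply, then a columnar pass aggregates only the surviving rows.

```python
from collections import Counter, defaultdict

# Columnar storage for the fields the aggregation reads.
user_col = ["alice", "bob", "alice", "alice", "bob"]
endpoint_col = ["/pay", "/pay", "/cart", "/pay", "/cart"]

# Inverted index over the user field, for the filter step.
user_index = defaultdict(list)
for row, user in enumerate(user_col):
    user_index[user].append(row)

# Step 1: filter by user via the inverted index (a cheap term lookup).
rows = user_index["alice"]

# Step 2: aggregate top endpoints over only the surviving rows,
# reading a single column the way a columnar engine would.
top_endpoints = Counter(endpoint_col[r] for r in rows).most_common()
```

The point of chaining is that neither structure does work it's bad at: the index never aggregates, and the columnar scan never touches rows the filter already excluded.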

Why this matters now: agentic workloads

A year ago, you could argue that manual tuning was fine. Your dashboards ran the same queries every day. Your search patterns were predictable. A senior engineer could tune Elasticsearch shards once a quarter and move on.

That world is ending. AI agents don’t follow predictable query patterns. An agent debugging a production incident might run a full-text search, then pivot to an aggregation, then do a high-cardinality lookup — all in the same session. The next session looks completely different.

With traditional databases, every time agents change behavior, someone has to manually re-tune. Re-index. Re-shard. Re-architect. The operational burden scales with the unpredictability of the workload, and agentic workloads are maximally unpredictable.

KalDB detects workload shifts and re-tunes automatically. As query patterns evolve, storage format selection and query planning adapt in real time. No human in the loop. No migration cost. Your agents evolve freely and the database keeps up.

Not a black box

“AI picks the format” might sound like a black box. KalDB is the opposite.

Every decision is visible. For every query, you can see which storage format was chosen, which query plan was used, and why. If the AI picks columnar for a query you know should hit an inverted index, you can see that, understand the reasoning, and override it. Pin storage formats. Force query plans. Set constraints. The AI suggests, you control.

All storage uses open-source formats — Lucene, Parquet, and others. Your data is never in a proprietary encoding. It’s portable, readable, and yours.

Experiment without risk

KalDB’s built-in multi-tenancy enables something traditional databases can’t: risk-free experimentation across the entire system.

Storage format, partitioning strategy, embedding model, search plugin, ranking function — every layer is a tunable experiment. KalDB A/B tests changes by spinning up an isolated tenant against real traffic patterns. Your production workload is never affected. Only proven improvements get promoted.
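As a mental model for the promotion loop (purely illustrative; the real experiment pipeline isn't shown in this post): replay sampled production traffic against a shadow tenant running the candidate configuration, and promote only if the measured improvement clears a threshold. The `should_promote` helper and its 10% threshold are assumptions for the sketch.

```python
def should_promote(baseline_p99_ms: float, candidate_p99_ms: float,
                   min_improvement: float = 0.10) -> bool:
    """Promote a candidate config only if it beats the baseline p99
    latency by at least `min_improvement` (10% by default).
    Hypothetical helper, not KalDB's real promotion logic."""
    return candidate_p99_ms <= baseline_p99_ms * (1 - min_improvement)

# Shadow-tenant replay: candidate is 20% faster, so it gets promoted.
promoted = should_promote(baseline_p99_ms=120.0, candidate_p99_ms=96.0)

# Only 5% faster: below the threshold, keep the baseline.
kept_baseline = not should_promote(baseline_p99_ms=120.0,
                                   candidate_p99_ms=114.0)
```

The threshold is what makes the loop safe to run unattended: noise-level wins never churn production, and only clear improvements get promoted.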

Traditional systems require a team of engineers to test each change manually. KalDB lets AI experiment continuously — measuring results and improving the system on its own.

The math

Here’s what collapsing the zoo looks like:

  • Data duplication: eliminated. Ingest once, query any way. No more storing the same data in two systems.
  • Ops complexity: halved. One system to deploy, monitor, and scale. One on-call rotation. One set of expertise.
  • Storage cost: S3 prices. KalDB is serverless and S3-backed. All data lives durably in object storage at $0.023/GB-month, with hot data cached locally for performance.
  • Manual tuning: zero. AI handles storage format selection, query optimization, and re-tuning as workloads change.
  • Lock-in: none. Apache 2.0 license. Open-source data formats. OpenSearch-compatible API.
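The storage line item is easy to sanity-check. Assuming S3 Standard's published $0.023 per GB-month, decimal units (1 PB = 1,000,000 GB), and ignoring request and local-cache costs:

```python
GB_PER_PB = 1_000_000          # decimal units
S3_PRICE_PER_GB_MONTH = 0.023  # USD, S3 Standard list price (first tier)

petabytes = 1
monthly_storage_cost = petabytes * GB_PER_PB * S3_PRICE_PER_GB_MONTH
print(f"${monthly_storage_cost:,.0f}/month")  # $23,000/month for 1 PB

# The zoo stores the same data twice, so eliminating duplication alone
# halves this line item before any format-level savings.
zoo_storage_cost = 2 * monthly_storage_cost
```

Back-of-the-envelope only: real bills add request charges, inter-AZ traffic, and cache nodes, but the 2x duplication factor from running two databases is the dominant term this math removes.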

At Slack and Airbnb, KalDB has run petabytes of data in production for years, delivering 90% cost reduction over the database zoo it replaced.

Try it

KalDB is open source under Apache 2.0. You can replace your Elasticsearch and ClickHouse stack in minutes — point your OpenSearch client at KalDB and go.

git clone https://github.com/kaldb/kaldb
cd kaldb
docker-compose up
# Point your OpenSearch client at localhost:8080

The database zoo had a good run. But when AI can pick the right storage format for every query automatically, running two databases is just paying double for a solved problem.

Get started on GitHub or request early access for a production trial.