
Postgres extension for BM25 relevance

Show HN: Postgres extension for BM25 relevance-ranked full-text search

83 AI Score

Show_hn other Added Apr 1, 2026

Details

Sector

other

About

Last summer we faced a conundrum at my company, Tiger Data, a Postgres cloud vendor whose main business is in timeseries data. We were trying to grow our business towards emerging AI-centric workloads and wanted to provide a state-of-the-art hybrid search stack in Postgres. We'd already built pgvectorscale in house with the goal of scaling semantic search beyond pgvector's main memory limitations. We just needed a scalable ranked keyword search solution too.

The problem: core Postgres doesn't provide this; the leading Postgres BM25 extension, ParadeDB, is guarded behind AGPL; developing our own extension appeared daunting. We'd need a small team of sharp engineers and 6-12 months, I figured. And we'd probably still fall short of the performance of a mature system like Parade/Tantivy.

Or would we? I'd been experimenting long enough with AI-boosted development at that point to realize that with the latest tools (Claude Code + Opus) and an experienced hand (I've been working in database systems internals for 25 years now), the old time estimates pretty much go out the window.

I told our CTO I thought I could solo the project in one quarter. This raised some eyebrows.

It did take a little more time than that (two quarters), and we got some real help from the community (amazing!) after open-sourcing the pre-release. But I'm thrilled/exhausted today to share that pg_textsearch v1.0 is freely available via open source (Postgres license), on Tiger Data cloud, and hopefully soon, a hyperscaler near you:

https://github.com/timescale/pg_textsearch

In the blog post accompanying the release, I give an overview of the architecture and present benchmark results using MS-MARCO. To my surprise, we were not only able to meet Parade/Tantivy's query performance, but exceed it substantially, measuring a 4.7x advantage on query throughput at scale:

https://www.tigerdata.com/blog/pg-textsearch-bm25-full-text-search-postgres

It's exciting (and, to be honest, a little unnerving) to see a field I've spent so much time toiling in change so quickly in ways that enable us to be more ambitious in our technical objectives. Technical moats are moats no longer.

The benchmark scripts and methodology are available in the GitHub repo. Happy to answer any questions in the thread.

Thanks,
TJ (tj@tigerdata.com)

AI Score Reasoning

This project addresses a critical gap in the 'Postgres for AI' ecosystem by providing a high-performance, permissively licensed BM25 search extension. The combination of 25 years of database expertise and AI-accelerated development has produced a product that significantly outperforms existing AGPL-restricted competitors, positioning it as a vital component for modern RAG stacks.

Investment Memo

## Executive Summary

Tiger Data is positioning itself as the definitive "Postgres for AI" provider by filling critical gaps in the Postgres ecosystem, specifically high-performance hybrid search. Their new extension, `pg_textsearch`, delivers BM25 relevance ranking with 4.7x the throughput of the current leader (ParadeDB) while utilizing a more permissive license. This is a high-conviction infrastructure play targeting the "Postgres is all you need" movement for RAG (Retrieval-Augmented Generation) workloads.

## Founder / Team Assessment

The technical leadership is elite. TJ (25 years in DB internals) demonstrates "10x engineer" velocity, solo-developing a high-performance extension in six months that outperforms established teams. His ability to leverage AI-augmented development (Claude Code/Opus) to collapse traditional R&D timelines is a significant signal of technical agility. However, the team appears heavily weighted toward engineering; we need to evaluate their Go-To-Market (GTM) and enterprise sales capabilities to compete with established cloud database providers.

## Market Analysis

The market for AI-centric database workloads is expanding rapidly as enterprises move from vector-only search to hybrid search (vector + keyword) to improve RAG accuracy. As developers consolidate their stacks to reduce operational complexity, the Total Addressable Market (TAM) is essentially the $80B+ relational database market shifting toward AI. The timing is optimal, as the industry is currently seeking a permissively licensed, high-performance alternative to AGPL-restricted or proprietary search solutions.

## Product / Traction

The product is a high-performance BM25 extension for Postgres that solves the "relevance gap" in core Postgres search.

* **Differentiation:** 4.7x query throughput advantage over ParadeDB/Tantivy and a permissive Postgres license (vs. AGPL), which is a critical requirement for hyperscalers and risk-averse enterprises.
* **Traction:** Strong initial community signal with ~190 HN points and immediate open-source contributions. The extension is already integrated into the Tiger Data cloud, providing an immediate path to monetization.
* **Moat:** While the founder notes "technical moats are moats no longer," the integration of `pgvectorscale` and `pg_textsearch` creates a powerful, unified "AI-stack-in-a-box" that is difficult to replicate with the same performance profile.

## Competitive Landscape

* **Direct Competitors:** ParadeDB (high performance but restricted by AGPL license) and core Postgres `tsvector` (widely available but lacks BM25 ranking and scale).
* **Indirect Competitors:** Specialized search engines like Elasticsearch, Algolia, and Pinecone. Tiger Data’s advantage is "zero-ETL" and reduced infra complexity by keeping data within Postgres.
* **Incumbent Risk:** Hyperscalers (AWS/GCP) could potentially fork or build their own BM25 implementations, though Tiger Data’s performance lead provides a significant head start.

## Investment Thesis

**Bull Case:**

1. **Postgres Dominance:** Postgres is winning the AI database war; Tiger Data owns the two most critical extensions (`pgvectorscale` and `pg_textsearch`) for this era.
2. **Performance & Licensing:** The 4.7x performance lead combined with a permissive license makes this the default choice for every developer building RAG on Postgres.
3. **Elite Velocity:** The team’s ability to ship production-grade database internals at this speed suggests they can out-innovate larger, slower incumbents.

**Bear Case:**

1. **Monetization Risk:** Open-source extensions are notoriously difficult to monetize if hyperscalers offer them as a managed service without contributing back.
2. **Commoditization:** As the founder admitted, AI-boosted development lowers the barrier to entry for competitors to build similar high-performance extensions.
3. **Niche Play:** If the market shifts away from hybrid search toward more advanced "long-context" LLMs that don't require RAG, the demand for specialized search extensions could soften.

## Recommended Action

**Conduct Deeper Diligence.** We need to clarify the corporate relationship between Tiger Data and Timescale, validate the benchmark claims via independent testing, and assess the conversion rate of open-source users to their managed cloud offering.

Source

Show_hn