Show HN: Postgres extension for BM25 relevance-ranked full-text search

83 AI Score
Show HN  ·  other  ·  Added Apr 1, 2026

Details

Sector: other
Total Funding: $0
Last Round: $0

About

Last summer we faced a conundrum at my company, Tiger Data, a Postgres cloud vendor whose main business is in timeseries data. We were trying to grow our business towards emerging AI-centric workloads and wanted to provide a state-of-the-art hybrid search stack in Postgres. We'd already built pgvectorscale in house with the goal of scaling semantic search beyond pgvector's main memory limitations. We just needed a scalable ranked keyword search solution too.

The problem: core Postgres doesn't provide this; the leading Postgres BM25 extension, ParadeDB, is guarded behind AGPL; developing our own extension appeared daunting. We'd need a small team of sharp engineers and 6-12 months, I figured. And we'd probably still fall short of the performance of a mature system like Parade/Tantivy.

Or would we? I'd been experimenting long enough with AI-boosted development at that point to realize that with the latest tools (Claude Code + Opus) and an experienced hand (I've been working in database systems internals for 25 years now), the old time estimates pretty much go out the window.

I told our CTO I thought I could solo the project in one quarter. This raised some eyebrows.

It did take a little more time than that (two quarters), and we got some real help from the community (amazing!) after open-sourcing the pre-release. But I'm thrilled/exhausted today to share that pg_textsearch v1.0 is freely available via open source (Postgres license), on Tiger Data cloud, and hopefully soon, a hyperscaler near you:

https://github.com/timescale/pg_textsearch

In the blog post accompanying the release, I give an overview of the architecture and present benchmark results using MS-MARCO. To my surprise, we were able not only to meet Parade/Tantivy's query performance but to exceed it substantially, measuring a 4.7x advantage on query throughput at scale:

https://www.tigerdata.com/blog/pg-textsearch-bm25-full-text-search-postgres

It's exciting (and, to be honest, a little unnerving) to see a field I've spent so much time toiling in change so quickly in ways that enable us to be more ambitious in our technical objectives. Technical moats are moats no longer.

The benchmark scripts and methodology are available in the GitHub repo. Happy to answer any questions in the thread.

Thanks,

TJ (tj@tigerdata.com)
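
For context on what "BM25 relevance-ranked" search computes: BM25 scores each document by summing, over the query terms, an inverse-document-frequency weight multiplied by a term-frequency factor that saturates and is normalized by document length. The sketch below is the textbook Okapi BM25 formula in Python, not code from pg_textsearch. The function name `bm25_scores`, the toy documents, the defaults `k1=1.2` and `b=0.75`, and the Lucene-style IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25).

    Hypothetical helper for illustration; k1 and b are common defaults.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF, which stays non-negative for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (made up for this example).
docs = [
    "postgres full text search".split(),
    "postgres extension for vector search".split(),
    "time series data in the cloud".split(),
]
scores = bm25_scores("full text search".split(), docs)
# Document indices ordered from most to least relevant.
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

On this toy corpus the first document matches all three query terms, the second matches only "search", and the third matches none, so it scores zero. A production engine computes the same kind of score from an inverted index rather than by scanning every document.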

AI Score Reasoning

This project addresses a critical gap in the 'Postgres for AI' ecosystem by providing a high-performance, permissively licensed BM25 search extension. The combination of 25 years of database expertise and AI-accelerated development has produced a product that significantly outperforms existing AGPL-restricted competitors, positioning it as a vital component for modern RAG stacks.

Investment Memo

## Executive Summary

Tiger Data is positioning itself as the definitive "Postgres for AI" provider by filling critical gaps in the Postgres ecosystem, specifically high-performance hybrid search. Their new extension, `pg_textsearch`, delivers BM25 relevance ranking with 4.7x the throughput of the current leader (ParadeDB) while utilizing a more permissive license. This is a high-conviction infrastructure play targeting the "Postgres is all you need" movement for RAG (Retrieval-Augmented Generation) workloads.

## Founder / Team Assessment

The technical leadership is elite. TJ (25 years in DB internals) demonstrates "10x engineer" velocity, solo-developing a high-performance extension in six months that outperforms established teams. His ability to leverage AI-augmented development (Claude Code/Opus) to collapse traditional R&D timelines is a significant signal of technical agility. However, the team appears heavily weighted toward engineering; we need to evaluate their Go-To-Market (GTM) and enterprise sales capabilities to compete with established cloud database providers.

## Market Analysis

The market for AI-centric database workloads is expanding rapidly as enterprises move from vector-only search to hybrid search (vector + keyword) to improve RAG accuracy. As developers consolidate their stacks to reduce operational complexity, the Total Addressable Market (TAM) is essentially the $80B+ relational database market shifting toward AI. The timing is optimal, as the industry is currently seeking a permissively licensed, high-performance alternative to AGPL-restricted or proprietary search solutions.

## Product / Traction

The product is a high-performance BM25 extension for Postgres that solves the "relevance gap" in core Postgres search.

* **Differentiation:** 4.7x query throughput advantage over ParadeDB/Tantivy and a permissive Postgres license (vs. AGPL), which is a critical requirement for hyperscalers and risk-averse enterprises.
* **Traction:** Strong initial community signal with ~190 HN points and immediate open-source contributions. The extension is already integrated into the Tiger Data cloud, providing an immediate path to monetization.
* **Moat:** While the founder notes "technical moats are moats no longer," the integration of `pgvectorscale` and `pg_textsearch` creates a powerful, unified "AI-stack-in-a-box" that is difficult to replicate with the same performance profile.

## Competitive Landscape

* **Direct Competitors:** ParadeDB (high performance but restricted by AGPL license) and core Postgres `tsvector` (widely available but lacks BM25 ranking and scale).
* **Indirect Competitors:** Specialized search engines like Elasticsearch, Algolia, and Pinecone. Tiger Data’s advantage is "zero-ETL" and reduced infra complexity by keeping data within Postgres.
* **Incumbent Risk:** Hyperscalers (AWS/GCP) could potentially fork or build their own BM25 implementations, though Tiger Data’s performance lead provides a significant head start.

## Investment Thesis

**Bull Case:**

1. **Postgres Dominance:** Postgres is winning the AI database war; Tiger Data owns the two most critical extensions (`pgvectorscale` and `pg_textsearch`) for this era.
2. **Performance & Licensing:** The 4.7x performance lead combined with a permissive license makes this the default choice for every developer building RAG on Postgres.
3. **Elite Velocity:** The team’s ability to ship production-grade database internals at this speed suggests they can out-innovate larger, slower incumbents.

**Bear Case:**

1. **Monetization Risk:** Open-source extensions are notoriously difficult to monetize if hyperscalers offer them as a managed service without contributing back.
2. **Commoditization:** As the founder admitted, AI-boosted development lowers the barrier to entry for competitors to build similar high-performance extensions.
3. **Niche Play:** If the market shifts away from hybrid search toward more advanced "long-context" LLMs that don't require RAG, the demand for specialized search extensions could soften.

## Recommended Action

**Conduct Deeper Diligence.** We need to clarify the corporate relationship between Tiger Data and Timescale, validate the benchmark claims via independent testing, and assess the conversion rate of open-source users to their managed cloud offering.

Source

Show HN