DealForge autonomously sources, scores, and writes investment memos on venture deals. Stop manually hunting.

1,180+ deals tracked  ·  22 AI investment memos  ·  Updated daily

Vocab extractor for language learners using Stanza and frequency ranks

AI Score: 38
Show_hn  ·  other  ·  Added Mar 30, 2026

Details

Sector: other
Total Funding: $0
Last Round: $0

About

I'm building a Telegram bot to practice Dutch. GPT-4o-mini kept picking vocabulary words I already knew, so I built a classical NLP pipeline to do it instead.

It takes a short text + learner level (A0–B1) and returns the best words to study, using Stanza for parsing and corpus frequency ranks (SUBTLEX-NL, srLex, SUBTLEX-US) for scoring. Wins at A1/A2, loses at A0 where the LLM picks more obvious words.

I also tried adding multi-word phrases (ADJ+NOUN, VERB+NOUN, phrasal verbs) backed by NPMI-scored collocation whitelists. Couldn't beat GPT there because it just "knows" which phrases matter.

For the phrase work I had to extract collocations from 100M+ OpenSubtitles lines. Published them as a free dataset: https://huggingface.co/datasets/vladvlasov256/opensubs-collocations
There are 43K bigrams across English, Dutch, and Serbian.

Source: https://github.com/vladvlasov256/vocab-nlp
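
The two scoring ideas in the post can be sketched roughly as follows. This is an illustrative sketch only, not the code from the linked repository. It assumes the text has already been lemmatized (for example by Stanza) and that a lemma-to-rank table has been loaded from a frequency list such as SUBTLEX-NL. The `LEVEL_BANDS` cutoffs and the `pick_words` and `npmi` names are hypothetical, chosen for the example. The `npmi` function uses the standard normalized pointwise mutual information formula, PMI divided by `-log p(x, y)`.

```python
import math

# Hypothetical rank bands per learner level (invented for this sketch).
# A lemma whose corpus frequency rank falls inside the band is treated as
# "probably new, but still common enough to be worth studying".
LEVEL_BANDS = {
    "A0": (1, 500),
    "A1": (300, 1500),
    "A2": (1000, 3000),
    "B1": (2500, 6000),
}


def pick_words(lemmas, freq_rank, level, top_n=5):
    """Return up to top_n distinct lemmas inside the level's rank band.

    lemmas:    lemmas from the input text (assumed to come from a parser).
    freq_rank: dict mapping lemma -> frequency rank (1 = most frequent).
    """
    low, high = LEVEL_BANDS[level]
    seen = set()
    candidates = []
    for lemma in lemmas:
        rank = freq_rank.get(lemma)
        if rank is None or lemma in seen:
            continue  # skip unknown lemmas and repeats
        seen.add(lemma)
        if low <= rank <= high:
            candidates.append((rank, lemma))
    candidates.sort()  # most frequent (lowest rank) first
    return [lemma for _, lemma in candidates[:top_n]]


def npmi(pair_count, x_count, y_count, total):
    """Normalized PMI of a bigram from raw counts, in the range [-1, 1]."""
    if pair_count == 0:
        return -1.0  # the pair never co-occurs
    if pair_count == total:
        return 1.0  # degenerate case: every observation is this pair
    pmi = math.log(pair_count * total / (x_count * y_count))
    return pmi / -math.log(pair_count / total)


if __name__ == "__main__":
    ranks = {"de": 1, "kopen": 400, "fiets": 900, "zeldzaam": 5000}
    text_lemmas = ["de", "fiets", "kopen", "fiets", "zeldzaam"]
    print(pick_words(text_lemmas, ranks, "A1"))
    print(round(npmi(10, 10, 10, 100), 3))
```

A phrase whitelist in this style would keep only bigrams whose NPMI is above some threshold. The threshold itself would need tuning per language and is not given in the post.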

AI Score Reasoning

The project demonstrates strong technical execution by using classical NLP to solve specific relevance issues in LLM-generated content, but it currently functions as a utility feature rather than a scalable business. While the founder shows impressive data engineering skills, the lack of a defensible moat and the high risk of being 'Sherlocked' by larger language-learning platforms, or made redundant by improved LLM prompting, limit its venture potential.
