

A Bomberman

Show HN: A Bomberman-style 1v1 game where LLMs compete in real time

37 AI Score
Show HN  ·  other  ·  Added Apr 14, 2026

Details

Sector: other
Total Funding: $0
Last Round: $0

About

A few weeks ago, ARC-AGI 3 was released. For those unfamiliar, it’s a benchmark designed to study agentic intelligence through interactive environments.

I'm a big fan of these kinds of benchmarks as IMO they reveal so much more about the capabilities and limits of agentic AI than static Q&A benchmarks. They are also more intuitive to understand when you are able to actually see how the model behaves in these environments.

I wanted to build something in that spirit, but with an environment that pits two LLMs against each other. My criteria were:

1. Strategic & Real-time. The game had to create genuine tradeoffs between speed and quality of reasoning. Smaller models can make more moves but less strategic ones; larger models move slower but smarter.
2. Good harness. I deliberately avoided visual inputs: models are still too slow and not accurate enough with them (see: Claude playing Pokémon). Instead, a harness translates the game state into structured text, and the game engine renders the agents' responses as fluid animations.
3. Fun to watch. Because benchmarks don't need to be dry bread :)

The end result is a Bomberman-style 1v1 game where two agents compete by destroying bricks and trying to bomb each other. You can check a demo video here: https://youtu.be/4x8tVypmuRk

Would love to hear what you think!
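
The post does not show the harness itself. As a rough illustration only, a harness that turns a grid-based game state into structured text, and maps a model's reply back to a move, might look something like the sketch below. Everything in it is an assumption made for the example: the cell symbols, the JSON layout, the action names, and the function names `describe_state` and `parse_action` are hypothetical and not taken from the project.

```python
import json

# Hypothetical cell symbols; the post does not specify the real encoding.
WALL, BRICK, EMPTY = "#", "+", "."


def describe_state(grid, players, bombs):
    """Translate a grid-based game state into structured text for an LLM.

    grid    -- list of equal-length strings using the symbols above
    players -- dict mapping a player name to its (row, col) position
    bombs   -- list of dicts with "pos" (row, col) and "ticks_left"
    """
    # Collect the coordinates of every destructible brick.
    bricks = [
        [r, c]
        for r, row in enumerate(grid)
        for c, cell in enumerate(row)
        if cell == BRICK
    ]
    state = {
        "size": {"rows": len(grid), "cols": len(grid[0])},
        "players": {name: list(pos) for name, pos in players.items()},
        "bombs": [
            {"pos": list(b["pos"]), "ticks_left": b["ticks_left"]} for b in bombs
        ],
        "bricks": bricks,
    }
    # Sorted keys keep the text stable from one turn to the next.
    return json.dumps(state, sort_keys=True)


def parse_action(reply, allowed=("up", "down", "left", "right", "bomb", "wait")):
    """Map a model's free-text reply to one allowed action, defaulting to "wait"."""
    word = reply.strip().lower()
    return word if word in allowed else "wait"
```

Falling back to "wait" on an unrecognized reply is one simple way to keep a real-time loop moving when a model answers off-format; the actual project may handle this differently.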

AI Score Reasoning

This project is an insightful technical demonstration of agentic AI benchmarking that captures the critical trade-off between reasoning quality and latency. However, it currently functions more as a niche research tool or hobbyist project than a venture-scale startup, lacking clear monetization, team pedigree, or significant market traction.

Source

Show HN