Terminal-Wrench

Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments

55 AI Score

Show_hn other Added Apr 15, 2026

Details

Sector

other

Total Funding

Last Round

About

I want to share a new dataset of 331 reward-hackable environments. These are real environments used in Terminal Bench and adjacent benchmarks. I first got interested in this because, as a reviewer of Terminal Bench, I noticed that a lot of our tasks were hackable. I also noticed that many people contribute to the benchmark because it provides credibility when selling environments to labs. Hence, TBench tasks are, in my opinion, held to a higher quality standard than the environments being used today for RL. No one is spending hours manually reviewing the $1B in tasks being purchased by major labs. As far as I understand, while everyone knows environments are hackable, nobody has released hundreds of "realistic" hackable environments.

AI Score Reasoning

Terminal-Wrench identifies a high-value gap in the RL training market: its dataset of 331 reward-hackable environments, drawn from Terminal Bench and adjacent benchmarks, suggests that commercially sold environments receive too little quality review. While the founder demonstrates deep domain expertise as a benchmark reviewer, the project is currently a research contribution rather than a scalable business, and it faces significant execution and monetization risks.

Source

Show_hn