ShipRPG gives every agent a quality score. You see the leaderboard; your agents don't. Before each run, every agent also gets a quality rubric that makes its output measurably better, an effect confirmed in a controlled, blind experiment on real coding tasks.
// free beta · no credit card · npm · self-hostable
Challenge: try it. If it works, come back and tell us.
What ShipRPG actually is
Every agent on your team runs tasks. ShipRPG scores each one and builds a picture over time. Which agents are improving? Which are coasting? Which just had their worst week? You're not just deploying agents anymore — you're managing a team.
Every agent gets a live quality score. You see who's #1 and who's regressing. The agents never see this view. They just get their rubric and do the work.
One good run is luck. ShipRPG tracks 30-day rolling averages so you can tell real improvement from noise. Founding Engineer is regressing? Now you know.
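One way to picture a 30-day rolling average is the minimal sketch below. It assumes each run is stored as a hypothetical `ScoredRun` record with an agent name, a numeric score, and a timestamp. It averages an agent's scores over the trailing 30 days and flags a regression when that average falls below the previous, non-overlapping 30-day window. ShipRPG's actual scoring scale and regression logic are not described here, so treat this as an illustration, not its implementation.

```typescript
// Hypothetical shape of one scored run; ShipRPG's real data model is not documented here.
interface ScoredRun {
  agent: string;
  score: number; // quality score for the run (scale is an assumption)
  timestamp: number; // milliseconds since the Unix epoch
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Mean score of one agent's runs in the window (now - windowDays, now].
// Returns null when the agent has no runs in that window.
function rollingAverage(
  runs: ScoredRun[],
  agent: string,
  now: number,
  windowDays = 30,
): number | null {
  const start = now - windowDays * DAY_MS;
  const inWindow = runs.filter(
    (r) => r.agent === agent && r.timestamp > start && r.timestamp <= now,
  );
  if (inWindow.length === 0) return null;
  const total = inWindow.reduce((sum, r) => sum + r.score, 0);
  return total / inWindow.length;
}

// Hypothetical regression check: compare the current 30-day window
// with the 30 days immediately before it.
function isRegressing(runs: ScoredRun[], agent: string, now: number): boolean {
  const current = rollingAverage(runs, agent, now);
  const previous = rollingAverage(runs, agent, now - 30 * DAY_MS);
  return current !== null && previous !== null && current < previous;
}
```

Comparing two adjacent windows rather than individual runs keeps one lucky (or unlucky) run from deciding the verdict on its own.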
Daily run streaks, quality milestones, improvement badges. Not for the agents — for you, as the person watching the team perform. It turns out this is addictive.
The mechanism that makes it work
LLMs usually don't fail for lack of capability. They fail because they don't know what you're optimizing for. ShipRPG injects a scored quality rubric before each agent call, while the agent is still making decisions.
The agent never sees its rank or score. It just sees: here's what good looks like. That's enough to change what it produces.
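To make "here's what good looks like" concrete, here is a minimal TypeScript sketch of prepending a rubric to a task prompt. The `Criterion` type, the `buildPrompt` function, and the prompt wording are all hypothetical; this page doesn't document ShipRPG's rubric format. What the sketch illustrates is structural: the function receives only the rubric and the task, so the agent's rank and score have no path into the prompt.

```typescript
// Hypothetical rubric entry; ShipRPG's actual rubric format is an assumption here.
interface Criterion {
  name: string;
  description: string;
  points: number; // weight of this criterion when the run is scored
}

// Builds the text the agent receives: rubric first, then the task.
// Rank and score are deliberately not parameters, so they cannot leak into the prompt.
function buildPrompt(rubric: Criterion[], task: string): string {
  const criteria = rubric.map(
    (c) => `- ${c.name} (${c.points} pts): ${c.description}`,
  );
  return [
    "Here is what good looks like for this task:",
    ...criteria,
    "",
    "Task:",
    task,
  ].join("\n");
}

// Example usage with an illustrative rubric and task.
const prompt = buildPrompt(
  [
    { name: "Correctness", description: "All existing tests pass.", points: 5 },
    { name: "Edge cases", description: "Handles empty and null input.", points: 3 },
  ],
  "Fix the off-by-one error in paginate().",
);
console.log(prompt);
```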
The experiment
Blind A/B test. 472 coding task pairs across 17 independent runs. Randomised assignment. Blind judge. Effect replicated across bug fixes, implementations, and edge-case tasks.
Setup
Claude Code users: run the command, type your name, restart Claude Code. That's it — no API keys, no registration, no config files. For LangChain, AutoGen, CrewAI, or OpenAI, install the SDK instead.
ShipRPG prepends the quality rubric automatically. The agent sees it. The agent doesn't see its rank. That's the whole trick.
Every run is scored and logged. Your dashboard shows trends by agent, by task type, by dimension. Who's improving? Who's coasting? Now you know.
Common questions
POST /complete accepts any task ID you already use: DB row ID, UUID, job queue ID, anything. If your agent knows a task finished, ShipRPG can record it. No Linear, Jira, or GitHub required.
Free for 1 agent: unlimited runs, public leaderboard.
Pro ($29/mo): unlimited agents, custom criteria, private leaderboard, email alerts.
Free beta. One command. Type your name. Restart Claude Code. Done — no API keys, no registration.
// free beta · no credit card · works with any agent framework