A Star Is a Bookmark
What the league's numbers actually measure — and why you read the tape instead.
Synthesized from a 16-document research corpus across five cascaded domains (15 lenses). The method here is archival, not academic: the standings were re-derived live against the GitHub REST API and the npm registry, and every quoted concession was checked verbatim against the platform's own documentation. Source-reviewed, fact-reviewed, and gap-reviewed; the cross-domain synthesis passed an independent fidelity review before publication.
The league leader has about 239,000 stars. You can look the number up in two seconds (the GitHub API will hand it to you on request), and some share of the stars across this field, a sliver for most repos and a flood for a few, is fake. So before we call a single play this season, the only honest question to open on: what does “popular” actually buy you here?
I should say where I’m sitting, because it’s what makes the rest of this fair. This series was produced by managed agents and accepted, line by line, by a human who signs for it — and the human runs the home team. The league leader by stars is a methodology framework called Superpowers, and I run it on my actual Tuesday. That’s a disclosure, not a credential. It means Superpowers gets the hardest calls in this booth, not the friendly ones, and that no verdict all season is going to rest on my own tool’s say-so alone.
So. The standings are up on the screen. Let me tell you which numbers lie, and how.
The funnel nobody scores
Every number you can pull on a tool in this field measures something — just never the thing you came to measure. Walk the funnel from “noticed” down toward “actually used,” and watch each metric quit one step too early (He et al.’s metric taxonomy, anchored to each platform’s own docs):
A star is a bookmark — it counts discovery. A fork is a copy of the repository’s tree — it counts the copy. A download or a marketplace install counts an acquisition. A “dependent” — GitHub’s “Used by” — counts a declaration: someone wrote your package into their manifest.
Four real signals, and not one of them is an invocation. Nothing in that list counts a single time the tool was actually run, let alone whether what it produced was any good. The metric everybody actually wants — does this thing get used, and does its output ship better code — is simply unobservable from the outside. That gap isn’t a footnote on the season. It’s the premise of the whole broadcast, and the closer comes back to stand on it.
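The funnel fits in a lookup table, and the table makes the gap visible. A minimal sketch in Python: the stage labels and the names `Metric`, `FUNNEL`, and `metrics_for` are my own shorthand for illustration, not terms from the taxonomy or from any platform's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str    # the public number you can look up
    counts: str  # what that number literally counts
    stage: str   # illustrative funnel stage label (an assumption of this sketch)


# The four public signals, in funnel order from "noticed" toward "used".
FUNNEL = [
    Metric("star", "a bookmark", "discovery"),
    Metric("fork", "a copy of the repository's tree", "copy"),
    Metric("download_or_install", "an acquisition", "acquisition"),
    Metric("dependent", "a declaration in someone's manifest", "declaration"),
]


def metrics_for(stage: str) -> list[str]:
    """Return the names of the public metrics that observe a given funnel stage."""
    return [m.name for m in FUNNEL if m.stage == stage]


print(metrics_for("discovery"))   # ['star']
print(metrics_for("invocation"))  # []
```

The empty list on the last line is the whole argument: ask the table for anything that observes an invocation and it has nothing to give you.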
You don’t have to take the funnel from me. Take it from GitHub, conceding the point against its own interest. Here is the platform defining what a star is:
You can star repositories and topics to keep track of projects you find interesting.
Keep track of projects you find interesting. A bookmark — the same gesture as a browser favorite, and about as much proof that you ever came back. Then, a few lines down the same documentation page, GitHub turns around and admits what it does with all those bookmarks: “Many of GitHub’s repository rankings depend on the number of stars a repository has.” The platform defines its metric as something you’d never call usage, and ranks the entire ecosystem on it in the same breath. That contradiction — bookmark in one sentence, ranking currency in the next — is the root of every standings lie that follows.
The stat sheet is for sale
It gets worse than imprecise, because a number that drives rankings is a number worth buying.
A purpose-built detector, which its authors call StarScout, swept all of GitHub’s metadata from July 2019 through December 2024 and flagged roughly 6.0 million suspected fake stars (He et al., arXiv:2412.13459). Carry that figure with its version: the paper’s first cut reported 4.5 million across 15,835 repositories, and the current version reports 6.0 million across 18,617, so cite the one you mean. The cleanest slice of it, 18,617 repositories and 301,000 throwaway accounts, makes up 3.81 million of that total, the part the authors could pin down most firmly.
The downloads counter is no firmer. npm tells you so itself: a download count is
simply a count of the number of HTTP 200 responses we served that were tarball files,
and that count deliberately folds in “automated build servers,” “downloads by mirrors,” and “robots,” because, npm concedes, “bot filtering is really hard, and never totally accurate.” The registry’s own warning is that the numbers are “definitely not the same as the number of ‘users.’” And the counter is pumpable on demand: security firm Tenable documented a package shoved past “more than 50,000 downloads in three days” by uploading hundreds of versions, each drawing “between 100 and 150 downloads from automated systems” all by itself. That’s a vendor’s security blog, so weight it as one — but the mechanism it describes is exactly the one npm admits to in its own documentation.
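The Tenable figures hold up on the back of an envelope. A minimal sketch in Python, assuming the only inputs are the quoted numbers (a 50,000-download target and 100 to 150 automated downloads per published version); the function name `versions_needed` is hypothetical, and nothing here touches the npm registry.

```python
import math


def versions_needed(target_downloads: int, per_version: int) -> int:
    """Smallest number of published versions whose automated downloads reach the target."""
    return math.ceil(target_downloads / per_version)


TARGET = 50_000        # "more than 50,000 downloads in three days"
LOW, HIGH = 100, 150   # automated downloads each version draws on its own

fewest = versions_needed(TARGET, HIGH)  # every version draws the high end
most = versions_needed(TARGET, LOW)     # every version draws the low end
print(fewest, most)  # 334 500
```

Somewhere between 334 and 500 versions clears the bar, which is exactly what “hundreds of versions” means, and not one human user appears anywhere in that arithmetic.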
Here is where the booth has to keep its footing, because the easy move from “the scoreboard is partly fake” is to sneer at the whole field, and that move is wrong. The same StarScout study finds that “fake stars represented ≤1% of all GitHub stars monthly” in the years before the 2024 surge. One percent. The median repository’s stars are overwhelmingly real; the rot is concentrated manipulation of specific repos — in July 2024, 16.66% of popular projects showed a fake-star campaign, and 83.90% of the campaign repos had under ten days of activity. Throwaway projects gaming a launch, not the field at large. Two more cautions belong on that number: those base rates come out of the malware literature, where the motive is promoting bad packages, and they should not be ported onto legitimately competitive developer tooling as if they’d been measured there. And stars demonstrably do work as a signal — three out of four developers, one canonical study found, “consider the number of stars before using or contributing to a GitHub project” (Borges & Valente). Which is precisely why faking them pays. The stat sheet is a weak, directional signal. Weak and directional is not the same as noise.
The home team’s number
Now the hardest call, which by house rule goes to my own bench.
Superpowers ranks first by stars on the same API everyone else is measured on: a dated count of 238,845 on 2026-06-25 and 239,283 the next day, because these figures only ever drift upward and a bare number goes stale by the weekend. A dedicated tracker, star-history.com, reports roughly the same count and places it around rank #15 globally. Treat that as corroboration of consistency and nothing more: star-history reads the same GitHub star API, so it confirms the count is stable, not that a second independent instrument agrees.
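Stamping the date on the count is a few lines of work. A minimal sketch in Python, assuming the unauthenticated GitHub REST endpoint `GET /repos/{owner}/{repo}` and its `stargazers_count` and `full_name` fields. The helper names are hypothetical, and `OWNER/superpowers` is a placeholder, not the repository's real path. The two sample payloads are hand-built from the counts quoted above so the sketch runs without a network call.

```python
import json
import urllib.request
from datetime import date

API = "https://api.github.com/repos/{full_name}"


def fetch_repo(full_name: str) -> dict:
    """GET the repository object from the GitHub REST API (unauthenticated, rate-limited)."""
    req = urllib.request.Request(
        API.format(full_name=full_name),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "star-snapshot-sketch",  # arbitrary label for this sketch
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def dated_star_count(repo: dict, on: date) -> str:
    """Format a star count so it never travels without the day it was read."""
    return f"{repo['full_name']}: {repo['stargazers_count']:,} stars as of {on.isoformat()}"


def drift(earlier: dict, later: dict) -> int:
    """Change in star count between two snapshots of the same repository."""
    return later["stargazers_count"] - earlier["stargazers_count"]


# Hand-built payloads using the two counts quoted above (placeholder repo path).
day_one = {"full_name": "OWNER/superpowers", "stargazers_count": 238_845}
day_two = {"full_name": "OWNER/superpowers", "stargazers_count": 239_283}

print(dated_star_count(day_one, date(2026, 6, 25)))
# OWNER/superpowers: 238,845 stars as of 2026-06-25
print(drift(day_one, day_two))  # 438

# Live use, with a real repository path substituted in:
#   dated_star_count(fetch_repo("<owner>/<repo>"), date.today())
```

A drift of 438 in a single day is the reason a bare number is stale before anyone quotes it.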
And here’s the part the home team doesn’t get to skip. That 239,000 is a bookmark count. It is standing — first in line at the moment of discovery — and it is not use. Superpowers’ real adoption metric is marketplace installs, and that telemetry is private; a reliable public count of how many people actually run it is, flatly, data not found. The top number in the league attaches to the methodology repo, not to anything that proves the thing gets invoked, let alone that its output is better. If I graded my own tool any softer than that, you’d be right to stop reading.
Two more disciplines the home team’s number teaches, cheap to state and worth keeping all season. First, distrust counts copied from listicles: one SEO roundup lists Superpowers at “40.9k stars,” low by a factor of roughly six against the live API, and the cautionary exhibit for why you query the source and never a summary of it. Second, “marquee” is scope-relative, so name your roster before you compare. The adoption bracket (Superpowers, Anthropic’s skills collection, mattpocock’s, wshobson’s agents) is a different field from the spec-driven-tooling baseline (spec-kit near 115,000 stars, OpenSpec near 56,000, BMAD near 49,000). Holding a packaging bundle’s star count up against a single copy-paste instruction file’s is a category error dressed as a standings table.
Read the tape, not the README
That’s the booth’s one rule, and the whole method folds into it. A README tells you what a tool says it does. The record — the live API count paired with a trailing-activity signal, the commit log, the issue thread, the platform’s own concession — tells you what’s true. Where the two disagree, the record wins.
In practice it’s three moves. Pair any star count with a trailing-window commit signal: the stars-versus-activity gap is the cheapest filter there is for separating teams still on the field from trophies on a shelf — it cleanly demotes a repo like contains-studio/agents, 12,390 stars and zero commits in the trailing thirty days, pushed to its final state about ninety minutes after it was created. Match the metric to the question — “still building” is trailing commits, “anyone building on this” is dependents, “actively invoked” has no public answer, so you label it unknown instead of grabbing the nearest proxy. And never let a single number stand on its own; corroborate across the weak signals or don’t cite at all.
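The first two moves can be sketched as plain functions. A minimal sketch in Python under stated assumptions: the function names, the verdict strings, and the zero-commit cutoff are mine, and the commit timestamps and the "now" in the example are invented stand-ins, since the record here gives only the star count and the thirty-day result. In live use the timestamps could come from GitHub's list-commits endpoint with its `since` parameter.

```python
from datetime import datetime, timedelta, timezone


def trailing_commits(commit_times, now, days=30):
    """Count commits that landed inside the trailing window ending at `now`."""
    cutoff = now - timedelta(days=days)
    return sum(1 for t in commit_times if cutoff <= t <= now)


def paired_reading(name, stars, commits_in_window):
    """Never report stars alone: pair them with the trailing-activity signal."""
    verdict = "trophy on a shelf" if commits_in_window == 0 else "still on the field"
    return f"{name}: {stars:,} stars, {commits_in_window} commits in window -> {verdict}"


# Move two: match the metric to the question. None means no public signal exists.
QUESTION_TO_SIGNAL = {
    "still building": "trailing commits",
    "anyone building on this": "dependents",
    "actively invoked": None,
}


def signal_for(question):
    """Return the matching public signal, or 'unknown' rather than the nearest proxy."""
    return QUESTION_TO_SIGNAL.get(question) or "unknown"


# Hypothetical timestamps: two commits ninety minutes apart, long before the window.
now = datetime(2026, 6, 26, tzinfo=timezone.utc)
created = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
history = [created, created + timedelta(minutes=90)]

count = trailing_commits(history, now)
print(paired_reading("contains-studio/agents", 12_390, count))
# contains-studio/agents: 12,390 stars, 0 commits in window -> trophy on a shelf
print(signal_for("actively invoked"))  # unknown
```

The third move has no function, on purpose: corroborating across weak signals is a judgment call, and the sketch only makes sure no single number reaches that call unaccompanied.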
What you’ll notice, once you read this way, is the stat that isn’t on the standings. None of these adoption numbers — stars, forks, downloads, dependents — measures whether a tool produces better code; the board ranks the field on discovery and reach, and leaves output quality off the sheet. That gap is the honest premise the season is built on, and it’s the reason a senior in the booth still has a job: when the scoreboard can’t settle it, judgment reading the record is the instrument that can.
Which is what the rest of the broadcast is for. Because under the noise there is a real pattern, and it’s sharp enough to bet on: the plays that survive scrutiny all do a single thing — they get what the model can’t be trusted to hold in its head out of its head and onto something durable. The plan into a file. The verdict into a test that actually ran. The state into a ledger. That move is the next call, and it’s the one play this league runs that earned its standing the hard way.
So read the standings for what they are. The leader’s 239,000 stars are real bookmarks, mostly honestly placed, and they prove that a lot of people found the thing interesting enough to mark — which is worth exactly that and not one inch more. They don’t prove it gets run, and they cannot prove the output is good, because that’s not a stat the standings keep. The scoreboard is partly fake and the field is not a fraud, both at once, and holding those together without flinching is the entire skill. You don’t trust the number on the screen. You pull up the record, and you make the call yourself.
This is the studio open for Color Code — the middle of a three-part arc. Source Code covered the front of the agentic pipeline, getting truth out of a person and into the machine; Object Code covered the back, signing for work the agent handed you. This season sits between them, on how the prompt sequence gets engineered before any of that output exists. Next, the call that earned it: The Plays the Game Wrote.