Why we use Bayesian shrinkage (and Blizzard's rates page doesn't)

Counterwatch and Blizzard's official rates page both publish Overwatch win rates, and they do the math differently. We shrink ours before ranking heroes. Blizzard publishes the raw numbers as they are. Neither is wrong. They answer different questions, and the gap matters most for the heroes nobody picks much.

Two sources, two jobs

Since Blizzard's rates page went up in late 2025, you have had two ways to look at community hero performance. Blizzard's page gives you raw, unmirrored win rates and global pick rates for the 5v5 ladder, partitioned by rank: exactly what the servers recorded, with no smoothing. Counterwatch gives you shrunk win rates across 5v5, 6v6, Stadium, and Marvel Rivals, with Bayesian shrinkage applied to every tier list row before anything is ranked.

Both are correct. If you want the plain aggregate of how often a hero won across the recorded matches, Blizzard answers that. If you want a read on how strong a hero actually is, and whether you should pick them, shrinkage answers that.

The small-sample problem

Raw win rates fall apart on small samples. Take a hero with five tracked matches this week, all wins. The raw rate is 100%, and any list sorted by raw win rate drops them straight at the top. Five matches is nothing, though. A few games later that number is already sliding back toward the middle, because that is what regression to the mean does.

This is not specific to Overwatch. It turns up anywhere the things you rank have different sample sizes: batters, product reviews, restaurants, hero win rates. A 100% rate over 5 games is not the same claim as a 55% rate over 10,000, even though the first number is bigger. Sort by the raw number and your S-tier is just whoever got lucky this week.
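
To put a rough number on "whoever got lucky this week", here is a back-of-the-envelope sketch. It assumes every hero is a true 50% coin flip and plays exactly five independent matches; the 40-hero roster size is a hypothetical round number, not a count taken from either site.

```python
def perfect_run_probability(true_win_rate: float, matches: int) -> float:
    """Chance that a hero with the given true win rate wins every one of `matches` games."""
    return true_win_rate ** matches


def chance_someone_is_perfect(heroes: int, true_win_rate: float, matches: int) -> float:
    """Chance that at least one of `heroes` independent heroes posts a perfect record."""
    p_one = perfect_run_probability(true_win_rate, matches)
    return 1 - (1 - p_one) ** heroes


if __name__ == "__main__":
    # One coin-flip hero rarely goes 5-0 on its own...
    print(f"one hero:  {perfect_run_probability(0.5, 5):.1%}")
    # ...but across a hypothetical 40-hero roster, some "100% hero" is more likely than not.
    print(f"any of 40: {chance_someone_is_perfect(40, 0.5, 5):.1%}")
```

Under these assumptions a single average hero goes 5-0 only about 3% of the time, yet a roster-wide sort by raw win rate will usually find at least one such hero to put on top.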

What shrinkage does

Shrinkage blends each hero's raw win rate with a neutral 50% baseline, weighted by how many matches sit behind it. Thin samples get pulled most of the way to 50%. Heavy samples barely move. Concretely, we add 400 imaginary coin-flip matches (200 wins and 200 losses) to every hero's record before computing the displayed rate:

shrunk = (rawWinRate × matches + 0.5 × 400) / (matches + 400)
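
Rearranging the formula shows why it behaves as described: the displayed rate is a weighted average of the raw rate and the 50% baseline, and the weight on the raw rate grows with the match count. Here r is the raw win rate and n the number of matches; the symbols are ours, introduced only for this derivation.

```latex
\text{shrunk} = \frac{r\,n + 0.5 \cdot 400}{n + 400}
              = w\,r + (1 - w) \cdot 0.5,
\qquad w = \frac{n}{n + 400}
```

At n = 400 the raw rate and the baseline get equal weight. At n = 20 the raw rate gets under 5% of the weight (20/420 ≈ 0.048), and at n = 10,000 it gets about 96% (10,000/10,400 ≈ 0.962). In Bayesian terms, this is the posterior mean of a win rate under a Beta(200, 200) prior, which is one standard way to read "400 imaginary coin-flip matches".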

5 games at 100% raw becomes (5 × 1.0 + 400 × 0.5) / (5 + 400), which is 50.6%. The fluke collapses to baseline. No S-tier.
10,000 games at 55% raw becomes (10,000 × 0.55 + 400 × 0.5) / 10,400, which is 54.8%. Real signal barely moves.

A hero sitting at 60% over 20 matches becomes (20 × 0.60 + 400 × 0.5) / 420, which is 50.5%, once shrinkage runs. That is the list refusing to be fooled by 20 lucky games.
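
A direct Python translation of the formula reproduces the three worked examples. The function and constant names are hypothetical; only the 400-match prior and the 50% baseline come from the text above.

```python
PRIOR_MATCHES = 400  # imaginary coin-flip matches added to every hero's record
BASELINE = 0.5       # neutral win rate the thin samples get pulled toward


def shrunk_win_rate(raw_win_rate: float, matches: int,
                    prior_matches: int = PRIOR_MATCHES,
                    baseline: float = BASELINE) -> float:
    """Blend a raw win rate with the baseline, weighted by the real match count."""
    if matches < 0 or prior_matches <= 0:
        raise ValueError("matches must be >= 0 and prior_matches must be > 0")
    return (raw_win_rate * matches + baseline * prior_matches) / (matches + prior_matches)


if __name__ == "__main__":
    # (raw win rate, matches) pairs from the examples above
    examples = [(1.00, 5), (0.55, 10_000), (0.60, 20)]
    for raw, n in examples:
        print(f"{raw:.0%} over {n:>6,} matches -> {shrunk_win_rate(raw, n):.1%}")
```

With zero matches the function returns the baseline exactly, which is the behaviour you want for a hero nobody has played yet.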

Why Blizzard probably leaves the numbers raw

There is a good reason to report raw. When you are Blizzard, publishing official stats from your own servers, "report what happened, do not filter it" is a sound principle. If a hero won 60% across the community, that is a fact about the game, and people should be able to see it without a third party deciding what to smooth. The rank partitioning helps too: a Masters-only slice is cleaner than an all-ranks blend, so each filter already limits the sample problem to some degree. That is a reasonable choice. It is just a different one from ours.

Why we shrink

The Counterwatch tier list has a different job. It answers which heroes are worth picking right now, and ranking questions care about sample confidence in a way "what happened on the servers" does not. A list that calls a hero S-tier on the strength of 15 matches is worse than no list, because it tells you the hero is strong when the number is a coin flip. Shrinkage lets us show every hero, including the rarely picked ones, without floating random outliers into S-tier. Per-rank filters make this matter even more: on a thin-population rank like Bronze, shrinkage holds small-sample heroes near 50% until the data catches up, instead of letting them swing to the top or bottom on noise.
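
A small sketch of how the ordering changes. The hero names and (wins, matches) records are made up for illustration and are not real data; the shrinkage step uses the 400-match prior from the formula above.

```python
from typing import Dict, List, Tuple

PRIOR_MATCHES = 400  # imaginary coin-flip matches, as in the formula above
BASELINE = 0.5


def shrunk(wins: int, matches: int) -> float:
    """Shrunk win rate: real wins plus half the imaginary matches, over all matches."""
    return (wins + BASELINE * PRIOR_MATCHES) / (matches + PRIOR_MATCHES)


def rank(records: Dict[str, Tuple[int, int]], use_shrinkage: bool) -> List[str]:
    """Return hero names sorted best-first by raw or shrunk win rate."""
    def score(name: str) -> float:
        wins, matches = records[name]
        # Raw rate assumes matches > 0; the shrunk rate is defined even at 0.
        return shrunk(wins, matches) if use_shrinkage else wins / matches
    return sorted(records, key=score, reverse=True)


# Hypothetical (wins, matches) records, for illustration only.
RECORDS = {
    "Hero A": (15, 15),         # 100% over 15 matches
    "Hero B": (5_600, 10_000),  # 56% over 10,000
    "Hero C": (1_070, 2_000),   # 53.5% over 2,000
    "Hero D": (4, 12),          # 33% over 12
}

if __name__ == "__main__":
    print("raw:   ", rank(RECORDS, use_shrinkage=False))
    print("shrunk:", rank(RECORDS, use_shrinkage=True))
```

Sorted raw, the 15-match hero tops the list. Sorted shrunk, it still appears (at about 51.8%) but drops below the heroes with thousands of matches behind them, and the 12-match hero sits just under 50% instead of at 33%.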

What it means when the two disagree

Where Blizzard's page and ours disagree on a hero, it is almost always sample size. If Blizzard shows 58% and we show 52%, the gap is the shrinkage telling you the 58% rides on a sample we do not fully trust yet. If the two land within a point of each other and the hero is well away from 50%, the sample is big and the number is solid. Near 50%, shrinkage has little to pull back, so close agreement says less there.
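
If you assume, purely for illustration, that both numbers describe the same set of matches and that the 400-match prior applies, you can run the formula backwards and ask how many matches a given gap implies. The `implied_matches` helper below is hypothetical and is not something either site exposes.

```python
PRIOR_MATCHES = 400
BASELINE = 0.5


def implied_matches(raw: float, shrunk: float,
                    prior_matches: int = PRIOR_MATCHES,
                    baseline: float = BASELINE) -> float:
    """Solve shrunk = (raw * n + baseline * prior) / (n + prior) for n.

    Only meaningful when shrunk lies strictly between the baseline and the
    raw rate, and when both rates are computed from the same matches.
    """
    lo, hi = sorted((raw, baseline))
    if not lo < shrunk < hi:
        raise ValueError("shrunk rate must lie strictly between the baseline and the raw rate")
    # Rearranged: n * (raw - shrunk) = prior * (shrunk - baseline)
    return prior_matches * (shrunk - baseline) / (raw - shrunk)


if __name__ == "__main__":
    print(f"58% raw vs 52% shrunk -> about {implied_matches(0.58, 0.52):.0f} matches")
```

For the 58% versus 52% case this works out to roughly 133 matches. Treat it as a rough reading at best: the two sites do not necessarily count the same matches, so the result is illustrative rather than a real match count.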

A workable way to use both: climb with the Counterwatch tier list, since the shrunk numbers are the ones to make picks on. If a hero looks surprisingly high or low, open their hero page and check the match count to see how firm the placement is. When you want a sanity check, compare against Blizzard's raw number: a big gap means a small sample, and tight agreement means the edge is real. The methodology page goes deeper on the formula and the prior, and the Counterwatch app shows the shrunk numbers live in your match.
