
PolyLeviathan · Methodology

The Polymarket Elo Rating System.

Raw PnL lies. One lucky large bet turns a mediocre trader into a leaderboard name. Elo measures calibration across volume. You cannot fake it over 200 trades.

After rating 1.7 million wallets, we found something the PnL leaderboard hides: the distribution is bimodal. Two clusters, not a bell curve. Persistent winners on one end. Persistent losers on the other. Skill on Polymarket is real and it stays.


Initial findings

The distribution is bimodal. That changes what whale tracking means.

If Polymarket were pure luck, Elo ratings would form a bell curve centred around 1000. Everyone would drift toward average over time. That is not what we found.

Roughly 70% of the 1.7M wallets we've rated sit below 1000. They lose to market consensus more often than they win, and they stay there across hundreds of trades. The winners cluster above 1300 and hold that position over time. The middle is thin. Regression to the mean isn't happening at scale on Polymarket.

This changes how you should read whale data. A $50k bet from a 1400+ Elo wallet is a different signal from the same bet made by a 700-rated wallet. Raw size alone misses that entirely.

Two stable peaks also validate the rating system. If Elo were noisy or arbitrary, you'd see random spread. The bimodal shape means the model is capturing something real: persistent skill differences that don't wash out over time.
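The pure-luck argument can be checked with a small simulation. This is a minimal sketch, not our production model: it assumes each trade moves a rating by K × (outcome − price), and the `edge` parameter, the price range, the K value of 20, and the wallet counts are all hypothetical choices made for illustration.

```python
import random
import statistics

def simulate_wallet(edge, n_trades=200, k=20.0, start=1000.0, rng=random):
    """Walk one rating through n_trades resolved trades.

    `edge` is how far the wallet's true win probability sits above the
    market price; 0.0 means pure luck. All defaults are illustrative.
    """
    rating = start
    for _ in range(n_trades):
        price = rng.uniform(0.10, 0.90)            # market-implied probability
        true_p = min(max(price + edge, 0.0), 1.0)  # wallet's real win chance
        won = rng.random() < true_p
        rating += k * ((1.0 if won else 0.0) - price)
    return rating

rng = random.Random(42)  # fixed seed so the run is repeatable
lucky = [simulate_wallet(0.0, rng=rng) for _ in range(1000)]
skilled = [simulate_wallet(0.05, rng=rng) for _ in range(1000)]

print("pure luck mean:", round(statistics.mean(lucky)))
print("5-point edge mean:", round(statistics.mean(skilled)))
```

Under these assumptions the zero-edge wallets spread symmetrically around 1000, while even a small persistent edge pulls a group steadily away from it. A second stable peak is what you would expect only if some wallets carry a real edge.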

Why not just use PnL?

Raw PnL

Misleading

One $500k lucky bet makes a mediocre trader look elite. You cannot tell skill from a single outsized win.

Win Rate

Incomplete

Winning 90% of your bets means nothing if you only bet on markets already priced at 95¢. Win rate ignores the odds you got.

Elo Rating

True skill

Every trade scores against what the market believed at the moment you traded. Win on a tight 50/50 and your rating climbs. Win on a near-certainty and it barely moves. The formula handles the rest automatically.
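The win-rate problem is simple arithmetic. The sketch below assumes a binary share that pays $1 on a win and $0 on a loss; the function name `edge_per_share` and the two example traders are hypothetical.

```python
def edge_per_share(win_rate: float, avg_price: float) -> float:
    """Average profit per share: a win pays (1 - price), a loss costs price."""
    return win_rate * (1.0 - avg_price) - (1.0 - win_rate) * avg_price

# Trader A wins 90% of the time but only buys at 95 cents.
favourites = edge_per_share(win_rate=0.90, avg_price=0.95)
# Trader B wins just 56% of the time on 50/50 markets.
coin_flips = edge_per_share(win_rate=0.56, avg_price=0.50)

print(f"{favourites:+.3f} {coin_flips:+.3f}")  # -0.050 +0.060
```

The expression simplifies to win rate minus average price, which is why a 90% win rate at 95¢ is a losing record and a 56% win rate at 50¢ is a winning one.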

How it works

Everyone starts at 1000

Fresh wallets open at 1000. Each resolved trade moves that number up or down. Above 1300 is strong. Above 1400 is rare.

The market consensus is your opponent

There is no other trader to face. The market price at the moment you trade is the implied probability you have to beat. Beat it and your Elo rises. Lose to it and it drops.

Variable K-factor

New wallets have a high K-factor. Early trades shift the rating fast, which is correct when the sample is small. Once a wallet has 200+ resolved trades, the K-factor drops and ratings stabilise. You cannot fake it over volume.
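A step-down K-factor can be sketched in a few lines. Only the 200-trade threshold comes from the description above; the K values of 40 and 20 are assumptions borrowed from FIDE's values for new and established players, and the function name `k_factor` is hypothetical.

```python
def k_factor(resolved_trades: int, threshold: int = 200,
             k_new: float = 40.0, k_settled: float = 20.0) -> float:
    """Return a large K while the sample is small and a smaller K afterwards."""
    if resolved_trades < 0:
        raise ValueError("resolved_trades cannot be negative")
    return k_new if resolved_trades < threshold else k_settled

# The same win at 50 cents (score 1, expected 0.5) early and late in a wallet's life.
for n in (10, 500):
    print(n, k_factor(n) * (1.0 - 0.5))
# 10 20.0
# 500 10.0
```

With these illustrative values, an identical win is worth half as many points once the wallet has passed the threshold, which is what lets ratings settle.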

Odds are baked in

A win at 52¢ is worth more than a win at 90¢. The expected score formula handles this automatically, with no manual adjustments.
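To make the odds weighting concrete, here is a minimal sketch that assumes the rating change is K × (outcome − price), with the price paid standing in for the expected score. The function name `elo_delta` and the fixed K of 20 are hypothetical; the exact production formula is not reproduced here.

```python
def elo_delta(price: float, won: bool, k: float = 20.0) -> float:
    """Rating change for one resolved trade bought at `price` (0-1 dollars)."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    score = 1.0 if won else 0.0
    return k * (score - price)

print(round(elo_delta(0.52, True), 2))   # 9.6
print(round(elo_delta(0.90, True), 2))   # 2.0
print(round(elo_delta(0.90, False), 2))  # -18.0
```

Under this assumption a win at 52¢ earns nearly five times what a win at 90¢ does, and losing a 90¢ position costs far more than winning it pays.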

No opponent weighting

In chess, your rating shifts more when you beat a grandmaster. Here, everyone faces the same opponent: market consensus. Ratings are directly comparable across the full 1.7M wallet population.

Updates on every resolution

Ratings recalculate the moment a market resolves. Not in weekly batches. The leaderboard reflects live history at all times.

Rating tiers

What the number means.

Elite: 1400+

Top 0.1%. Consistently correct on contested markets with enough resolved trades for the rating to be reliable.

Strong: 1300 – 1400

Top 1-2%. Demonstrably above market across a meaningful sample. Serious traders cluster here.

Above Average: 1200 – 1300

Winning more often than the market expects. Not dominant, but not random either.

Average: 1000 – 1200

Trading at roughly the market probability. No strong edge established, or not enough trades to tell yet.

Below Average: Below 1000

Losing to market consensus more often than winning. May be early-stage with a small sample, or genuinely poor calibration.
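The tiers above map to a simple lookup. In this sketch the tier names and cut-offs come from the list above, while treating each lower bound as inclusive (so exactly 1300 counts as Strong) is an assumption, and the names `TIERS` and `tier` are hypothetical.

```python
# (lower bound, tier name), checked from the highest tier down.
TIERS = [
    (1400, "Elite"),
    (1300, "Strong"),
    (1200, "Above Average"),
    (1000, "Average"),
]

def tier(rating: float) -> str:
    """Return the tier name for a rating; lower bounds are assumed inclusive."""
    for floor, name in TIERS:
        if rating >= floor:
            return name
    return "Below Average"

print(tier(1412), tier(1250), tier(980))  # Elite Above Average Below Average
```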

Background

Why chess? Why Elo?

Arpad Elo's rating system, adopted by the US Chess Federation in 1960, exists because chess had the same problem prediction markets have: raw win counts tell you almost nothing about actual skill, because every matchup is different.

Beating Magnus Carlsen means something. Beating a beginner means almost nothing. The Elo formula encodes this mathematically. Over enough games, the ratings converge on something real. Chess pools show the same bimodal shape our Polymarket data does: dedicated players cluster at the top, casual players cluster at the bottom, and the gap between the groups is persistent.

The Polymarket adaptation is direct. The market consensus price at the moment you trade is the implied probability you have to beat. If you buy YES at 40¢ and the market resolves YES, you beat the market's 40% estimate and gain Elo. If it resolves NO, you lose Elo. The K-factor and expected score formula follow the FIDE chess implementation. Opponent weighting gets replaced by market odds weighting.
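The 40¢ example can be replayed as a short rating history. This is a sketch under stated assumptions: the update is taken to be K × (outcome − price paid), K is fixed at 20, and the `Trade` record and `replay` function are hypothetical names, not part of any published API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Trade:
    side: str          # "YES" or "NO": the outcome token bought
    price_paid: float  # price of that token in dollars, between 0 and 1
    resolution: str    # "YES" or "NO": how the market resolved

def replay(trades, start=1000.0, k=20.0):
    """Apply each resolved trade in order and return the rating history."""
    rating = start
    history = [rating]
    for t in trades:
        score = 1.0 if t.side == t.resolution else 0.0
        rating += k * (score - t.price_paid)
        history.append(rating)
    return history

history = replay([
    Trade("YES", 0.40, "YES"),  # beat the 40% estimate: +12
    Trade("YES", 0.40, "NO"),   # lost to it: -8
])
print([round(r, 1) for r in history])  # [1000.0, 1012.0, 1004.0]
```

A win and a loss at the same 40¢ price do not cancel out: the win pays more than the loss costs, because the market had already priced in that a loss was the likelier result.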

See where every wallet ranks.

Search any Polygon address to get its Elo rating, rating history, full PnL, and every resolved trade across 1.7M+ tracked wallets.
