ELO Ratings and Rankings Made Simple.

Flaming_Spinach · Sep 25, 2006

Article Name: ELO Ratings and Rankings made simple.
Author: Flaming_Spinach
Artist: Assiram41
Date: October 1, 2006
Current Format: DX-CG
Oct. 12th update: Added some edits suggested by Eric (pop_webmaster). Major edits appear in italics. Minor edits mostly involve numbers now that the K-values are partially available for the season.

I'm Back! And I've got a long one this time!

Since Player Ratings are now going to be used to send people to Worlds, it is important that everyone knows exactly how these numbers are calculated and just what they mean. I’m here to help you all!

<img src="http://pokegym.net/gallery/displayimage.php?imageid=17945" alt="artpic1" />
THE BASICS
The ELO Ranking system is named after Arpad Elo. Although the name of the ratings system is often written in capitals (ie. ELO instead of Elo or elo), it is named after its creator, and is NOT an acronym.

Arpad Elo originally created the system for the United States Chess Federation to determine the comparative skill of chess players. The basis behind most of his decisions for his new system was to have a design that awarded players more points for winning more difficult matches and penalized them more for losing easier matches. Also, since there is no way to watch one game and determine a player’s absolute skill, the ELO system uses the only truly objective criteria: wins, losses, and draws. Please note that this system was designed for chess, which allows draws, while Pokemon does not. (This causes no appreciable difference in how the scores are calculated, though.)

The ELO scores are calculated by comparing a player’s actual performance with how well they are expected to do in any specific tournament. If a player does better than they are expected to do, the system assumes that their score was too low, so it is adjusted upwards. However, if they do worse than they are expected to do, the system assumes that their score was too high, and thus, their score will go down. In short, every time you win a game, your score goes up, and every time you lose a game, your score goes down. But more important is how much your score goes up or down with each win or loss.

<img src="http://pokegym.net/gallery/displayimage.php?imageid=17946" alt="artpic2" />
WIN EXPECTANCY
And that brings us to the most important part: how do you determine how many points you will win or lose for each match? Anyone with a graphing calculator or similar computer program can figure out these numbers. The score you start with at the beginning of the season (also, the completely average score as the season progresses) is 1600. Higher numbers are better, lower numbers are worse (I hope that much is obvious).

The formula for figuring out your win expectancy (the probability you will win any match) is given by:

WE[sub]A[/sub] = 1 / (1 + 10[sup]((B – A) / 400)[/sup] )

Where:
WE[sub]A[/sub] = Your Win Expectancy
A = Your current rating
B = Your opponents’ current rating

This will give you your Win Expectancy in decimal form. Simply multiply that number by 100 to get your Win Expectancy as a percentage (ie. 0.63154 = 63.154% Win Expectancy). The term “Expectancy” is important; it means that if you and your opponent played an infinite number of matches, your Win Expectancy is the fraction of those matches you should win.

Using this formula, a difference of 200 points means the higher-ranked player is expected to win the match 76.0% of the time. A difference of 400 points means the higher-ranked player is expected to win the match 90.9% of the time. And a difference of 600 points (the highest you should probably ever see in a Pokemon event) means the higher-ranked player is expected to win 96.9% of the time. These percentages, along with the K-value, are very important to figuring out exactly what you have to win or lose in any one match.
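For anyone who would rather let a computer do the arithmetic, here is a minimal Python sketch of the Win Expectancy formula above. The function name `win_expectancy` and the 1600 baseline used in the loop are my own choices for illustration; only the formula itself comes from the article.

```python
def win_expectancy(rating_a: float, rating_b: float) -> float:
    """Win expectancy of player A against player B (the WE_A formula above)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Reproduce the rating gaps discussed above, with the lower player at 1600.
for gap in (0, 200, 400, 600):
    we = win_expectancy(1600 + gap, 1600)
    print(f"{gap:>3}-point lead: {we * 100:.1f}%")
```

Only the difference between the two ratings matters, so any baseline would give the same percentages.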

An important thing to note at this point is that your win expectancy can never be equal to or greater than 100%, but whenever you win a match, you are credited with 100% of a win, meaning that every time you win a match, you actually exceed what you were expected to do. This may seem counter-intuitive, especially when you have some ridiculously 1-sided matchups, but the ELO system says that no matter who the match is between, no one is ever expected to win 100% of the time, so a 1-0 record (100% winning average on one match) exceeds that expectation (and deserves an increased rating).

<img src="http://pokegym.net/gallery/displayimage.php?imageid=17948" alt="artpic4" />
K-VALUE
The next important part of determining your rating is the K-value of the event. Put simply, the K-value is how many points will be won or lost at the end of each match. A higher K makes scores more variable, allowing better players to come to the top more quickly; and a lower K makes scores more stable, raising and lowering more slowly. Chess uses a K of 32 for amateurs and 16 for masters, while Pokemon uses a K equal to 8, 16, 32, or 48 depending on the level of the tournament. (Although, Eric has stated that 32 will be the smallest K used in the 06-07 season.)

In Pokemon, the Ratings are re-calculated after each and every game. Once you’ve figured out your win expectancy and you know the K-value of the tournament you’re in, it is very easy to figure out how many points you can win or lose for the match. (Although I suggest you use a calculator to make it easier.) The formula you need is this:

P = K (R – WE[sub]A[/sub])

Where:
P = Points change in your score
K = The K-value of the event
R = The result of the match. 1 if you win, 0 if you lose, or 0.5 if you draw
WE[sub]A[/sub] = Your win expectancy (addressed above.)

Note: although the variables and their names may be different on other sites, the formula and calculations are all the same.

(And now, an explanation of the formula above.) I know it all looks complicated, but it is actually very simple. Basically, the K-value is how many points you stand to win, plus how many points you stand to lose, at the end of the match; combined with your Win Expectancy, it is possible to determine exactly how many points you stand to win or lose in any match. To determine how much you stand to win in any game, take your win expectancy and subtract it from 1 (if your Win Expectancy is 75% (0.75), the result would be 0.25), then multiply that number by the K-value. To determine how much you would lose if you lose the game, you multiply your win expectancy by the K-value (you don’t have to subtract it from anything). For example, if your win expectancy is 75%, and the K-value is 32, you stand to win 25% of those points (8), or lose 75% of them (24).

Since ELO ratings are zero-sum, every time one player loses points, their opponent gains exactly that many points. If you win 18 points from winning one match, your opponent will lose EXACTLY the same amount.
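The points formula and the zero-sum property can be sketched in a few lines of Python. This is only an illustration using the 75% / K=32 numbers from the explanation above; the function and variable names are hypothetical, not anything official.

```python
def points_change(k: float, result: float, expectancy: float) -> float:
    """P = K * (R - WE): result is 1 for a win, 0 for a loss, 0.5 for a draw."""
    return k * (result - expectancy)


k = 32
we_you = 0.75           # your win expectancy
we_opp = 1.0 - we_you   # your opponent's expectancy is whatever is left over

if_you_win = points_change(k, 1, we_you)       # you beat the odds slightly
if_you_lose = points_change(k, 0, we_you)      # you fall well short of them
opp_if_you_win = points_change(k, 0, we_opp)   # your opponent's side of your win

print(if_you_win, if_you_lose, opp_if_you_win)
```

Adding `if_you_win` and `opp_if_you_win` together gives zero, which is the zero-sum property in action: the points you gain are the points your opponent loses.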

<img src="http://pokegym.net/gallery/displayimage.php?imageid=17947" alt="artpic3" />
EXAMPLE
So, now that we’ve gotten all that technical stuff out of the way, let’s give you an example or two to make sure you’ve got it down.

Example #1:
Let’s say that 2 players, Dexter and Deedee, are matched up in the first game of the first City Championship of the season. Since they both are completely new to the season, each has a score of 1600. Since the scores are equal, each player has a 50% win expectancy (go ahead and calculate it yourself). Assuming the K-value is gonna be 32 for Cities, each player can win or lose 16 points for this match.

Example #2:
Later in the season, Dexter and Deedee meet up again. While Dexter has accumulated a quite respectable score of 1927, Deedee has done little of note during the season and has a Rating of only 1592. Using the first formula, we see Dexter has a win expectancy of 87.3%. If this event also has a K-value of 32, the math tells us Dexter can only win 4 points in this match, but he can lose as many as 28 should he be unlucky enough to drop the game to Deedee. In this way, it is possible for Dexter to go 7-1 against average-level opponents, and barely “break even” as far as his rating is concerned.
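If you want to check both examples yourself, the sketch below runs them through the two formulas. The helper names are made up for illustration, and the 7-1 line is a simplification: it holds Dexter's expectancy fixed at its starting value (roughly 87%) instead of recalculating his rating after every game the way the real system does.

```python
def win_expectancy(rating_a: float, rating_b: float) -> float:
    """WE_A = 1 / (1 + 10^((B - A) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def points_change(k: float, result: float, expectancy: float) -> float:
    """P = K * (R - WE)."""
    return k * (result - expectancy)


K = 32

# Example #1: two fresh 1600 players.
we_even = win_expectancy(1600, 1600)
print("Example 1:", points_change(K, 1, we_even), points_change(K, 0, we_even))

# Example #2: Dexter (1927) against Deedee (1592).
we_dexter = win_expectancy(1927, 1592)
win_pts = points_change(K, 1, we_dexter)
loss_pts = points_change(K, 0, we_dexter)
print(f"Example 2: win {win_pts:+.1f}, loss {loss_pts:+.1f}")

# Simplified 7-1 record against opponents like Deedee (expectancy held fixed).
net_7_1 = 7 * win_pts + loss_pts
print(f"Net change for 7-1: {net_7_1:+.1f}")
```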

<img src="http://pokegym.net/gallery/displayimage.php?imageid=17949" alt="artpic5" />
LIMITATIONS IN POKEMON
Ideally, after enough games, every person’s rating will reach a plateau, where their win expectancy very closely matches how they actually perform in each tournament. This number is obviously different for each person, and is affected quite strongly by how many matches you’ve played. Those people who play in more matches have a better chance of their Rating being a true representation of their skill. Those who play in fewer games may have an abnormally high or low rating (more likely low).

In Pokemon, it is very difficult to get a score exceeding 2000 points, and those exceeding 2050 are very rare. In comparison, Chess scores are often 400-500 points higher between top-level players. This is because there are at least three factors in the game of Pokemon that prevent the scores from reaching the levels seen in chess.

1. Random Pairings. The first few rounds of any tournament are paired randomly; this means the highest-ranked player may play one of the lowest-ranked players. This hurts the higher-ranked players by giving them much to lose and little to gain in these matches.

2. Random luck. Whether it’s matchups (Rock-Lock vs. Medicham; what fun!), topdecking, or the evils of coinflips, absolute skill is not a factor in many games in Pokemon. In chess, luck is a complete non-factor, and the higher-ranked player is almost always the favorite to win a match. This randomness in Pokemon means that even the best players will occasionally lose a significant number of points to a lower-level player. This one is especially lethal when combined with the factor above.

3. Ratings reset after each year. One of the reasons this is even done is to prevent ratings from growing out of control, which could result in some players reaching a position so high that no one can catch up with them. This is both a good and bad idea, depending on how you want to look at it.

<img src="http://pokegym.net/gallery/displayimage.php?imageid=17950" alt="artpic6" />
CONCERNS
I think that all of you are now quite prepared for this factor of the upcoming year, but I would also like to touch upon a few concerns and possibilities for improvement in either the current system, or the system for upcoming years.

1. The use of lifetime Ratings. Although PUI frowns upon this (so far), I think that using Lifetime Ratings could be even fairer than using yearly ratings. For example, if you are going to give out 10 invites for ratings in a year, you could give out 5 for yearly and 5 for career ratings. I think those that have proved themselves over many years should be worthy of Worlds just as much as those who prove themselves over 1 year.

This could, of course, lead to some people ‘sitting’ on strong ratings for years, and getting invites to Worlds every year for many years in a row while doing nothing for them. But, chess has the answer. People who don’t match a certain minimum level of competition are not ranked on the Active List in chess, and this would mean in Pokemon that only the active players can win trips off of Career Ratings. I think a mandatory 30 matches per season to be on the Active List would work for Pokemon. It requires players with great career ratings to risk them in order to ‘keep’ their invites to Worlds, and IMO, 30 matches should be very doable for someone who stands a strong chance of going to Worlds. That’s what, 5 tournaments a year; maximum?

2. I know some of you will disagree with this, but I think that a K-value of 8 is WAY too low for any tournament. Although it means Cities will play a role in sending People to Worlds, the impact Cities will have is virtually insignificant. There’s really no point in playing in an 8-K tournament at all. IMO, the minimum should be 16-K events.

Think about this. If you go undefeated in an average-sized Cities (4 swiss rounds, cut to top4, 6 matches total), and we’ll say you were evenly ranked with all your opponents to make it simple, your rating will rise by 24 points. Sounds alright. But what happens if you lose 1 of those swiss games? Suddenly you only get 16 points. 5-1 against even opponents to win 16 points? That hardly sounds worth it. If the minimum K is 16 instead, you can double all those numbers. 32 points for going 5-1 sounds fairer.

And just for comparison, winning 1 game at Nationals (assuming a 48-K and even opponents again) will net you 24 points. That’s the same as going 6-0 at a Cities. This extreme over-emphasis on Nationals scares me. With 1 event in the year worth so much more than the others, your entire chance to go to Worlds (both by Nationals and ELO Rating) seems dependent more on your performance in Nationals than on anything else. I really hope PUI considers a minimum K-value of 16 for all premier events. (Eric has stated that this will not be much of a problem (if any) this year. However, my point stands for future seasons.)
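The Cities-versus-Nationals comparison is easy to reproduce under the same simplifying assumption used above, namely that every opponent is rated exactly the same as you, so each win is worth +K/2 and each loss -K/2. The function name below is just a label I picked for this sketch.

```python
def even_match_points(k: float, wins: int, losses: int) -> float:
    """Rating change for a record against evenly rated opponents (WE = 0.5 every game)."""
    return k * 0.5 * (wins - losses)


print("Cities, K=8, 6-0:       ", even_match_points(8, 6, 0))
print("Cities, K=8, 5-1:       ", even_match_points(8, 5, 1))
print("Cities, K=16, 5-1:      ", even_match_points(16, 5, 1))
print("Nationals, K=48, 1 win: ", even_match_points(48, 1, 0))
```

In real tournaments your rating (and so your expectancy) shifts after every game, so actual totals will drift away from these round numbers.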

<img src="http://pokegym.net/gallery/displayimage.php?imageid=17951" alt="artpic7" />
A TREAT FOR YOU!!
So, after reading all of that, I am sure that if you are thinking something (you could be staring blankly at the screen right now with glazed-over eyes as far as I know, this is a long article after all), you are probably thinking, “Well, now I know how my rating came to be what it is, but just what does a rating of ‘X’ mean?” Well, I did the work for you, and I can present you your Pokemon ELO Ratings converted into your Player Percentile (the approximate percentage of players you can say you are better than). ENJOY!

Rating - Percentile
1450 -- 1
1500 -- 4
1550 -- 12.5
1600 -- 40
1650 -- 62.5
1700 -- 75
1750 -- 85
1800 -- 90
1850 -- 95
1900 -- 98
1950 -- 99
2000 -- 99.75
2050 -- 99.95

A few things to keep in mind when you read this chart:

A) You’ll notice the ‘average’ number 1600 is actually in the 40th Percentile, not the 50th where it belongs; this is because many of the players who do not register for player rewards are in the bottom half of the distribution, so their not registering causes the average scores of those who registered to be higher than expected. The actual 50th Percentile is at about 1615. After all, this scale is a measure of ratings available on the OP website; ones not available could not be compared by me.

B) These Percentiles Based on Ratings (maybe I should call them PBR from now on?) most likely do not translate over to other games.

C) They would likely not translate over to career ratings if career ratings are ever used.

D) Your actual PBR depends on which age-group and region you are searching. Although I used the broadest searches available on http://op.pokemon-tcg.com/tournaments/ratingsnrankings/ranks.asp, this chart is a guide only.

E) With the new K-values used this season, and new tournament structure, the average ELO ratings will likely be slightly lower this coming year than in the previous 2 years. Once again, Eric has stated that this is partially incorrect. So these numbers are probably very close to what these seasons will be.

So…Even though you probably lost interest in this article several pages ago, or just plain fell asleep, that’s what I have to say. Hopefully all your questions have been answered, and if they haven’t, just post them below and I will be sure to answer them to the best of my ability.

<img src="http://pokegym.net/gallery/displayimage.php?imageid=17952" alt="artpic8" />

PS. Many thanks to Assiram41 for the beautiful pictures!

PPS. Many thanks to Eric for letting me know in what areas I could improve this article.

Ash_Van_Je · Oct 7, 2006

I like your pics so much

gj.

Angry_Altaria · Oct 7, 2006

Yeah thanks F_S for the article and Assiram41 for the pics. I like them. XD

NoPoke · Oct 7, 2006

There is a big difference between the ELO system as used by Chess, Go, Scrabble etc. and that used by POP and the DCI.

In Chess the ELO system is a statistical system designed to estimate player strength.
In Pokemon the ELO system is used to construct a REWARD based ladder. (You chuck out all of ELOs statistical work and just retain the logistic equation!)

Great Pics but not quite the simple explanation you advertised. I use a really, really, REALLY crude approximation that seems to work well: Rating = 1600+(Wins-Losses)*K/2. The underlying assumption is that the majority of your matches are against players of similar rating. Have a look at your 2005-2006 tournament record and see if it works for you.

I'm guessing that POP will use three K values this year: 16, 32, and 48.

There are some interesting potential wrinkles with a high K value reward based system. I have no doubt that the candidates for the ranking based invite will be known by the time USA Nationals comes around. A player at Nationals who goes undefeated in the swiss might not wish to enter the knock out and risk a COTF loss: winning 8 straight games at a K=48 tournament can reasonably be expected to add 180+ ranking points. Whereas 8-1 gets you a much more miserly 130+ points. Only the first and second placed players are likely to gain more than 180+ points at Nationals.

Pablo · Oct 7, 2006

now that I actually understand the ELO system it doesn't seem to appeal to me that much, is skipping CC's the play? hmmmmmmmmm

moza · Oct 7, 2006

lol, Dexter and Deedee, where did that show go?

BTW, nice article.

Mew · Oct 7, 2006

Wow, it kinda makes sense. So if I win 8 Cities, 2 States, and a Battle Road,
City: 8KV
States:16KV
BR:16/32KV

And go Undefeated in each.
City: 5 rounds + T4= 7 rounds
States: 6 Rounds + T4= 8 rounds
BR: 6 rounds + T8= 9 rounds

My K value would be -.....?

Btw do you know if best out of 3 matches each count individually, or is KV determined by the Round?

Thanks F_S

ninetales1234 · Oct 7, 2006

NoPoke said:
A player at Nationals who goes undefeated in the swiss might not wish to enter the knock out and risk a COTF loss: winning 8 straight games at a K=48 tournament can reasonably be expected to add 180+ ranking points. Whereas 8-1 gets you a much more miserly 130+ points.

This is a good reason POP should get rid of non-random pairings for single elim. Right now, in POP swiss + single elimination tournaments, the highest seed is paired with the lowest seed; if this is how they do it at Nationals, I wouldn't be surprised to see your prediction come true, with the first-place player in swiss dropping from the tournament before the first single elim round.

Single elim rounds should be randomly paired (actually, I don't think there should be any single elim at all, but if we're going to have it, it should be as fair as possible.).

Flaming_Spinach said:
absolute skill is not a factor in many games in Pokemon.

Death to hard counters!

In chess, luck is a complete non-factor

Not true. Chess has no shuffling of decks (or different cards in each player's deck) like pokemon, so luck would not appear to be as much a factor in most chess matches. However:

In chess, there are a variety of different players with different playing styles. In a chess tournament, you could be randomly paired against a player whose playing style makes it difficult for you to win, given your playing style. OR, you could play against a person who is easy for you to defeat, because his playing style is "weak" to yours. I posted something like this in a thread a few months ago. Trying to dig it up- I'll find it eventually... but what I'm saying is that luck exists in every game, due to differences in playing style (and the favorable/unfavorable matchups, given those playing styles).

The amount of luck is different from game to game however.

The use of lifetime Ratings... This could, of course, lead to some people ‘sitting’ on strong ratings for years, and getting invites to Worlds every year for many years in a row while doing nothing for them.

The Glicko rating system has a way of discouraging player inactivity.

Aardvark Gym's rating system is a variation of the Glicko system. Players who are inactive will have their rating deviation go up.

In August, we had a special invite-only event in which I used average 2005-2006 ratings (averages of ratings every month) to determine who was invited. One of the invited players ended up being someone who came to league ONE DAY, and did really well in the two tournaments held there that day. Nobody liked that (fortunately, this player never showed up). I was using POP ratings (this was before we had our own system).

But, next time around, I'll be using (our new rating system) rating and RD to determine who I give prizes/invites to. If a lot of tournaments have occurred, and a player is not present, his RD will go up, even if his rating doesn't change. The higher one's RD, the less credibility one's rating has.

Rating - Percentile
1450 -- 1
1500 -- 4
...2000 -- 99.75
2050 -- 99.95

How did you get this info? Don't tell me you looked at every one in the world and typed it in a calculator:tongue:

One more thing: I agree with the ideas you posted about Cities and Nationals. To "steal" something someone else said some months ago on this forum: Who's better? Someone who T16s at US Nats or someone who wins four Cities? How would we know?

Flaming_Spinach · Oct 7, 2006

NoPoke said:
Great Pics but not quite the simple explanation you advertised.

It's not a simple thing to explain.

I did pretty well if I do say so myself.

I use a really, really, REALLY crude approximation that seems to work well: Rating = 1600+(Wins-Losses)*K/2. The underlying assumption is that the majority of your matches are against players of similar rating. Have a look at your 2005-2006 tournament record and see if it works for you.

Doesn't work for me. It says my rating should be 2112, which is 312 points higher than my actual.

Win expectancy is such a huge thing. You can't simplify the equation and remove it entirely, because that defeats the whole reason that it was introduced in the first place.

There are some interesting potential wrinkles with a high K value reward based system. I have no doubt that the candidates for the ranking based invite will be known by the time USA Nationals comes around. A player at Nationals who goes undefeated in the swiss might not wish to enter the knock out and risk a COTF loss: winning 8 straight games at a K=48 tournament can reasonably be expected to add 180+ ranking points. Whereas 8-1 gets you a much more miserly 130+ points. Only the first and second placed players are likely to gain more than 180+ points at Nationals.

IMO, 130 points or more means anyone within the top50 in the USA who goes to Nationals, could end up in the top10 or better. That makes it a hugely weighted tournament.

Will be back with more later...

Prime · Oct 7, 2006

The limitations in pokemon you bring up scare me a lot. Especially if someone who usually does well loses a few games against new players, they lose a whole lot of points.

Rainbowgym · Oct 7, 2006

I really enjoyed reading the article and had fun with the pictures, great job.

Metagross_Ex · Oct 7, 2006

My head hurts...

Flaming_Spinach · Oct 8, 2006

Mew said:
Wow, it kinda makes sense. So if I win 8 Cities, 2 States, and a Battle Road,
My K value would be -.....?

Btw do you know if best out of 3 matches each count individually, or is KV determined by the Round?

It is impossible to tell what your rating would be. Your opponents' ratings play too big of a role. You would almost certainly be #1 in the world, though, with a record like that.

Best of 3 matches count as 1, not 3 matches. So whether you win 1-0, 2-0, or 2-1, you win the same number of points.

The Glicko rating system has a way of discouraging player inactivity.

Aardvark Gym's rating system is a variation of the Glicko system. Players who are inactive will have their rating deviation go up.

Okay, I read the article, and if I understand it correctly, it's pretty much the same as saying, "You must compete in 30 matches this year to be eligible", except with a whole bunch of math.

How did you get this info? Don't tell me you looked at every one in the world and typed it in a calculator

I did searches for the past 2 seasons, using 4 searches each season (all/world, 15+/world, all/USA, and 15+/USA). I looked for each rating by 50s, and figured out which percentile that correlated to in that search. Then I compared them all and found the averages. The numbers I posted are slight deviations from the averages I found, for simplicity.

The limitations in pokemon you bring up scare me a lot. Especially if someone who usually does well loses a few games against new players, they lose a whole lot of points.

Reminds me of how I did in various prereleases last year. -_-

I really enjoyed reading the article and had fun with the pictures, great job.

<3

My head hurts...

I bet you feel like this right now: http://pokegym.net/gallery/showimage.php?i=17952&c=15

Metagross_Ex · Oct 9, 2006

Flaming_Spinach said:
I bet you feel like this right now: http://pokegym.net/gallery/showimage.php?i=17952&c=15

Exactly :lol:

Muk Man · Oct 9, 2006

So now, because of this, no one will drop, because you receive a loss for every round you dropped!!!! Yeah! That means once you start a tourney, you will finish! I like that.

Pablo · Oct 9, 2006

I doubt it cause most people who drop are on losing records anyway and to be 100% honest most people who have a losing record from the start are probably not good enough to make Worlds anyways.

NoPoke · Oct 10, 2006

No, you don't receive an ELO ratings loss for each subsequent round when you drop.

The ELO system is zero-sum. For you to lose points your opponent has to gain points. When you drop you take no further part as you no longer have any opponents. So your rating freezes.

If you want to drop without hurting your rating then you must report the drop before the next round is paired.

Umbreon777 · Oct 10, 2006

To be honest, this new ELO thing is kind of unfair. There are some people that get the opportunity to go to 5 city championships, 3 state championships, and 2 regional championships, thus obviously allowing them to haul in more points. But it's unfair to the people like me who only get to go to one of each of these championships. Or maybe even none at all, even if they are an excellent player.

On the other hand, this method of points is a little better in the fact that you don't have to worry about getting T4 or winning a Gym Challenge to get an invitation to Worlds.

coolmanderzx · Oct 10, 2006

Umbreon777 said:
To be honest, this new ELO thing is kind of unfair. There are some people that get the opportunity to go to 5 city championships, 3 state championships, and 2 regional championships, thus obviously allowing them to haul in more points. But it's unfair to the people like me who only get to go to one of each of these championships. Or maybe even none at all, even if they are an excellent player.

On the other hand, this method of points is a little better in the fact that you don't have to worry about getting T4 or winning a Gym Challenge to get an invitation to Worlds.

I see your point that it isn't fair to players who, like myself, can't make all the events due to work or school. But this is a competitive game, and that's how the cookie crumbles, so to speak. You have to compete to win, and to establish yourself as a good player in the eyes of the community you have to win and WIN CONSISTENTLY. Which means you'd better do amazing at the events that you do attend.:biggrin:

Rocket's Hitmonchan · Oct 10, 2006

Nice dissertation; and very readable.

Can you confirm whether the algorithm below is speculation, or has it been promulgated by PUI?

The formula for figuring out your win expectancy (the probability you will win any match) is given by:
WE[sub]A[/sub] = 1 / (1 + 10[sup]((B – A) / 400)[/sup] )

Can you confirm whether the data below is speculation, or has it been promulgated by PUI?

while Pokemon uses a K equal to 8, 16, 32, or 48 depending on the level of the tournament.
