Designing a Fair Rating System: Part 1

Flaming_Spinach · May 5, 2007

Article Name: Designing a Fair Rating System: Part 1
Author: Flaming_Spinach
Artist: Assiram41
Date: 4/19/2007

(note: I make a few assumptions in this thread, based on the educated guess that 2000 will be the approximate cutoff rating for getting into Worlds this year, along with the findings in this thread, which say it takes at least 75 matches to reach that rating under the current system.)

PUI currently supports the use of the ELO Rating System to send people to Worlds. Although ELO is a relatively fair system, it does have its downsides. If this system is to be used next year, it is important that it is fair for everyone.

I have decided to divide this article into 2 parts. The first will be about changes that could be made to the algorithms and process of calculating ratings. The second will be about making sure everyone has a fair chance of winning an invite by Rating.

Let’s get into it.

The first thing we need to do is look at various rating systems, so we can get an idea of the shortcomings of the current system, and in what ways those shortcomings can be fixed.

1. Yearly ELO

Let’s start with the system we already have. This is a variant of orthodox ELO in which everyone’s rating is reset to 1600 at the end of each season, mostly to prevent rating sitting.
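For reference, here is a minimal sketch of a standard ELO update with the yearly reset. It assumes the usual logistic expected-score curve on a 400-point scale; the article does not spell out PUI's exact formula, and the function names are illustrative only.

```python
START_RATING = 1600.0  # everyone's rating at the start of each season


def expected_score(rating: float, opponent: float) -> float:
    """Expected score of `rating` against `opponent` (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update(rating: float, opponent: float, score: float, k: float) -> float:
    """Rating after one match; score is 1 for a win, 0.5 for a tie, 0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent))


def end_of_season(ratings: dict[str, float]) -> dict[str, float]:
    """Yearly ELO: every player goes back to the starting rating."""
    return {player: START_RATING for player in ratings}


if __name__ == "__main__":
    print(update(1600.0, 1600.0, 1.0, k=32))  # 1616.0
    print(end_of_season({"alice": 1616.0, "bob": 1584.0}))
```

Two equally rated players have an expected score of 0.5 each, so at K=32 the winner gains 16 points and the loser drops 16.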

BENEFITS:
-Rating sitting is virtually non-existent.
-ELO is approved by a vast number of gaming administrative bodies.
-Everyone starts off on equal footing at the beginning of the year.

PROBLEMS:
-People in areas with fewer events are at a massive disadvantage.
-It puts too much pressure on players, who can’t afford to have a bad tournament, as their rating would be ruined.
-Good players lose to bad players on luck.
-Ratings in this system are usually not an accurate measure of a person’s skill for very long, since once a person’s rating gets high enough, it is reset to 1600.

2. Lifetime ELO

Very closely related to the one above. In this variation, ratings are not reset after each year, so they have a real ability to grow and become a true representation of a person’s skill.

BENEFITS:
-Ratings become highly accurate after a time.
-People in areas with fewer events can still compete.
-Each individual game matters slightly less.

PROBLEMS:
-People in areas with fewer events need to work on their rating for a longer period of time.
-Rating sitting would be rampant. (Although, measures can be taken against this.)
-Players who are new to the game have no chance of winning a trip to Worlds until 2 or 3 years after they start.

3. The Glicko System

Glicko is an extension of ELO that also estimates the standard deviation of each person’s rating (the ‘rating deviation’), and adjusts it as a function of how active the player is.
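Roughly, Glicko's activity measure works like this: a player's rating deviation (RD) widens for every rating period they sit out, up to the RD of an unrated player (350). The function below is a sketch of Glicko-1's pre-period RD step; the constant c² = 4000 is an assumed, illustrative value, not something taken from the article.

```python
import math

MAX_RD = 350.0      # RD of a brand-new, unrated player in Glicko
C_SQUARED = 4000.0  # assumption: chosen so an RD of 50 drifts back to 350 after 30 idle periods


def inflate_rd(rd: float, idle_periods: int) -> float:
    """RD after `idle_periods` rating periods without playing, capped at MAX_RD."""
    return min(math.sqrt(rd ** 2 + C_SQUARED * idle_periods), MAX_RD)
```

A larger RD means the next few results move the rating more, which is how Glicko discourages sitting on a rating without any explicit penalty.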

BENEFITS:
-A measure of activity helps prevent any rating-sitting.

PROBLEMS:
-The system is FAR too complicated for a game like Pokemon.
-There are other, simpler ways of achieving the same end.

4. Masterpoints

This system is used by various Bridge associations, as well as, to some degree, Magic: The Gathering. Masterpoints (Magic refers to them as Pro Points) is a system where everyone starts at 0 and gains points by winning or placing well at certain events. A player’s score cannot decrease in this system; it only goes up.

BENEFITS:
-Score is determined by final placement in an event.
-Luck plays less of a factor, as 1 bad loss does not ruin your rating.
-Can easily be named ‘Power Points,’ for video-game tie-ins.

PROBLEMS:
-People in areas with fewer events are at a massive disadvantage. (More so than with the current system.)
-Unless they are reset after each year, new players will never have a chance.

5. ELO plus Masterpoints

I am sure that this has been attempted by some Organized Play system somewhere, but I can’t find any evidence of it. This is basically a system where you gain and lose points from each match as usual, but unlike plain ELO, at the end of each event the high placers get ‘bonus points’ added onto their ratings. This helps cut down on the effects of bad luck, and rewards people for how well they finish in a tournament, not necessarily how well they perform throughout the day.
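Here is a sketch of how one event would be scored under ELO plus Masterpoints. The bonus table is hypothetical (the article gives no values), and the per-match swings are assumed to come from ordinary ELO updates.

```python
# Hypothetical bonus points by final placement; the article does not specify any.
PLACEMENT_BONUS = {1: 20.0, 2: 12.0, 3: 8.0, 4: 8.0}


def apply_event(rating: float, match_deltas: list[float], final_place: int) -> float:
    """Add up the ELO swings from each round, then any placement bonus."""
    return rating + sum(match_deltas) + PLACEMENT_BONUS.get(final_place, 0.0)
```

For example, `apply_event(1800.0, [10.0, -20.0, 12.0], 1)` gives 1822.0: the three rounds net +2, plus 20 for first place. Because the bonus is added on top of the zero-sum match exchanges, the total pool of rating points grows every event, which is the non-zero-sum issue listed below.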

BENEFITS:
-Those who place the highest will almost always win the most points.
-A single loss on bad luck in Swiss means less.

PROBLEMS:
-Ratings become non-zero-sum, so an error in calculation could go unnoticed for the entire season.
-People who win events usually win a large number of points anyway. This system caters to the minority.
-People in areas with fewer events are at a massive disadvantage.

6. ELO with a differently determined K-value

Simply put, the K-value of a match depends on the player’s rating, not on the event at which it occurred: the higher your rating, the lower the K-value of each of your matches. Both the United States Chess Federation and FIDE use this kind of system.
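A rating-dependent K can be as simple as a step table. The thresholds and K-values below are made up for illustration; they are not the USCF's or FIDE's actual numbers.

```python
# (minimum rating, K), checked from the highest tier down; values are hypothetical.
K_TIERS = [(2000.0, 16.0), (1800.0, 24.0)]
DEFAULT_K = 32.0  # hypothetical K for everyone below the lowest threshold


def k_for_rating(rating: float) -> float:
    """Return the K-value for a player at `rating` under the step table above."""
    for floor, k in K_TIERS:
        if rating >= floor:
            return k
    return DEFAULT_K
```

The hard part, as noted below, is choosing the thresholds and values, not the mechanism itself.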

BENEFITS:
-A higher K-value for low-ranked players allows faster growth for new players.
-A lower K-value for high-ranked players reduces the overall impact of bad luck, and makes ratings more stable.

PROBLEMS:
-Finding a fair way to determine what the K-value scale should be.

Those are the 6 basic options we have at this point in time. None of the systems is perfect, but each has its own individual strengths and weaknesses.

I think we should (obviously) focus on the key weaknesses of our current system. There are basically 2 major problems with the Pokemon ELO system right now. Those are:

A. Yearly Ratings. Ratings that reset each year leave players starting over from scratch all the time. Any player with an ‘actual’ rating of 2000 needs to play a minimum of 75 matches JUST to get their rating back to where it belongs. On top of that, anyone who can’t hit that magic 75-match mark simply CANNOT get a rating of 2000, and if they have some bad luck somewhere, it could take a lot more than 75 matches. Simply put, anyone in an area with fewer tournaments doesn’t stand a chance of getting to Worlds, because they physically cannot get their rating high enough. We must find a way to fix this.

B. High-rated players losing on bad luck. Let’s face it, anyone can lose on a no-energy start. Losing on bad luck can completely ruin a tournament (rating-wise), and 2-3 ruined tournaments can ruin your whole season in a high-activity area. In a low-activity area, 1 ruined tournament can ruin a whole season. This essentially comes down to one thing: the K-values at some of the events this year are just too high. At some point, the amount a player stands to lose on bad luck gets ridiculously high, and saying, ‘you got a no-energy start; minus 30 points’ can be disheartening to even the most experienced player.

After taking all possibilities into account, and consulting some friends (who are knowledgeable about both the Pokemon TCG and ELO) for their opinions, I am ready to make the following 2 suggestions for designing a fair rating system for the Pokemon TCG.

Here are my proposed fixes for both of those problems.

Problem: Yearly ratings

Fix: Bi-yearly (two-year) ratings.

Very simple. You get to work on your rating for 2 years. This should let virtually everyone have equal chances of getting a high enough rating to qualify for Worlds.

Now the technicalities of it all: Each person would have 2 ratings. The first is the compound rating that was started last year and continues to be worked on this year. The person’s second rating is their rating for this year, which starts at 1600. The compound rating is the one used to determine who goes to Worlds. At the end of the season, the person’s compound rating is retired, and their yearly rating becomes their new compound rating. They also get a new yearly rating, which starts out at 1600 again.

To put it another way, every year you start a new rating. That rating only matures (as far as Worlds invites are concerned) after 2 years.
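The two-rating bookkeeping described above can be sketched as follows. This is only an illustration under assumptions the article leaves open: a brand-new player starts both ratings at 1600, and each rating is updated against the opponent's rating on the same track. `PlayerRatings` and `play` are hypothetical names.

```python
from dataclasses import dataclass

START_RATING = 1600.0


def expected_score(rating: float, opponent: float) -> float:
    """Standard ELO expectation (assumed 400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


@dataclass
class PlayerRatings:
    # Assumption: a brand-new player starts both tracks at 1600.
    compound: float = START_RATING  # begun last season; decides Worlds invites
    yearly: float = START_RATING    # begun this season; becomes next year's compound

    def end_of_season(self) -> None:
        """Retire the compound rating; the yearly one takes its place."""
        self.compound = self.yearly
        self.yearly = START_RATING


def play(a: PlayerRatings, b: PlayerRatings, score_a: float, k: float) -> None:
    """Apply one match to both tracks; score_a is 1 (a wins), 0.5 (tie) or 0.

    Assumption: each track is updated against the opponent's rating on the same track.
    """
    for track in ("compound", "yearly"):
        ra, rb = getattr(a, track), getattr(b, track)
        delta = k * (score_a - expected_score(ra, rb))
        setattr(a, track, ra + delta)
        setattr(b, track, rb - delta)
```

Every match moves both ratings, so only the rollover in `end_of_season` distinguishes this scheme from keeping a single rating.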

There are many other benefits a system like this has, including:

-Less emphasis on each and every game.
-More chances to get your rating up.
-Ratings would be far more accurate.
-New players who are working from behind in their first season would be on equal footing with everyone in their second season.
-If someone sits on a good compound rating, they may effectively ruin their chances for the next year by falling too far behind the rest of the pack. (i.e. Rating sitting is possible, but a bad idea.)
-Yearly rating would make the best tie-breaker imaginable.
-Consistent play is rewarded.
-Anyone who enters the season late still has a chance.

Problem: K-values at some events too high

Fix: Compound K-values

Another very simple one. The K-value used in each match is a combination of the K-value for the event and the K-value determined by the player’s rating.

For example:

$$K_P \times K_E = K_F$$

Where:
$K_P$ = The K-value based on the players’ ratings.
$K_E$ = The K-value based on the event.
$K_F$ = The final K-value used to determine ratings.

$K_E$ would still be determined the same way as it is now, solely by the level of premier event at which the match takes place.

$K_P$ would (imo) be determined by the rating of the higher-rated player in the match, since they are the one who stands to lose more. Basing it on the average rating, or on the difference in ratings, would do the exact opposite of what ELO attempts to accomplish. Basing it on the lower-rated player’s rating would not accomplish the primary goal. The only option that makes sense is to base $K_P$ on the higher-rated player’s rating.
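A sketch of the compound K-value follows. Two assumptions are worth flagging: $K_E$ uses the per-event values quoted later in this thread (Cities/BRs 32, States 36, Regionals 40, Nationals 44), and $K_P$ is treated as a hypothetical dimensionless multiplier, since multiplying two full K-values together would give absurdly large results. The multiplier schedule itself is invented for illustration.

```python
# K_E: per-event K-values quoted in this thread for the current season.
EVENT_K = {"cities": 32.0, "br": 32.0, "states": 36.0, "regionals": 40.0, "nationals": 44.0}

# K_P: hypothetical (minimum rating, multiplier) schedule, checked from the top down.
PLAYER_MULTIPLIER = [(2000.0, 0.5), (1800.0, 0.75)]


def k_player(higher_rating: float) -> float:
    """K_P, looked up from the higher-rated player's rating."""
    for floor, mult in PLAYER_MULTIPLIER:
        if higher_rating >= floor:
            return mult
    return 1.0


def k_final(rating_a: float, rating_b: float, event: str) -> float:
    """K_F = K_P x K_E, with K_P taken from the higher of the two ratings."""
    return k_player(max(rating_a, rating_b)) * EVENT_K[event]


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def rating_change(rating: float, opponent: float, score: float, event: str) -> float:
    """ELO change for `rating` after scoring `score` (1, 0.5 or 0) against `opponent`."""
    return k_final(rating, opponent, event) * (score - expected_score(rating, opponent))
```

With these assumed multipliers, a 1894.32 player losing to a 1628.08 player at Regionals would drop about 24.67 points instead of about 32.90 at K=40. Because both players in a match share the same $K_F$, each exchange stays zero-sum.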

This change does more than just protect high-rated players against luck. Here is the whole list of benefits it would bring:

-A higher K-value for low-rated players means that good players can rise to the top quickly at the beginning of the season.
-A lower K-value for higher-rated players prevents them from losing too many points on bad luck against a low-rated opponent.
-High-rated players will also have their ratings rise more slowly, preventing one person from skyrocketing after a single excellent tournament.
-Anyone who enters the season late still has a chance.

So, those are my suggestions. Two very simple changes, whose goal is to make things as fair as possible for everyone who plays this game.

ELO is by far the best starting point for a game like Pokemon, and with these changes, I believe it can be as fair as possible for our game.

Flaming_Spinach · May 8, 2007

Moving this to the open.

ShadowCard · May 8, 2007

What is POP's opinion on a player recovering from bad luck at an event? In another game I play, some of the TOs wanted the tournaments to be Swiss+2, since it gives players who lost on luck a better chance to recover and win the tournament. However, that game's Organized Play program mandated that this not happen. Their opinion is that they would rather an experienced player work harder to catch up after losing on luck, so that a player of any skill level who hasn't lost can go further in the event.

The argument is that an experienced player is good, and is therefore able to do well in any event. Players at other skill levels don't always have that opportunity.

So, while you went on about reducing bad luck's impact on games, does POP really think it's a bad thing?

NoPoke · May 9, 2007

Mythical Player C said:
Every game I've ever lost has been down to bad luck.

All the games I ever win are down to my mad skill. Never luck.

My rating should never go down.

----------------

Losing, even losing due to luck, isn't bad. The rating makes a prediction of your win expectancy. It predicts that you will lose occasionally.

The biggest problem with the system is if players can find ways of never, or almost never, losing. No amount of changes to the K-value or resetting will have any impact if players don't lose.

========================================

Protecting players' ratings with complicated K-values will only encourage players to sit on their ratings. The escalating K-values (and they don't escalate much) that POP currently uses encourage players to seek out the bigger tournaments. With a high rating you can take a break, but you can't avoid the big tournaments.

Changes to the rating system must be in the direction of encouraging mixing of players, not encouraging sitting out. But such changes increase risk for the individual players. Somehow we have to educate players that the risk is reasonable.

Changes to the rating system must also try to reinject some of the fun that has been lost this season. If players currently misunderstand and inflate the significance of an occasional loss, then there is little hope that they will understand a more complex system any better.

If we can't recover much of the lost fun by the end of next season then I for one will be in favour of scrapping the rating system giving out invites.

bullados · May 9, 2007

As an alternative to compound ratings, have a rating scale that gets steeper as the tournament level goes up.

Don't bash me for not taking INT into account here. That said, the system should be similar as long as there are open tournaments.

Think about it this way. A player can generally go to twice as many Cities, and twice as many BRs, as States (likely more, but twice is a nice number for the calculations). That same player can generally go to twice as many States as Regionals. There is only one Nationals, so assume that the player can get to the following events:

1 Nationals
1-2 Regionals
2-4 States
4-8 Cities
4-8 BRs

Right now, the K-value is 32 for Cities and BRs, 36 for States, 40 for Regionals, and 44 for Nationals. This makes Cities and BRs incredibly important, almost too important compared to Nationals and Regionals, which will generally have much higher attendance.
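The arithmetic behind this point can be made explicit by multiplying the number of events per tier (upper end of each range above) by that tier's current K. This is a rough sketch that assumes every event runs the same number of Swiss rounds, which is not true in practice (bigger events run more rounds), so it only compares relative weight per round.

```python
TIERS = {
    # tier: (events attended per season, current K-value)
    "Nationals": (1, 44),
    "Regionals": (2, 40),
    "States": (4, 36),
    "Cities": (8, 32),
    "BRs": (8, 32),
}


def k_exposure(tiers: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Total K a player puts at stake per Swiss round across all events of a tier."""
    return {tier: events * k for tier, (events, k) in tiers.items()}


if __name__ == "__main__":
    for tier, total in k_exposure(TIERS).items():
        print(f"{tier:<10} {total}")
```

Under these assumptions, Cities and BRs together carry 512 K per round against 44 for Nationals.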

Two potential solutions to this problem.

1) Make the K-value difference between levels steeper. A nice geometric progression would put much more emphasis on Nationals compared to the other events. Even a linear difference of 8 between levels would change the entire dynamic of the system: matches at Cities and BRs would no longer be make-or-break for your rating. It does increase competitiveness at the Regionals and Nationals levels, but those two tournaments should be expected to be incredibly competitive. Cities and BRs should have a K-value of no more than 24, while Nationals should have a K-value of no less than 48.
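The schedules can be compared side by side. "Current" is this season's K per level, "linear" is the step-of-8 idea starting at 24, and "geometric" is one assumed way of going from 24 to 48 with a constant ratio; the post does not pick a specific ratio.

```python
LEVELS = ["Cities/BRs", "States", "Regionals", "Nationals"]


def linear_schedule(base: float, step: float) -> list[float]:
    """K per level, adding a fixed step at each level up."""
    return [base + step * i for i in range(len(LEVELS))]


def geometric_schedule(base: float, ratio: float) -> list[float]:
    """K per level, multiplying by a fixed ratio at each level up."""
    return [base * ratio ** i for i in range(len(LEVELS))]


current = linear_schedule(32, 4)                  # 32, 36, 40, 44
linear = linear_schedule(24, 8)                   # 24, 32, 40, 48
geometric = geometric_schedule(24, 2 ** (1 / 3))  # assumption: 24 doubling to 48 over three steps

if __name__ == "__main__":
    for level, c, l, g in zip(LEVELS, current, linear, geometric):
        print(f"{level:<11} current={c:>4} linear={l:>4} geometric={g:5.1f}")
```

Both steeper schedules meet the stated bounds (24 or less at Cities/BRs, 48 or more at Nationals); the geometric one keeps the lower levels closer together and saves the biggest jump for the top.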

2) K-value based on attendance. A far less elegant solution as far as the coding goes, but it's effective nonetheless. It makes the national divide between rankings much easier to manage, and it means that larger tournaments are worth even more than the smaller tournaments. Start it with a base of K=8, and increase the K-value by 1 for every 8 players that enter the tournament. Or make it similar to a square root function, increasing faster at the start but slower at higher player numbers.
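Both attendance rules can be sketched in a few lines. The linear rule follows the post (base 8, +1 per 8 players); for the square-root variant, the scale factor of 1.0 is an assumption, picked so both rules agree at 64 players.

```python
import math

BASE_K = 8  # starting K suggested in the post


def k_linear(players: int) -> int:
    """Base of 8, plus 1 for every full group of 8 players entered."""
    return BASE_K + players // 8


def k_sqrt(players: int) -> float:
    """Square-root variant: rises quickly for small events, slowly for big ones.

    The scale factor of 1.0 is an assumption, not from the post.
    """
    return BASE_K + 1.0 * math.sqrt(players)
```

At 16 players the square-root rule gives 12 versus 10 for the linear rule; at 400 players it gives 28 versus 58, which keeps very large events from dominating.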

A third alternative would be a combination of the two, where Cities and BRs have a base of 8, States have a base of 14, Regionals have a base of 20, and Nationals have a base of 26, with each increasing from there based on attendance. This helps solve the cutthroat nature of smaller events with high K-values, still emphasizes the more prestigious events over the less prestigious ones, and helps with the national divide issue that's always coming up.
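A sketch of the hybrid, reading the base of 20 as the Regionals base (the original post lists "States" twice). The attendance bonus is an assumption: the post leaves it open, so this reuses option 2's "+1 per 8 players".

```python
TIER_BASE_K = {"Cities": 8, "BRs": 8, "States": 14, "Regionals": 20, "Nationals": 26}


def k_hybrid(tier: str, players: int) -> int:
    """Tier base K plus an assumed attendance bonus of 1 per full 8 players."""
    return TIER_BASE_K[tier] + players // 8


if __name__ == "__main__":
    print(k_hybrid("Cities", 120), k_hybrid("States", 40))
```

Under this assumed bonus, a 120-player Cities (K=23) would outweigh a 40-player States (K=19), so the bonus size matters as much as the bases.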

Flaming_Spinach · May 10, 2007

Some points addressing the posts above, in no specific order...

Bad luck is BAD. And the higher-rated you are, the worse it is. This may seem like sour grapes, but at NW Regionals, Round 1 I had a terrible start, and got no energy in the game until my opponent was completely set up (4 turns). Take a look at the impact this 1 bad start had on my rating:

1 1894.32 1628.08 Opponent -32.90 1861.42

For someone who wants to get into Worlds, that 1 match pretty much ruined my day.

The winning average that ELO produces is your skill-based winning average, and it is extremely accurate for skill-based games (i.e. chess). When a game adds an element of luck, the system becomes less accurate, but it is still usable.

Increasing K-values at events will not work. All it does is make scores wilder and more prone to good/bad luck. Basing K on attendance is also bad, as it gives those in larger areas the chance to win/lose more points.

Doing both would lead to 1 tournament (US Nationals) determining EVERYTHING. Can you imagine an event with K exceeding 100 AND 9 rounds of Swiss?

A high K allows for rapid separation of the wheat from the chaff, and makes ratings more volatile.

A low K favors those who play well steadily.

A compound K-value does both.

~Blazi-King~ · May 13, 2007

I REALLY like the idea of a Compound Rating System and Compound K-Value. This DEFINITELY interests me.

I do believe you're onto something as far as making it fair for everyone.

Nicely Done!

(Now to get it implemented)

Flaming_Spinach · May 14, 2007

~Blazi-King~ said:
I REALLY like the idea of a Compound Rating System and Compound K-Value. This DEFINITELY interests me.

I do believe you're onto something as far as making it fair for everyone.

Nicely Done!

(Now to get it implemented)

Thank you.

I really can't understand why this thread has gotten so few responses yet. Don't people think this is an important topic?

PS. Pics finally added.

vanderbilt_grad · May 14, 2007

Yes. But threads in this area seem to generate a lot less traffic than threads in other areas of the Gym. Not really sure why.

badganondorf · May 16, 2007

It's hard to develop a fair rating system for such a luck-based game as Pokemon. But of those options, I like numbers 4 and 6 the most.

Option 4 is very good because your placement in the tournament matters, and that's what really matters in this game. You can win Worlds after going 5-3 in Swiss rounds. But the best-of-3 matches are where you measure how good you and/or your deck are. It's true that you can lose 2 games to bad luck in a best-of-3 match, but I think that is fair; 2 bad starts are a fair loss to me. Of course, the points should be reset after each year, because if they weren't, it would become totally impossible for new players to catch the top players. I like this option a lot.

Option 6 is fair to almost everyone, but choosing the "right" scale would be very hard. Since the U.S. Chess Federation uses it, POP should give it a try, modifying it a little, because chess isn't luck-based like Pokemon.

It's hard to compare these 6 options because all of them have their downsides, and we have to remember that even if we could design a "fair" rating system, not every player would be happy; some will always have something to whine about. But there are lots of good options, and I hope (and know) that POP will develop the rating system as well as they can.

And vanderbilt_grad, the reason people post here less is that this is a "conversation zone," and only people who really have something to say post here. I also think this is an area people browse less. People focus on the Electronic Games and Card Games sections and sometimes never even look at what it's like here.

Ash_Van_Je · May 16, 2007

If you drew the pictures in the article, I really appreciate it lol, they are really good.

vanderbilt_grad · May 16, 2007

I’ve said it before and I might as well repeat it here. I think that ratings should have a “rolling” year. Each month matches from more than 12 months back get dropped. That way at the start of a season the previous season’s top ranked players are still ranked at the top ... yet by worlds their ranking will be solely determined by the current season.

New players could “fight their way to the top” in a single season. Top ranked players would have less pressure in the early part of the season but they couldn’t sit on their ranking for too long.
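The rolling year can be sketched by replaying only the last 12 months of matches, starting from 1600. This is a simplification under stated assumptions: each match stores the opponent's rating as it was at the time (a full system would replay every player together), and "the last 12 months" means the current calendar month plus the 11 before it. `Match` and `rolling_rating` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date

START_RATING = 1600.0


@dataclass
class Match:
    played_on: date
    opponent_rating: float  # assumption: opponent's rating as recorded at the time
    score: float            # 1 win, 0.5 tie, 0 loss
    k: float                # K-value used for that match


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def rolling_rating(history: list[Match], today: date, window_months: int = 12) -> float:
    """Replay, from 1600 and in date order, only matches inside the rolling window."""
    rating = START_RATING
    for match in sorted(history, key=lambda m: m.played_on):
        if months_between(match.played_on, today) < window_months:
            rating += match.k * (match.score - expected_score(rating, match.opponent_rating))
    return rating
```

Running this once a month drops the oldest month's results automatically, which is the "each month, old matches fall off" behaviour described above.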

If you combine that with your compound K value system I think that it could have interesting results.

ShadowCard · May 18, 2007

Flaming_Spinach said:
Bad luck is BAD. And the higher-rated you are, the worse it is. This may seem like sour grapes, but at NW Regionals, Round 1 I had a terrible start, and got no energy in the game until my opponent was completely set up (4 turns). Take a look at the impact this 1 bad start had on my rating:

1 1894.32 1628.08 Opponent -32.90 1861.42


For someone who wants to get into Worlds, that 1 match pretty much ruined my day.

Like I said in my first reply, maybe POP thinks it isn't a bad thing that your game 1 loss, due to luck, ate a chunk of your rating, since you are a really good player? Yes, it is unfortunate, but maybe they don't think it is bad. You are a really good player, so you can make up the loss down the road. Maybe POP wants it that way. A good player stands to lose more in round one on luck because he/she is a good player and can make up for it later.

NoPoke · May 18, 2007

1 1894.32 1628.08 Opponent -32.90 1861.42

Plug in the numbers and you get a win expectancy of 82%. What this means is that you are expected to win 5 out of 6 games against this player. That one loss is no more down to luck than any of your five expected wins is down to luck. The opponent is SUPPOSED TO WIN 1 in 6 games.

Luck may decide when that loss happens, but it's not luck that gave you a loss you are expected to lose.

It's not a case of 'why did this happen to me' but much more a case of 'why shouldn't it happen to me?'.
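NoPoke's figure can be checked with the standard ELO expected-score formula, using the K of 40 that bullados quotes for Regionals. Reading 1628.08 as the opponent's rating is an interpretation of the quoted row.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard ELO win expectancy on a 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


mine, theirs = 1894.32, 1628.08  # ratings from the quoted Round 1 row
k_regionals = 40                 # Regionals K-value quoted earlier in the thread

e = expected_score(mine, theirs)
change = k_regionals * (0.0 - e)  # a loss scores 0
print(f"win expectancy {e:.1%}, change {change:+.2f}, new rating {mine + change:.2f}")
```

The result lines up with the 82% NoPoke mentions and with the -32.90 and 1861.42 in the original row.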

Regis_Neo · May 22, 2007

I like the old rating system personally. No ELO junk.

NoPoke · May 22, 2007

Old rating system???? What old rating system?

Flaming_Spinach · May 22, 2007

Regis_Neo said:
I like the old rating system personally. No ELO junk.

Pokemon has only ever used ELO.

beatlerat · May 25, 2007

It sounds to me like the rich are getting richer and the poor are getting poorer. The system is designed to penalize good players losing to lower-ranked players and vice versa. It does that very well. It sounds as though you would like your rating to stay high after getting it there. You can only do that by continuing to win, or by sitting.
I see the two basic problems in the current system as being:
1) Too much emphasis being placed on Cities and BRs (too high a K-value) and too little being placed on States, Regionals, and Nationals.
2) Large metropolitan areas have more Cities and BRs at their disposal, again making the playing field uneven for those in a more rural setting.

Merely changing the K-value based on attendance (a good idea) may not work, because some of the larger cities' city tournaments may have greater attendance than, say, a State tournament.

I can agree somewhat with the compound theory, but you must limit the number of tournaments that players may go to (say, 3 Cities, 2 BRs, 1 State, and 1 Regional) to even out the other side of the equation, and then steepen the K-value for the more important tournaments. Then, I think part of what you are saying would make more sense.

In any case, the current system should be modified next year as I agree that there are many problems with how the rankings are calculated.

Rew · May 27, 2007

Flaming_Spinach said:
Can easily be named ‘Power Points,’ for video-game tie-ins.

Why is this a benefit?

Cyrus · May 27, 2007

Beatlerat, while you make great points involving the system, I have to disagree on the "values" of tournaments. The K value, as well as the tournaments as a whole, do make a large difference. The top 10 in the US rankings, for example, have all won something, or have done exceedingly well in multiple swiss rounds...and where did this happen? Not cities or BR's, but states and regionals. I'm a perfect example, 'cause I went up almost 150 points due to Southern Plains.
