Elo, Glicko and Trueskill ratings on replays.springrts.com


dansan
Server Owner & Developer
Posts: 1203
Joined: 29 May 2010, 23:40

Elo, Glicko and Trueskill ratings on replays.springrts.com

Post by dansan »

Though I told everyone I don't have time, I couldn't restrain myself, and so... behold:
The Hall of Fame, individual rating views on players' pages, and the rating after each match.

Ratings are calculated separately for each game (BA, NOTA, XTA etc) and for 1v1, Team, FFA and TeamFFA.

Elo and Glicko v1 are used only for 1v1. TrueSkill is used for all 4 types.
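
To illustrate the bookkeeping, here is a hypothetical sketch of keeping one independent rating per (game, match type) pair. This is not the site's actual data model; the names and the 1500 starting value are only examples.

Code:

from collections import defaultdict

MATCH_TYPES = ("1v1", "Team", "FFA", "TeamFFA")

# ratings[(game, match_type)][player_id] -> current rating value
ratings = defaultdict(dict)

def get_rating(game, match_type, player_id, initial=1500.0):
    # Look up a player's rating for one category, creating it on first use.
    assert match_type in MATCH_TYPES
    return ratings[(game, match_type)].setdefault(player_id, initial)

# The same player has independent BA 1v1 and XTA Team ratings:
print(get_rating("BA", "1v1", 12345), get_rating("XTA", "Team", 12345))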

IMO some factor is making the Elo vary too little... help with adjusting the values would be much appreciated.

The site urgently needs support for proper smurf/account joining!

I'm offline for the weekend.

PS: The initial calculations (over all replays uploaded until now) were made in chronological order. All replays uploaded from now on (99.9% automatically from autohosts) are rated immediately. If you upload an old replay now, the ratings will be calculated as if the match was played today. This can only be corrected by running a full recalculation over all 4000+ replays. That process is only semi-automatic and takes ~40 min., so I won't do it every day. But with a good reason (like you having uploaded >10 replays from old tourneys) I can do it. Just PM me.
dansan
Server Owner & Developer
Posts: 1203
Joined: 29 May 2010, 23:40

Re: chrank and the constant failure of balance

Post by dansan »

I forgot to post this table from the initial rating run: http://spring1.admin-box.com/downloads/ ... 27.csv.zip
Each line represents a match and its players' ratings after it. The link to the replay is on the far right side.
I think it's much more useful than that history page...

That Glicko rating does seem funny, indeed. This one (8 wins, 0 losses) too: http://replays.springrts.com/player/232063/ I wonder if it has to do with not using rating periods.
I'll take a look at the numbers next week. I'm grateful for help from anyone with rating experience.
User avatar
Silentwings
Posts: 3720
Joined: 25 Oct 2008, 00:23

Re: chrank and the constant failure of balance

Post by Silentwings »

VERY nice work.

I have but one request, although much awesomeness is possible for using this in balance algorithms one day: in the ranking tables, could you add a column showing how many games are used to calculate each player's rating?

It might make sense not to show players in the ranking table if they have, say, <30 games played or are inactive, say, haven't played a game for 3 months.

[I noticed that the top player on team rankings (GG) has won ~75% of his team games, with over 200 played, but the second player (podger) has won 100% with only two games played =p]

Also, there is something slightly odd going on: GG appears at places 1 and 4 with different names, but when you look at the rating pages for those names, they seem to claim the accounts are linked yet attach different stats to each of them.
Last edited by Silentwings on 05 Oct 2012, 10:37, edited 3 times in total.
User avatar
Silentwings
Posts: 3720
Joined: 25 Oct 2008, 00:23

Re: chrank and the constant failure of balance

Post by Silentwings »

One other thing (I had a long read of your stats =p). 'Team' currently includes anything from 2v2 to 8v8. But smaller games are generally much shorter, and this means players who play 2v2-4v4 a lot have their team rating determined almost entirely by these games, with too few 5v5+ to really make a difference.

For the purposes of one day using this in balance algos, I think it would be a big plus if 'Team' was broken down into sub-categories, e.g. 'small team' covering 2v2-4v4 and 'big team' covering 5v5-8v8.

About your Elo question: iirc the sensitivity to recent games in the Elo system is controlled by the K-factor. If you are seeing low variation only after a large number of games, that can be helped by increasing the value used for K. If you have low variation even with very few games, or low variation only in particular parts of the spectrum of players, then it's a harder issue, but the K-factor is still probably the right way to solve it. In that case a staggered K-factor (i.e. in the update step, a player's K-factor depends on their current rating) is probably the way to do it.
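
For reference, here is a minimal sketch of what such a staggered-K Elo update could look like. The thresholds and K values are purely illustrative, not a proposal for the site's actual settings.

Code:

def expected_score(rating_a, rating_b):
    # Standard Elo expected score of player A against player B.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def staggered_k(rating):
    # Higher K for lower-rated players, lower K near the top (illustrative values).
    if rating < 1800:
        return 30
    if rating < 2400:
        return 15
    return 10

def update_elo(rating_a, rating_b, score_a):
    # score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss, from A's point of view.
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + staggered_k(rating_a) * (score_a - exp_a)
    new_b = rating_b + staggered_k(rating_b) * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Example: a 1600 player beats a 1500 player.
print(update_elo(1600, 1500, 1.0))   # roughly (1610.8, 1489.2)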
User avatar
hoijui
Former Engine Dev
Posts: 4344
Joined: 22 Sep 2007, 09:51

Re: chrank and the constant failure of balance

Post by hoijui »

niiice! :-)
good work!
User avatar
Silentwings
Posts: 3720
Joined: 25 Oct 2008, 00:23

Re: chrank and the constant failure of balance

Post by Silentwings »

Paste (muckl on the forums) is happy to set up automatic uploads from TERA, dansan, but he needs to know how to do it. Could you make a forum post or a wiki entry or something like that explaining how? I think the timezones mean you are rarely online at the same time.
User avatar
bibim
Lobby Developer
Posts: 952
Joined: 06 Dec 2007, 11:12

Re: chrank and the constant failure of balance

Post by bibim »

I posted the short howto for automatic replay upload when I announced the functionality in SPADS:
http://springrts.com/phpbb/viewtopic.ph ... 30#p525430

Good job dansan btw :)
I'm indeed working on something similar integrated into SPADS, but I'm not sure at all when I'll finish it, as I'm lacking some motivation atm...
dansan
Server Owner & Developer
Posts: 1203
Joined: 29 May 2010, 23:40

Re: chrank and the constant failure of balance

Post by dansan »

Back from vacation. Thank you for your comments! :)
Silentwings wrote:in the ranking tables could you add a column to show how many games are being used to calculate each player's rating?
It might make sense not to show players in the ranking table if they have, say, <30 games played or are inactive, say, haven't played a game for 3 months.
[I noticed that the top player on team rankings (GG) has won ~75% of his team games, with over 200 played, but the second player (podger) has won 100% with only two games played =p]
lol - yes - makes a lot of sense to make a cut there somewhere :)
Silentwings wrote:Also, there is something slightly odd going on because GG appears at places 1 and 4, with different names, but when you look at the rating page for those names it seems to claim that they are linked but attaches different stats to each of them.
Dansan wrote:The site urgently needs support for proper smurf/account joining!
Yeah... Pizdo made it into the 1v1, Team and FFA rankings with 2 separate accounts! When I created the site's model some months ago I didn't think of proper "account unification" or the like, only of making smurf/multiple accounts visible. The result is a broken dataset of multiple, separate accounts with just unified names :( This definitely needs some redesigning.
Silentwings wrote:Paste (muckl on forums) is happy to set up automatic uploads from TERA, dansan, but he needs to know how to do it.[..]
Thank you - I'll send him a PM.
Silentwings wrote:One other thing (I had a long read of your stats =p). 'Team' currently includes anything from 2v2 to 8v8. But smaller games are generally much shorter, and this means players who play 2v2-4v4 a lot have their team rating determined almost entirely by these games, with too few 5v5+ to really make a difference.

For the purposes of one day using this in balance algos, I think it would be a big plus if 'Team' was broken down into sub-categories, e.g. 'small team' covering 2v2-4v4 and 'big team' covering 5v5-8v8.
+1
Silentwings wrote:About your Elo question: iirc the sensitivity to recent games in the Elo system is controlled by the K-factor. If you are seeing low variation only after a large number of games, that can be helped by increasing the value used for K. If you have low variation even with very few games, or low variation only in particular parts of the spectrum of players, then it's a harder issue, but the K-factor is still probably the right way to solve it. In that case a staggered K-factor (i.e. in the update step, a player's K-factor depends on their current rating) is probably the way to do it.
Griffith said he used 30 for the BA 1v1 tourneys. I'll start a recalculation tomorrow with that. I think in chess they lower the factor the more games someone has played... we'll see what comes out of the recalc...
User avatar
Beherith
Posts: 5145
Joined: 26 Oct 2007, 16:21

Re: Elo, Glicko and Trueskill ratings on replays.springrts.c

Post by Beherith »

Split topic. Thanks again!
Griffith
Posts: 67
Joined: 14 Jul 2009, 19:27

Re: Elo, Glicko and Trueskill ratings on replays.springrts.c

Post by Griffith »

Yeah, in chess they use this:

-K = 30 for a player new to the rating list until s/he has completed events with a total of at least 30 games.
-K = 15 as long as a player's rating remains under 2400.
-K = 10 once a player's published rating has reached 2400, and s/he has also completed events with a total of at least 30 games. Thereafter it remains permanently at 10.

So FIDE (the World Chess Federation) adapts the K-factor based on the number of games played and also on the rating achieved.
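
As a sketch of those rules in code (the function name here is made up, and the "permanently at 10" part would also require remembering the highest rating ever published, which this sketch does not track):

Code:

def fide_k_factor(games_played, rating):
    # Literal translation of the FIDE rules listed above.
    if games_played < 30:
        return 30   # new to the rating list
    if rating < 2400:
        return 15   # established player still under 2400
    return 10       # reached 2400 with at least 30 games played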
User avatar
danil_kalina
Posts: 505
Joined: 08 Feb 2010, 22:21

Re: Elo, Glicko and Trueskill ratings on replays.springrts.c

Post by danil_kalina »

But you can't play chess multiplayer
User avatar
Beherith
Posts: 5145
Joined: 26 Oct 2007, 16:21

Re: Elo, Glicko and Trueskill ratings on replays.springrts.c

Post by Beherith »

Which is why non 1v1 rankings are based on TrueSkill.
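
For anyone curious, here is a minimal 2v2 example using the trueskill Python package. Whether the site uses this particular library is an assumption; the thread only names the algorithm.

Code:

import trueskill

alice, bob = trueskill.Rating(), trueskill.Rating()     # team 1 (default mu = 25)
carol, dave = trueskill.Rating(), trueskill.Rating()    # team 2

# ranks: lower is better, so team 1 won this game.
(alice, bob), (carol, dave) = trueskill.rate([(alice, bob), (carol, dave)],
                                             ranks=[0, 1])

print(alice.mu, alice.sigma)   # winners' mu goes up, everyone's sigma shrinks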
dansan
Server Owner & Developer
Posts: 1203
Joined: 29 May 2010, 23:40

Re: Elo, Glicko and Trueskill ratings on replays.springrts.c

Post by dansan »

K-factor = 30 didn't change much, because there is clearly a bug in the calculations. After staring at the numbers for a bit I found some errors in the results.
I'm debugging it and will post an update when it's fixed.
dansan
Server Owner & Developer
Posts: 1203
Joined: 29 May 2010, 23:40

Re: Elo, Glicko and Trueskill ratings on replays.springrts.c

Post by dansan »

Fixed - the new Elo values are up.
Filtering out players with <30 matches and account unification are still todos.
User avatar
danil_kalina
Posts: 505
Joined: 08 Feb 2010, 22:21

Re: Elo, Glicko and Trueskill ratings on replays.springrts.c

Post by danil_kalina »

Beherith wrote:Which is why non 1v1 rankings are based on TrueSkill.
I am new to this topic. I hadn't noticed the other rankings.
User avatar
albator
Posts: 866
Joined: 14 Jan 2009, 14:20

Re: Elo, Glicko and Trueskill ratings on replays.springrts.c

Post by albator »

You need to do something to prevent this kind of rating manipulation:

http://replays.springrts.com/replay/896 ... 779b8f899/

A player losing against their own other account to inflate its stats.
BaNa
Posts: 1562
Joined: 09 Sep 2007, 21:05

Re: Elo, Glicko and Trueskill ratings on replays.springrts.c

Post by BaNa »

albator wrote:You need to do something to prevent this kind of rating manipulation:

http://replays.springrts.com/replay/896 ... 779b8f899/

A player losing against their own other account to inflate its stats.
lol I think having a wall of shame would be enough :D
gajop
Moderator
Posts: 3051
Joined: 05 Aug 2009, 20:42

Re: Elo, Glicko and Trueskill ratings on replays.springrts.c

Post by gajop »

albator wrote:You need to do something to prevent this kind of rating manipulation:

http://replays.springrts.com/replay/896 ... 779b8f899/

A player losing against their own other account to inflate its stats.
This is usually solved by one of:
1) A matchmaking system, in big enough communities, which usually means people can't pick their opponents, so "fixed" games are less likely to happen.
2) Banning multiple accounts from competitive play.
etc.
tzaeru
Posts: 283
Joined: 28 Oct 2007, 02:23

Re: Elo, Glicko and Trueskill ratings on replays.springrts.c

Post by tzaeru »

You should note that Elo, Glicko, etc. rankings are not supposed to be universal meters of skill in open games.

These systems were originally made to rank chess players and the like, who played in hundreds of tournaments during their lifetimes but only occasionally against any specific player. In those tournaments the seeding was also supposed to be done carefully, to prevent cherry-picking of opponents and so on.

The systems can and will be gamed, and sometimes come to less than satisfying conclusions when ranking players. Still, they are very interesting numbers to see and would probably do better at balancing than time-based rank alone. :P
dansan
Server Owner & Developer
Posts: 1203
Joined: 29 May 2010, 23:40

Re: Elo, Glicko and Trueskill ratings on replays.springrts.c

Post by dansan »

That is not a problem. Assume account A is stronger and B weaker; the next time they play against each other:
If A wins, it gains few points and B loses few points.
If B wins, then B is rewarded strongly and A loses strongly, making both accounts "more medium".

There is no simple way to manipulate this, because strong players must beat other strong players to gain points. The effect is noticeable once both players are at least 100 points apart.
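
To put rough numbers on this, here is a small sketch assuming the standard Elo formula with K = 30 (the value mentioned earlier in the thread); the ratings are made up.

Code:

K = 30

def expected(a, b):
    # Standard Elo expected score of a player rated a against one rated b.
    return 1.0 / (1.0 + 10 ** ((b - a) / 400.0))

# Stronger account A (1700) beats its owner's weaker account B (1600):
print(K * (1.0 - expected(1700, 1600)))   # A gains only ~10.8 points
# With a 400 point gap a win is worth almost nothing:
print(K * (1.0 - expected(2000, 1600)))   # ~2.7 points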

Another thing that is done in Elo is lowering the K-factor for players with lots of matches. Then the matches they play don't carry as much weight (change as many points), reflecting a well-established strength. This is not done atm, because I didn't have the time (and I wanted to observe a little first), but it will be implemented in the future.

Last but not least, I recently built in multi-account handling. Accounts that were "united" count as one; from then on only the rating of the oldest account is used. I just added a check for this to the rating function: if two accounts of the same player are in a match, none of them gets any rating for that match. It'll be as if they didn't play.
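
A hypothetical sketch of that check (the names and the account-to-player mapping are invented; this is not the site's actual rating code):

Code:

def has_duplicate_player(accounts, unified_player_of):
    # True if two accounts in the match map to the same unified player.
    players = [unified_player_of(account) for account in accounts]
    return len(players) != len(set(players))

def rate_match(accounts, unified_player_of, apply_rating):
    if has_duplicate_player(accounts, unified_player_of):
        return   # skip rating entirely: as if the match was never played
    apply_rating(accounts)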

But I'm bad at knowing all those accounts... I need help uniting accounts. I'll add a function to the site to allow users to flag players as smurfs. For now, if you want accounts to be treated as one, just leave a comment at the bottom of a replay.

Anyway: because of what I wrote in the first paragraph, it is really a lot of work to push a player's rating up this way.