chrank and the constant failure of balance

zwzsg · Post by **zwzsg** » 25 Sep 2012, 22:51

tzaeru wrote:light standard protections like not counting games that lasted under 10 minutes..

Not 10 minutes. 10 minutes is a long time for a KP game. So have a much smaller time, like 10s, just to check the game started well.

dansan · Post by **dansan** » 25 Sep 2012, 23:10

You can have access to 3700 replays with players and winners already saved to DB. The DB layout is on github if you want to look it up, or you can tell me what you want to know, and I can do a query for you.

Out of curiosity I was going to do some statistics on games and players anyway, but I'm very busy atm, so I didn't do it...

... now that I think about it, I think it's easy... I can put together a little page that tells you, for a given player, how many times he won and lost.... tomorrow... You can use that to seed your DB :)
I think it should differentiate between 1v1|2v2 and team games. So we can see if the old sayin' that good 1v1ers also play well in teams is true or not :)
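
A rough sketch of what such a per-player count could look like, using the short minimum-duration cutoff suggested earlier in the thread. The record layout (`duration`, `teams`, `winner`) and the three game-type labels are assumptions for illustration; the real DB layout is not shown here.

```python
from collections import defaultdict

def game_type(game):
    """Classify a two-team game as '1v1', '2v2' or 'team' (assumed scheme)."""
    sizes = sorted(len(team) for team in game["teams"])
    if sizes == [1, 1]:
        return "1v1"
    if sizes == [2, 2]:
        return "2v2"
    return "team"

def win_loss(games, min_duration=10):
    """Count wins and losses per (account ID, game type).

    Games shorter than min_duration seconds are skipped, as a light check
    that the game actually started.
    """
    stats = defaultdict(lambda: {"won": 0, "lost": 0})
    for game in games:
        if game["duration"] < min_duration:
            continue
        kind = game_type(game)
        for index, team in enumerate(game["teams"]):
            # game["winner"] is the index of the winning team (hypothetical field)
            key = "won" if index == game["winner"] else "lost"
            for account in team:
                stats[(account, kind)][key] += 1
    return dict(stats)
```

Keying on account ID rather than name keeps the counts stable across renames.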

Johannes · Post by **Johannes** » 26 Sep 2012, 23:10

Beherith wrote:
good metric = how often do people feel frustrated
What. The. Fuck.

How do you even measure level of frustration? Number of frowny faces in chat? Capture their webcam and do emotion detection?

Edit:

The only good metric is that the win rate of every single player converges to 0.5

I already said there isn't a way to exactly measure it, this isn't natural science. What players feel is the whole point! Maybe calling it "balance" isn't the best term, but whatever. Still the general amount of expressed malcontent at the teaming up does tell a lot.
We shouldn't solely look at one metric of how teams are (ie win rates of people), but think a bit about other things that affect how much fun people have. Can they play with friends, for example. Winning or losing in an 8v8 doesn't mean too much to most people I think - as long as they get to play the game instead of totally steamrolling or being rolled.
Even in a smaller game where you really have a full locus of control over the outcome, you might go 5-0 with the same teams but still have a good time.

What I'd think is most detrimental to enjoyment is too wide a range of player skill in a game - if you could separate the total beginners and good players, games would feel less random and more rewarding. An ELO system would be better for smurf avoidance in that though.
But maybe there just isn't enough people for that now, and everyone who wants a mass game must play together.

albator · Post by **albator** » 27 Sep 2012, 01:03

Actually, a good metric is that all people get frustrated the same way, because the 50% who lose are still frustrated or not.... but that is not really the point anyway I guess.

1) players need to be referenced by account ID, not by name

2) once 1) is done, in order to make a good team-based ranking, you need to measure how useful a player is in the game, and not only in late game. That is why a rating based on damage dealt / received, but also on the player's position in the damage dealt ranking, is quite OK.

State of the art would be to get a database feed for each map, and for every X vs X combination on each map, with the best damage dealt available. Then you could evolve your database to rank all the players of the community, so nobody gets a high rank just because they play among noobs. Same the other way around for "PRO" games.

Damage dealt over damage received is also interesting.

Of course all of that should, ideally, be damped by the units that have been given to the player, resources, "take", using the usual 1 M = 60 E ratio, converted into hp of damage dealt using an average well-used unit like a t1 tank.

The algorithm does not look difficult to design or code, but retrieving the data might be a big amount of work.

Also the 3 coefficients that damp:
- the total damage dealt at the end of the game / (reference best damage dealt from the database)
- integration of position in damage dealt over time / (total time)
- integration of [damage dealt] / [damage received] over time / (total time)
should be tweaked a bit so the sum gives a relevant value for each game.
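
As a sketch only, the three coefficients above could be combined like this. It assumes damage is sampled at uniform intervals (so a time integral divided by total time becomes a plain mean), rescales the position so first place is 1.0 and last place is 0.0, and uses hypothetical `weights` as the tuning knobs. None of these choices come from the post itself.

```python
def time_average(samples):
    """Mean of uniformly spaced samples, a discrete stand-in for
    (integral over time) / (total time)."""
    return sum(samples) / len(samples)

def game_score(dealt, received, positions, n_players, reference_best,
               weights=(1.0, 1.0, 1.0)):
    """Combine the three coefficients into one per-game value.

    dealt, received: cumulative damage sampled at uniform intervals.
    positions: the player's rank in damage dealt at each sample (1 = best).
    n_players: number of players being ranked, must be at least 2.
    reference_best: best known total damage for this map and team size.
    weights: hypothetical tuning knobs, not values from the post.
    """
    total_ratio = dealt[-1] / reference_best
    # Assumption: rescale rank so 1.0 is first place and 0.0 is last place.
    position_score = time_average(
        [(n_players - p) / (n_players - 1) for p in positions])
    # Assumption: clamp received damage to 1 to avoid dividing by zero early on.
    exchange = time_average(
        [d / max(r, 1.0) for d, r in zip(dealt, received)])
    w1, w2, w3 = weights
    return w1 * total_ratio + w2 * position_score + w3 * exchange
```

The unit and resource damping mentioned earlier is left out here; it would scale `dealt` before the score is computed.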

Then use this value to modify the rank of the player. Plenty of ways to do that using filters like:
- best of a certain percentage of games
- last 1 year of data
- ...
Plenty of ways to modify the rank:
- arithmetic average
- ELO-like
....

3) to prevent trolling, people losing on purpose, use a filter (above): a cutoff keeping the best 50% (or whatever) of results might be relevant enough
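
A minimal sketch of that cutoff combined with the arithmetic average, assuming each game has already been reduced to a single per-game value. The function name and the `keep_fraction` parameter are made up for illustration.

```python
import math

def rank_from_scores(scores, keep_fraction=0.5):
    """Average only the best keep_fraction of a player's per-game values,
    so deliberately thrown games fall below the cutoff and are ignored."""
    if not scores:
        raise ValueError("no games to rank")
    keep = max(1, math.ceil(len(scores) * keep_fraction))
    best = sorted(scores, reverse=True)[:keep]
    return sum(best) / len(best)
```

With five games scored 1, 5, 3, 0 and 4, a 50% cutoff keeps the best three and averages them.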

dansan · Post by **dansan** » 27 Sep 2012, 01:21

I made a little page that shows win/loss ratios of a player:
http://replays.admin-box.com/win_loss/<accountid>/

ex: Pizdo, Droid, tzaeru, danchan, RickHustle, Fight, Zorro, Esorp

albator · Post by **albator** » 27 Sep 2012, 01:41

Not relevant for FFA players, which is the reason why I think win / loss ratio is a bad ranking idea

Johannes · Post by **Johannes** » 27 Sep 2012, 02:23

Using anything else than win/loss can alter the way you play the game though.

Especially at the point where a game's been effectively won/lost already but not conceded, people who care about rank will do the thing it takes to best affect their rank, instead of just finishing the game.

tzaeru · Post by **tzaeru** » 27 Sep 2012, 04:14

Raising folk's rank according to damage dealt or lowering according to damage received can be extremely inaccurate.

For example, if at 15th minute you bomb your enemies and gain most damage dealt, but the only reason you survived to begin with was that your allies protected you from stumpy raids, you are not necessarily any better player than anyone else in the team.

Or, if some bored vet just decides to go afk while spamming a bunch of aks at someone's heavy porc, the defender might easily have most damage dealt, despite not really having done anything to earn it.

However, there is also a fundamental problem with balancing according to win/loss. It might not work very well for individual matches, since it has attempted to make earlier games even according to how often a player has won or lost. So, assume 8 newbies play vs each other. It shifts the teams around until each player has won approximately 50% of matches. Now, the 8 new players go to play vs. 8 experienced players, who likewise have been shuffled around so that each of them has won approximately 50% of matches. Balance done there would be utterly random.

But regardless I do think it's a better combination than time-based rank solely. I'm pretty sure that the best route is combining time-based rank with win/loss-based rank modifier and, of course, admin support.

Rumpelstiltskin · Post by **Rumpelstiltskin** » 27 Sep 2012, 05:42

Elo does not work well when you subject every played game to it.
There should be 2 types of games.
Ranked and unranked.
If all matches are ranked like it is for ZK ATM, then Elo fluctuates too much.
Some players sometimes troll in matches, some just don't play all matches to win..
Elo is good when in every game the players are playing with only winning in mind.
When this is not the case the system is compromised and Elo fluctuates too much.

If you want to have ELO for all games you must mitigate its effect.
In such a case when using the ELO system to determine how much ELO to add to X player per win or loss you must average the player's ELO for the last, say, 10 games, then, based on this ELO, add or subtract based on the outcome of the current game.
This will focus more on the trend of a player and will mitigate games when the player was just slacking off or trolling or not playing very seriously.
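
One possible reading of this, sketched with the standard Elo expected-score formula: the expectation is computed from the mean of the last 10 ratings instead of the latest one, and the resulting change is applied to the latest rating. The K-factor of 32 and the use of a single `opponent_rating` (for a team game this could be the enemy team's average) are assumptions.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def smoothed_update(history, opponent_rating, result, k=32, window=10):
    """Return the player's new rating after one game.

    history: the player's past ratings, most recent last.
    result: 1.0 for a win, 0.0 for a loss, 0.5 for a draw.
    The expectation uses the mean of the last `window` ratings (the trend),
    and the change is then added to the latest rating.
    """
    recent = history[-window:]
    trend = sum(recent) / len(recent)
    delta = k * (result - expected_score(trend, opponent_rating))
    return history[-1] + delta
```

Plain Elo is the special case `window=1`.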

You can have 2 rankings as well.
1, is the ELO system used now but it would only be for the ranked games.
2, A more mitigated ELO ranking that would be hidden and used for team game balance on unranked hosts.
Players who play on ranked servers will be playing these matches to win, and will play on unranked hosts when they just want to have fun and jerk around.

Not to mention that the current always ranked system makes people make smurfs for team games...

dansan · Post by **dansan** » 27 Sep 2012, 09:51

I added FFA.

Numbers sum up to only 90%, because not all replays were properly tagged.

I also don't have the time and knowledge to maintain a proper smurf-db... anyway... those numbers are not meant as an argument, but to help the discussion (which i consider important).

Beherith · Post by **Beherith** » 27 Sep 2012, 11:22

Dansan that site is pretty damn cool, I like the stats a lot, it can be a very good starting point to calculate some rankings and see how they stack up:)

very_bad_soldier · Post by **very_bad_soldier** » 27 Sep 2012, 12:59

One problem might be the randomness of team games:
How many 8v8 DSD games do you think someone needs to play to get a meaningful indication of the individual player's skill by win/loss ratio?

The other side:
In 1v1 often noobs play against each other and often pros play each other. But you rarely see noob vs pro because it's unfun for both sides.
So if someone has a w/l-ratio of 20:4 it does not have to mean a thing either.

albator · Post by **albator** » 28 Sep 2012, 00:16

Johannes wrote:Using anything else than win/loss can alter the way you play the game though.

Especially at the point where a game's been effectively won/lost already but not conceded, people who care about rank will do the thing it takes to best affect their rank, instead of just finishing the game.

I completely agree, and that is definitely the reason why the balance algorithm must be designed so that it encourages players to do the right thing: lots of damage, as early in the game as possible, with the resources you have ...

very_bad_soldier wrote:One problem might be the randomness of team games:
How many 8v8 DSD games do you think someone needs to play to get a meaningful indication of the individual player's skill by win/loss ratio?

The other side:
In 1v1 often noobs play against each other and often pros play each other. But you rarely see noob vs pro because it's unfun for both sides.
So if someone has a w/l-ratio of 20:4 it does not have to mean a thing either.

Exactly the reason why I propose to scale the ranking using a database, per map and per X vs X config, of the best damage dealt curve.

Silentwings · Post by **Silentwings** » 28 Sep 2012, 10:09

...the reason why the balance algorithm must be designed so that it encourages players to do the right thing: lots of damage, as early in the game as possible

No. There is no 'right thing' and we should not discourage variety in the way people play. I completely agree with Johannes that a ranking system can only take win/loss as its input.

One problem might be the randomness of team games:
How many 8v8 DSD games do you think someone needs to play to get a meaningful indication of the individual player's skill by win/loss ratio?

A lot, so in 8v8 rankings are likely to be less accurate. But the errors are ironed out to some degree, because when you pick teams of 8 players v 8 players it's likely that someone overrated balances out someone underrated.
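
To put a rough number on that cancelling, here is a small sketch under the assumption that each player's rating error is independent with the same standard deviation `sigma`. The 200-point figure in the usage note is invented for illustration.

```python
import math

def team_mean_error(sigma, team_size):
    """Standard deviation of the error in a team's average rating,
    assuming independent per-player errors of standard deviation sigma."""
    return sigma / math.sqrt(team_size)

def team_gap_error(sigma, team_size):
    """Standard deviation of the error in the difference between the two
    teams' average ratings, under the same independence assumption."""
    return sigma * math.sqrt(2.0 / team_size)
```

If each rating were off by about 200 points, an 8-player team average would be off by about 71 points, and the gap between two such teams by about 100.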

There should be 2 types of games.
Ranked and unranked.

Might work for 1v1 but not useful for large games because in practice you join whichever autohost already has players in.

In such a case when using the ELO system to determine how much ELO to add to X player per win or loss you must average the player's ELO for the last, say, 10 games...

Do you have any idea how ELO actually works??! Doing that makes no sense at all. Read http://en.wikipedia.org/wiki/Elo_rating_system#Theory.

albator · Post by **albator** » 28 Sep 2012, 10:41

Silentwings wrote: Quote:
...the reason why the balance algorithm must be designed so that it encourages players to do the right thing: lots of damage, as early in the game as possible

No. There is no 'right thing' and we should not discourage variety in the way people play. I completely agree with Johannes that a ranking system can only take win/loss as its input.

That is just not going to make it for team games. Any kind of algorithm using only W/L as input will need a tremendous amount of data to give OK-ish results, and still you will get huge uncertainties.

A comparison for you to easily understand: It is like trying to rate which meals are the best in a restaurant by asking every customer to give a global note to the restaurant as they leave. Even if one of the meals is absolutely amazing, you will need to wait until every single combination of meals has been tried to be 100% sure which one it is. Same the other way around for the worst meal.
If you have N meals, you will need to get N**2 feedbacks.
-> If you have 8 spring players in a game, you need 64 games to get only one relevant sample...

You thus need to rate the individual within the game, which is precisely what I describe above.

Wanting to use W/L ratio is like wanting to use a 3D inverse Fourier transformation to find which part of the cake is softer, while you just need to ask the people who are eating it....

Silentwings · Post by **Silentwings** » 28 Sep 2012, 12:23

Ranking systems based entirely on win/loss statistics have already been implemented for team games, with success.

A quick read through danchan's statistics will tell you that it takes about 100 team games to get a reasonable estimate of a single player's strength through win/loss stats. This is more than practical, especially since we have data going back >1 year already.

And that's not even accounting for the point I made above about errors often cancelling when you consider 8 players together as a team.

(Although something additional needs to be done to take account of smurf detection and new players.)
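
One way to sanity-check the "about 100 games" figure (not how it was read off the statistics) is the standard error of an observed win rate, assuming games are independent and the player's true win probability is fixed:

```python
import math

def win_rate_standard_error(games, true_rate=0.5):
    """Standard error of an observed win rate after `games` independent
    games, under a simple binomial model with win probability true_rate."""
    return math.sqrt(true_rate * (1.0 - true_rate) / games)
```

At 100 games the observed rate of a 50% player is typically within about 5 percentage points of the truth; at 25 games it is about 10.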

Your 'comparison for you to easily understand' has an insulting name and several other problems:

(1) There is no need (and no option) to play all possible permutations of some 16 players into 2 teams to estimate a single player's strength.
(2) It's beside the point, but your method of calculating the number of permutations is wrong. Splitting 16 players between two teams of 8 gives 12870 permutations. 8 players into 2 teams of 4 has 70 permutations. You need binomial coefficients to do this, not powers of 2.
(3) When you say 'one relevant sample' - this doesn't mean anything. A single sample of one game contains information and is a 'relevant sample'. If what you're looking for is the point at which you've exhausted all possible samples and have perfect data, that will take infinitely long because game win/loss results are not a deterministic consequence of the players.
(4) I vaguely remember that numerical Fourier inversion does not get significantly harder as dimension increases.
(5) The real point here is not numerical - it's that we should not be telling people to play the game in some particular way that you (or anyone else) happens to like. The aim is to win and beyond that we should only encourage variety.
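
The counts in (2) can be checked with Python's `math.comb` (available from Python 3.8). The helper names are made up for this sketch.

```python
import math

def team_splits(players):
    """Ways to pick which half of `players` goes on the first team."""
    return math.comb(players, players // 2)

def unordered_team_splits(players):
    """Same count when swapping the two teams is not a different split."""
    return team_splits(players) // 2

print(team_splits(16), team_splits(8))  # 12870 70
```

If the two sides are treated as interchangeable, each split is counted twice, which halves the figures to 6435 and 35.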

Beherith · Post by **Beherith** » 28 Sep 2012, 14:17

Silentwings wrote:
(5) The real point here is not numerical - it's that we should not be telling people to play the game in some particular way that you (or anyone else) happens to like. The aim is to win and beyond that we should only encourage variety.

This. If people want to make huge eco (of course they do!) then we should not stop them enjoying themselves.

Johannes · Post by **Johannes** » 28 Sep 2012, 14:59

Also I don't know if it'd actually stop anything - ok, you track this or that ingame stat, and then make teams where you've got a wide spectrum of said stat. Now if you always make pure eco for 25 mins at start, you get assigned with people who tend to deal a lot of damage early - I don't think the techer will mind, if anything it's encouragement to deal even less damage to get more aggressive teammates to protect you.

very_bad_soldier · Post by **very_bad_soldier** » 28 Sep 2012, 16:22

Silentwings wrote: A quick read through danchan's statistics will tell you that it takes about 100 team games to get a reasonable estimate of a single player's strength through win/loss stats. This is more than practical, especially since we have data going back >1 year already.

One of the most important parts of a new ranking system is to be able to detect smurfs in a reasonable amount of time. I don't think 100 games is viable in that regard.

I would vote for damage dealt as the used metric. While I agree it would have some flaws, it would IMO still be the best metric we can get with a minimal amount of work and without overcomplex algorithms.

Also I think it would not change the game too much since I assume having lots of damage dealt corresponds well with winning the game. Even an Eco-player will have to build units and do damage at some point. Otherwise he is a bad player.

very_bad_soldier · Post by **very_bad_soldier** » 28 Sep 2012, 16:25

Johannes wrote:Now if you always make pure eco for 25 mins at start, you get assigned with people who tend to deal a lot of damage early - I don't think the techer will mind, if anything it's encouragement to deal even less damage to get more aggressive teammates to protect you.

It does not matter if he minds or not. But if he is not able to convert his eco into actual damage then he is a bad player and deserves damage-dealing teammates.
