An unbelievably long essay I wrote about the fallacy of the traditional Tired model of mathematical balancing. Yes, the author information at the end is a bad joke. Ignore it and concentrate on the points.
tl;drs provided at various points for those of you who have ADD.
The Myth of Mathematical Balancing
Abstract
The issue of mathematical balancing is a hotly contested one in the Spring community, with its adherents fiercely insisting that it is the best way, indeed the only way to achieve true balance. This essay will study the failings of the traditional Tired method of using math balancing on games as a whole, and note the places where a less ambitious formula can be used to supplement, not supplant, conventional balance methods.
Background
The concept of mathematical balancing was originally introduced to the Spring community by Tired. The topic was mentioned in a number of places (here for instance), and an actual formula was posted here. Tired also constructed a spreadsheet containing all the unit stats and their resultant costs (downloadable sample), which was used in Tired Annihilation. This system will hereafter be referred to as the Tired model.
Other people have also experimented with mathematical balance. It has been used in the Supreme Annihilation and Supreme Legacy mods, and dabbled with elsewhere; Evil4Zerggin wrote about mathematical balancing on the CA wiki, and I attempted to construct a new spreadsheet design (hereafter referred to as the KR model) intended to overcome weaknesses in the Tired model. It ultimately proved too difficult to create and was abandoned; a sample can be downloaded here.
The Linear Error
Despite its proponents praising the depth and complexity of the Tired method, it is in reality exceedingly simple mathematics-wise: a number of unit variables (easily less than a third of the ones that matter) and some constants are multiplied together, and the output is used to price the unit. The given equation in the Basic Mod Balance thread is:
Speed%*((Weapon Value% + Health%)/2)*LoS%*Special Value%, where Weapon Value = DPS*Range*RoundedAOE.
The patent nonsense implied by such a formula is immediately obvious to anyone who spends a few moments thinking about it. In particular, range and speed are two attributes whose effects can hardly be described as linear.
Take the CA Janus, for example. It fires twin high-damage missiles with a ten-second reload time, outranging most "T1" units, but it can be hit by LLTs as it approaches them. It has low health and relies mainly on its mobility to survive. Now double its range (enough to easily outrange HLTs and almost all "T2" units), and halve its HP to compensate. Such a unit would be nothing short of massively overpowered; yet under the above equation, it would be balanced with only a 25% cost increase (taking Weapon Value% and Health% as equal to begin with: (200% + 50%)/2 = 125%)!
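To make the arithmetic behind that 25% figure concrete, here is a minimal Python sketch of the quoted formula. The function and parameter names are made up for illustration, every stat is expressed as a ratio to a baseline unit (1.0 = 100%), and it assumes Weapon Value% and Health% start out equal; none of this comes from Tired's actual spreadsheet.

```python
def weapon_value(dps=1.0, weapon_range=1.0, rounded_aoe=1.0):
    """Weapon Value = DPS * Range * RoundedAOE, each as a ratio to baseline."""
    return dps * weapon_range * rounded_aoe


def tired_cost(speed=1.0, weapon=1.0, health=1.0, los=1.0, special=1.0):
    """Relative cost under the quoted formula; 1.0 means baseline cost."""
    return speed * ((weapon + health) / 2) * los * special


# Baseline unit: every stat at 100% (an assumption made for this sketch).
baseline = tired_cost()

# The hypothetical long-range Janus: double range, half HP, nothing else changed.
long_range = tired_cost(weapon=weapon_value(weapon_range=2.0), health=0.5)

print(long_range / baseline)  # 1.25, i.e. a 25% cost increase
```

Because range enters the formula as a plain multiplier, the formula has no way to register that the doubled range crosses the HLT threshold.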
Nor is this restricted to any particular unit or stat. For instance, would a Flash in BA be balanced if it had 50% less DPS but 50% more speed? Could the KP Byte be balanced by halving its range in order to double its HP? How about giving Gundam's Char double range at half DPS? At the other extreme, increasing a unit's range may not increase its performance significantly: for instance, an *A nuclear silo can already cover the entirety of all common maps regardless of where it's placed, and a BA/CA Big Bertha can similarly reach the opponent's base from the player's own in most cases. Increasing their range would not impact their performance in a proportional manner.
The KR model attempts to resolve the problem by using exponential terms where appropriate; however, this increases the difficulty of finding the correct formula. In just the sample units tested, for instance, the Stumpy, Janus and Gator are severely undervalued, while the Glaive (Peewee) apparently costs much less than it should. Tweaking the stat weights could solve these issues, but at the cost of ruining the valid outputs of other units. This, incidentally, reflects another fallacy of the Tired model: rather than retaining cohesive balance by tweaking all units at once, the spreadsheet method cripples it by "fixing" that which ain't broke.
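The effect of putting an exponent on a stat can be sketched as follows. This is not the KR spreadsheet's actual formula: it simply reuses the shape of the Tired equation, and the `range_exponent` parameter and its values are hypothetical choices for illustration.

```python
def cost_with_range_exponent(speed, dps, weapon_range, health, range_exponent=1.0):
    """Relative cost with range raised to a tunable exponent.

    All stats are ratios to a baseline unit (1.0 = 100%). With
    range_exponent=1.0 this reduces to the linear treatment of range.
    """
    weapon = dps * weapon_range ** range_exponent
    return speed * (weapon + health) / 2


# Same "double range, half HP" variant as before, priced under three
# illustrative exponents. Each exponent gives a different "balanced" cost.
for exponent in (1.0, 1.5, 2.0):
    cost = cost_with_range_exponent(1.0, 1.0, 2.0, 0.5, range_exponent=exponent)
    print(exponent, round(cost, 3))
```

Every stat that gets an exponent adds another number that has to be right for every unit at once, which is where the tuning difficulty comes from.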
Conversely, even the most subtle variations in a unit's stats can completely unbalance it. At one point, CA ran a standardization script that allowed most units to see further than they could shoot. This allowed unsupported Rockos to spot LLTs (and other targets) on their own, and destroy them without fear of damage. It was considered so OP that the unit's range had to be nerfed (but this had other knock-on effects, refer to Balancing in a Vacuum below). Or suppose that an Arm LLT had 50 more range than a Core LLT: the extra range is pretty much meaningless against most units, but it means that you could easily kill a Core LLT by placing an Arm one just outside its range. How much more should this cost? As the CA balance guidelines state:
"The use of a unit is not always determined by good stats, but often by the right combination and values. There is a huge difference between just outranging an LLT, and having the same range as an LLT. Try to think more in terms of unit relationships than pure numbers."
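The LLT case is a threshold effect, which a toy Python sketch can show. The range numbers below are hypothetical placeholders, not real unit stats, and the helper names are invented for this example.

```python
def can_kill_unanswered(attacker_range, defender_range):
    """True if a static attacker can sit where it hits a static defender
    while staying outside the defender's own range."""
    return attacker_range > defender_range


def linear_range_premium(bonus, base_range):
    """Extra cost a linear model would charge for a range bonus."""
    return bonus / base_range


CORE_LLT_RANGE = 400  # hypothetical placeholder value

for bonus in (0, 50, 100):
    arm_range = CORE_LLT_RANGE + bonus
    print(
        bonus,
        linear_range_premium(bonus, CORE_LLT_RANGE),
        can_kill_unanswered(arm_range, CORE_LLT_RANGE),
    )
```

The linear premium grows smoothly with the bonus, while the matchup outcome flips as soon as the bonus is above zero and then stops changing.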
tl;dr: Treating nonlinear values as being linear, and ignoring how subtle changes can completely change a unit, is an invitation to fail.
Game Design and Regression to the Mediocre
So, we've established that the Tired model falls apart with unique units and overlooks the small changes with big effects. How does Tired resolve the problem? Simple: dodge it. A fair complaint about TiA (and its successor, SA) is the blandness of the unit selection, readily apparent in such changes as the dulling of unit strengths (for instance, the Peewee is now just a short-ranged fighter rather than a fast raiding unit) and across-the-board speed reductions. The result is a more homogeneous unit selection that brings to mind Francis Galton's famous phrase (albeit used in a different context): "regression towards mediocrity".
Tired was even willing to sacrifice his own design principles in order to conform to the formula. The original reasoning behind the massive price increase of the Jeffy and Weasel in Tired Annihilation over BA (>3x) was that the Jeffy is "free information", i.e. it allows the player to gain intel on his opponent for minimal cost. Yet, the same Tired reduced the spybot's cost to less than a quarter of what it was (in TiA 6.0 and SA 1.0 it has an adjusted cost of 59, less than two-fifths of a Flash).
In fact, Tired went so far as to declare (admit, rather) that any unit is balanced if it fits the formula, regardless of what the formula actually is.
Consider the following quote:
Tired wrote: For a balanced equation, I could say that speed was worth 10x as much as Health. As such, it would take many times - perhaps 5x - as much metal invested in Flash Tanks as invested in a Sumo to kill a Sumo. In BA, that would equate to something like 10,000 metal in Flash Tanks. This would be balanced.
I could say that range was worth only 2% as much as dps, and redesign a Flea that could fire from one corner of a 16x16 map to an opposing corner and cost 200 metal without changing any other basic unit stats. This would be balanced.
I could say that airplanes should cost 57x as much as comparable ground units, put wings on a Stumpy, and send 11,400 metal PteroStumps to their doom against normally priced Slashers. Would this PteroStump be effective? No. Would it be balanced? Yes.
Thus we see how more balance can actually lead to a less fun game.
tl;dr: Tired circumvented problems with his formula by changing the game, not the formula. And not for the better.
Balancing in a Vacuum
Tired tells us that any stat can have a single arbitrary value assigned to it. But can it?
Even the best spreadsheet will ultimately be balancing in a vacuum, outputting sterile values without reflecting actual unit vs. unit situations which are likely to be encountered in the field. In an actual game, in addition to all the unit stats which are or are not considered by the equation, we have the one completely uncontrollable variable on top of it all: human interaction.
Unit attributes combine to form the unit in a way that far exceeds the mere sum of its stats. In BA, the Flash doesn't look so hot compared to the Peewee on paper: it has slightly less HP/cost and half the DPS/cost, easily loses in terms of turn rate and acceleration, and can't go up hills like its Kbot counterpart can. Yet in an actual combat situation, the Flash is simply superior in every way due to the way it works. Its higher per-unit HP allows it to survive long enough to dish out damage, fewer units means Flashes are less prone to crowding each other out than Peewees are, the greater speed means the Flash can close in on traditional enemies such as LLTs faster, and the Flash doesn't chain explode with shrapnel like the Peewee does.
As hinted above, different units value different stats; while DPS is generally useful for all combat units, range is much more valuable for artillery and standoff units than HP is, while rushers require more HP to survive. Range is especially important on a building, much more than on a mobile unit (as a building can't move to be able to hit its target), and so forth.
The KR model does attempt to counteract such problems by:
- Using unit categories (though the only one shown in my sample spreadsheet is the active/passive defender distinction) to allow different stat weights for different units;
- Attaching an exponent to the final unit cost (to reflect the advantages of a higher weight class), but this is an imperfect solution;
- Including "magic numbers" that can account for such differences, but as Saktoth has pointed out in the past, we may as well not have a formula if we resort to such kludges.
Even the number of units coming to an engagement can change the outcome completely. Two AKs can beat one Rocko in BA, simply by running up to it, dodging its rockets, and pew-pewing it to death. Four AKs can likewise beat two Rockos. Six AKs beat three Rockos. But what about 10 AKs vs. five Rockos, or 20 vs. 10? The massive target density simply means the large AK clump cannot hope to dodge the rocket salvo, and many units will be hit and killed (filling the survivors with shrapnel that hurts them; see the Flash-Peewee comparison above). The stragglers can easily be picked off by the Rockos. This is something that no formula can account for.
Changing a unit, or a specific class of units, can also have knock-on effects that make other units overpowered or underpowered even though those other units haven't been changed themselves. For instance, when CA reduced its Rocko range, riot units such as the Outlaw that were previously countered by it became too powerful against it, and had to have their own range nerfed as well. However, that in turn meant they couldn't fight raiders as effectively, and those too had to be rebalanced.
Even if you could design a formula that compares not only the units to a benchmark, but to every other unit, adding unit mixing to the equation only screws it up further. When a Slasher line is attacking your LLTs, you can run your Peewees/Glaives out and whack them. But wait, what if they're screened by Levelers? Your raiders would simply get slaughtered, and you have to come up with something else. What if instead of Slashers, it's the nearly identical Samson? Arm doesn't have Levelers. How do you balance the Peewee to fit this problem?
You may now be asking, "But if a computer can't perform such complex comparisons, what can a human do?" It is a fallacy to assume that computers are superior in dealing with every possible mathematical problem; otherwise AIs would be routinely beating humans at Go just like they do in chess. Among other things, computers have a hard time with symbolic integration (antidifferentiation): where a human can often use analytical methods to arrive at the exact value for an integral, software frequently has to fall back on more primitive numerical methods to arrive at the solution (see: one, two). Similarly, while a human may not be able to intricately compare hundreds of thousands of unit combinations, he/she can work out the big picture and actually get a better result, by not missing the forest for the trees.
If you told a biologist that you could use such simplistic mathematical models to simulate a real-life ecosystem, he would laugh in your face. Yet that's exactly what advocates of the Tired formula claim to be able to do.
tl;dr: Units do not exist in a vacuum and cannot be balanced as such.
Understanding our Limitations
So, if the spreadsheet approach doesn't work, does that mean that mathematical balance is useless? Not quite. Every time we compare two units' stats, we are performing balance comparisons using ad hoc mathematics.
There are simple formulae which, by virtue of not overestimating their capability, can give better results than the universal spreadsheet. Evil4Zerggin's Basic Efficiency Models are a good example: such methods can be used to evaluate a small number of units in a specific area without overextending the results to other aspects. Rather than pretending to factor in every single unit attribute, we take a few, acknowledge the contribution of the ones we don't, and then run tests to determine the extent of their effect.
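As a rough illustration of that narrower approach (not a reproduction of Evil4Zerggin's models, whose details are not given here), the following Python sketch compares just two per-cost figures for two units. The unit names and stats are invented placeholders, and everything the comparison leaves out is listed explicitly as something to playtest.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    cost: float
    hp: float
    dps: float


def per_cost(unit):
    """Two narrow efficiency figures; says nothing about speed, range, etc."""
    return {
        "hp_per_cost": unit.hp / unit.cost,
        "dps_per_cost": unit.dps / unit.cost,
    }


# Hypothetical placeholder stats, NOT taken from any real mod.
tank = Unit("tank", cost=100, hp=300, dps=20)
kbot = Unit("kbot", cost=50, hp=160, dps=20)

# Attributes deliberately left out of the model, to be checked by testing.
UNMODELLED = ("speed", "range", "turn rate", "clumping", "terrain")

for unit in (tank, kbot):
    print(unit.name, per_cost(unit))
print("still needs playtesting for:", ", ".join(UNMODELLED))
```

The numbers only flag where two units differ on paper; whether that difference matters in a real engagement is left to testing.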
tl;dr: Simpler math can work better, by doing it right.
Conclusion
Despite claims to the contrary, the Tired model is nowhere near as complex as it wants to be, or needs to be. Any attempt to create a more complicated formula runs into the difficulty of finding appropriate values that can be applied uniformly to all units. Nevertheless, simpler models applied in a more specific manner can act as a useful tool to identify potential imbalances; combined with testing, this remains the best way to balance a Spring game.
And finally, to quote Tired once more:
Tired wrote: Balance out of context means nothing.
Final tl;dr: There are some things math can balance; for everything else, there's playtesting.
KingRaptor is a senior Complete Annihilation developer and an expert on Spring game design. He is the author of such bestselling books as The Peet Delusion and Smoth vs. Argh: A Tale of Forum Drama.