0000585: Change of networking model for more reliability (willing to help code this)

ID: 0000585
Project: Spring engine
Category: General
View Status: public
Date Submitted: 2007-08-12 10:11
Last Update: 2011-08-25 03:39
Reporter: genixpro
Assigned To: abma
Priority: normal
Severity: feature
Reproducibility: always
Status: resolved
Resolution: no change required
Product Version:
Target Version:
Fixed in Version:

Summary

0000585: Change of networking model for more reliability (willing to help code this)

Description

I think the big problem with Spring in real life is that its networking model is simply too complex. As far as I can tell (I haven't looked at the code), Spring sends out the order across the network, adjusting for latency. If that order takes longer than expected, Spring simply continues the game. When the order does arrive, it seems to reverse the game state and re-apply it with the order. This causes noticeable stuttering when network conditions are anything less than optimal.

Spring also seems to handle more special cases, which cause other symptoms, such as when the game freezes and objects continue on their original trajectories, often off the edge of the map. Observing the end-user behavior seems to indicate that Spring has a fairly complex and involved system, because it behaves in a wide variety of ways in different situations.

I think Spring would greatly benefit from getting rid of all that complex machinery. I am a C++ coder, and while I know nothing of the Spring code, I am willing to help. I created the new networking engine for Globulation 2 Beta 1 (which came after Alpha 23), and have found that our new system is extremely reliable thanks to its simplicity. I like Spring as much as I like Glob2, so I'm willing to take time from my full-time Glob2 development and donate it to Spring.

Also, I'm not sure how close Spring follows this model. Maybe it already uses it.

Basically, it goes like this:
    1) Each client connects to every other client (or through the host). The connections have TCP-like properties: all packets are guaranteed to arrive, and in order. The actual implementation can be some fine-tuned UDP code, although Glob2 uses TCP and it works very well.
    2) For every network step, any number of orders can be processed. Orders are also sent on the network step, and any number of them can be sent. If there are no orders for that step, a message is sent to all players saying that no orders were given.
        2a) When a player issues an order, it is queued for a certain network step ahead of time. This network step is chosen based on latency. In Glob2, we keep latency fixed for the entire game, as it makes the whole system much simpler. However, clever and slow fine-tuning can make latency adjustable.
        2b) When the player issues an order, it is queued to be sent across the network at the next network step, and scheduled for a fixed number of network steps ahead of that one. When it's time to send the message with the orders, there may be none (in fact, most of the time there aren't any), so a message indicating none is sent.
        2c) The network steps aren't necessarily logic or graphic steps. The network frame rate, so to speak, is probably very low compared to the rates of the game logic or game graphics. Even the fastest clickers can't issue more than 5 orders per second (although allowing for more doesn't really hurt), and most of the time what they are doing is "attack attack attack click click click!!!"
    3) A client only advances a step if it has all of the messages from all other players for that step. It can then safely assume that the circumstances will *not* change, and continue the game until the next network step.
        3a) When the client doesn't have all the orders, it pauses. It's possible to show a "waiting for player x" message when this occurs. Other clients may have received all the orders on time, advanced a network step, and sent more orders while this one is frozen. This is OK: thanks to the TCP properties of the connection, the orders will get to this client, usually within 100 ms, so the player never notices.
        3b) In Globulation 2, network rate, game rate and graphic rate are all derived from the same counter: a step every 40 ms causes a redraw and an execution of the game logic, with a network step every 5 normal steps. In Spring, however, it's more complex; likely each of these would be separate. Thus, it would be best for each client to send its own performance figures with each network packet, so that Spring can better slow down the system to make it less jittery. An important property, however, is that with or without this adjustment the network will still fully function: if one computer is going slow, it will simply send packets more slowly. Other clients will receive them more slowly, pause more and as such send packets more slowly themselves. The whole game remains stable, just a bit slower than usual.
        3c) In general, the game shouldn't try to dead-reckon through any long pauses. Units in Spring move *a lot*, and this would be equivalent to the stuttering we already get. It's better to pause and display a "waiting for" message.
    4) Lost players don't destroy the game. It's the host's (computer, not the person) job to determine when a player is lost. If the host computer decides a player is lost, it sends a message saying so to all other players. An order from that player is no longer expected, and the network step advances as normal. In a more complicated P2P system, it's possible to make each client decide for itself when another player is lost, but it's very important that all clients are in sync about who is lost and who isn't.
    5) It's *very* critical that the human players know why the game is pausing and who is at fault, and that they are shown a countdown until the player is considered lost. The game pausing is the *only* side effect of even the poorest network performance, so the game is much more predictable. Also, I find that this system is very reliable in practice, so there will be less pausing in general. It's the same system that Age of Empires and StarCraft use, both of which are acclaimed for their excellent online play.
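
Points 1 to 5 can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not Glob2 or Spring code: `LockstepClient`, `StepMessage` and every member name are made up for the example, orders are plain strings, and the transport is assumed to already deliver messages reliably and in order.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// All names here are hypothetical; nothing is taken from Spring or Glob2 code.
using PlayerId = int;
using Step = std::uint32_t;

struct StepMessage {
    PlayerId from;
    Step step;                        // network step these orders are scheduled for
    std::vector<std::string> orders;  // empty means "no orders for this step"
};

class LockstepClient {
public:
    LockstepClient(std::set<PlayerId> players, Step delay)
        : players_(std::move(players)), delay_(delay) {
        assert(delay_ > 0);  // orders must be scheduled ahead of the current step
    }

    // Points 2a/2b: an order issued now runs a fixed number of steps later.
    Step scheduleFor(Step currentStep) const { return currentStep + delay_; }

    // Point 1: messages are assumed to arrive reliably and in order.
    void receive(const StepMessage& msg) {
        if (players_.count(msg.from) != 0) inbox_[msg.step][msg.from] = msg.orders;
    }

    // Point 4: the host declared a player lost, so stop expecting their messages.
    void dropPlayer(PlayerId player) { players_.erase(player); }

    // Points 3a/5: the players still missing for a step ("waiting for player x").
    std::vector<PlayerId> waitingFor(Step step) const {
        std::vector<PlayerId> missing;
        const auto it = inbox_.find(step);
        for (PlayerId p : players_) {
            if (it == inbox_.end() || it->second.count(p) == 0) missing.push_back(p);
        }
        return missing;
    }

    // Point 3: advance only when every remaining player has reported.
    // On success the step's orders are appended in player-id order.
    bool tryAdvance(std::vector<std::string>& outOrders) {
        if (!waitingFor(next_).empty()) return false;  // pause instead of guessing
        const auto it = inbox_.find(next_);
        if (it != inbox_.end()) {
            for (PlayerId p : players_) {
                const std::vector<std::string>& orders = it->second.at(p);
                outOrders.insert(outOrders.end(), orders.begin(), orders.end());
            }
            inbox_.erase(it);
        }
        ++next_;
        return true;
    }

    Step nextStep() const { return next_; }

private:
    std::set<PlayerId> players_;
    Step delay_;
    Step next_ = 0;
    std::map<Step, std::map<PlayerId, std::vector<std::string>>> inbox_;
};
```

A caller would run `tryAdvance` once per network step and, when it returns false, show the result of `waitingFor(nextStep())` to the user rather than extrapolating. Applying orders in player-id order is one simple way to keep every client deterministic; any fixed ordering would do.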

Tags

No tags attached.

Checked infolog.txt for Errors

Attached Files

Relationships

Notes
~0001082 KDR_11k (reporter) 2007-08-12 12:55	Sounds like Spring's system, except that it doesn't pause but extrapolates, i.e. everything keeps moving at its current speed to keep the image smoother; the result is still the same.

~0001083 tvo (reporter) 2007-08-12 13:53	The design in Spring is currently as follows:

The network code behaves like TCP on UDP, i.e. the network layer guarantees that all packets always arrive on the other side, in order. The user interface does not modify anything in the simulation directly; instead it sends network packets containing the selection and the commands to apply to it to the host. The host immediately forwards commands to all players, thereby basically determining the order in which commands are executed from the ping to the client sending the order (the commands of someone with 200 ms ping will be inserted in the outgoing network stream of the game server 200 ms after the client sent them, the commands of someone with 500 ms ping will be inserted 500 ms after the client sent them).

In the meantime the server keeps track of the game time and the game speed. The game speed is calculated based on the % of wallclock time spent in CGame::SimFrame(): this value is sent by clients to the server at regular intervals. If, based on this speed calculation and the current time, it is time for a new frame, the server inserts a NETMSG_NEWFRAME into the network stream. Only when this message is received client side is CGame::SimFrame() executed. Because the server broadcasts exactly the same network stream to all clients (and the UI cannot directly manipulate the simulation), this guarantees that clients keep in sync (even though commands of clients with less lag are processed earlier than commands of clients with much lag).

When someone is "lagging out", the lowest-layer netcode of course starts resending etc. In the meantime the server continues to send NETMSG_NEWFRAME messages to all clients, since it does not take ping or latest receive time into account when calculating game speed (and it didn't receive NETMSG_CPUTIME messages from the client, since it is lagging out). If/when this person gets back into the game, he basically needs to process a lot of network stream at once, which may be one cause of stuttering (everything seems to move really fast for a while then, while it was basically standing still a moment earlier). If the lag was real network lag, and not a CPU hog on the client, the server won't ever care about this (apart from feedback mechanisms such as the fact that more time is spent in CGame::SimFrame() if this function is called more often).

A totally different thing (which may be another cause of stuttering) is that the drawing functions do not render stuff at the real position (as kept in the simulation). They render stuff at something like "position + (time since last CGame::SimFrame()) * speed", i.e. positions are inter/extrapolated. This is good if everything goes fine and FPS > 30, because stuff moves smoother. If you are dropping out or badly lagging, however, it starts extrapolating, and once the simulation is updated it may look like the unit/projectile jumps back a bit.

That's it about the current design. From reading e.g. the papers about the Age of Empires network model, I know that e.g. the fact that Spring executes commands as fast as possible is a bad thing, since they (the AOE devs) found out that a high but constant lag between giving a command and it being executed is much better than a constantly differing lag. Another thing that I know may be improved is the fact that the GUI only sends the command over the net. Any user feedback only happens after the roundtrip time. It would be much more user friendly if commands that aren't "confirmed" by the server yet were rendered in the GUI already (and caused the unit to play its affirmative sound, if any).

Compared with your points I think that:
    1) Spring uses reliable in-order UDP.
    2a) Orders are not queued client side in Spring; they are only sent to the server for the roundtrip. Once they get back, they are put in the simulation code's command queues.
    2b) When the player issues an order, it is immediately put in the network layer's send buffer, which is flushed every update loop iteration if it is not empty (that is, FPS times per second).
    2c) There is no such thing as "network steps" in Spring. Network buffers are flushed (if not empty) and recvfrom is called every update loop. The network stream instructs the simulation when a new simulation frame should be calculated.
    3) Since the Spring client modifies the simulation only based on the messages in the network stream, and the identical network stream is broadcast to all clients by the server, and because of (1), the same behaviour is exhibited by Spring, I think.
    3a) There's no such thing in Spring: if it doesn't receive orders, it doesn't receive NETMSG_NEWFRAMEs either. So effectively the simulation pauses. It would be good to have user feedback added to this though.
    3b) In pseudocode, it works like this in Spring:
        while (!done) { // main loop
            net->Receive();
            for (all net messages) {
                if (message == Command) GiveCommand();
                if (message == NETMSG_NEWFRAME) SimFrame();
                //...etc...
            }
            DrawEverything();
            ProcessUserInput();
        }
    3c) Agreed.
    4) See (1) and (3). The server inserts a NETMSG_PLAYERLEFT message in the network stream.
    5) Agreed.

If you want to help, that is really cool. Currently Auswaschbar is taking care of the network code, though I happen to know how it works too (because I made an attempt to rewrite it, which was made obsolete by Auswaschbar's faster rewrite :-)). If it is too big for patches, I could offer you SVN access so you can fix things in a branch and we can merge it in complete chunks of working code?
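
As a rough illustration of why an identical server stream keeps clients in sync, here is a small sketch of a client that changes its simulation only in response to stream messages. It is a toy model, not Spring's code: `NetMsg`, `Sim` and `processStream` are hypothetical names, commands are plain strings, and only the message kinds are borrowed from the note above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Message kinds loosely named after the ones mentioned in the note above;
// the struct layout and every other name are assumptions for illustration.
enum class MsgType { Command, NewFrame, PlayerLeft };

struct NetMsg {
    MsgType type;
    int player;           // sender of a Command, or the player who left
    std::string payload;  // command text; unused for other message kinds
};

// Toy simulation: it only records which command was applied in which frame.
struct Sim {
    std::uint32_t frame = 0;
    std::vector<std::string> queued;  // commands waiting for the next frame
    std::vector<std::pair<std::uint32_t, std::string>> applied;

    void giveCommand(const std::string& cmd) {
        assert(!cmd.empty());  // a command message is assumed to carry a payload
        queued.push_back(cmd);
    }

    void simFrame() {
        for (const std::string& cmd : queued) applied.emplace_back(frame, cmd);
        queued.clear();
        ++frame;
    }
};

// The simulation advances only when the stream says so, never on a local timer.
void processStream(Sim& sim, const std::vector<NetMsg>& stream) {
    for (const NetMsg& msg : stream) {
        switch (msg.type) {
            case MsgType::Command:    sim.giveCommand(msg.payload); break;
            case MsgType::NewFrame:   sim.simFrame(); break;
            case MsgType::PlayerLeft: break;  // a real client would retire the player here
        }
    }
}
```

Two `Sim` instances fed the same vector of messages end up with the same `frame` and the same `applied` log, and a stream without any `NewFrame` entries leaves the simulation at frame 0, which is the "effectively pauses" behaviour described in 3a.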

~0001087 genixpro (reporter) 2007-08-12 22:19	I read the same article from the Age of Empires developers, and it's partially what inspired my Globulation 2 system. I also read/heard about the StarCraft system, which is essentially the same. From what I read, Spring doesn't send anything across the network when no order is issued. That explicit "no orders" message is the key to the stability of the entire system. Also, the idea that the host chooses the latency adjustment seems a bit counterintuitive; I imagine it also magnifies bad network connections. Also, that extrapolation is, as you said, for 30 frames per second. The problem is, most games don't get 30 frames per second, or anywhere near it. That's practically near-perfect laboratory conditions. In my opinion, extrapolation should be stopped entirely, or at least capped with a simple std::min to keep it from extrapolating for more than 200 ms or so. I'll see what I can do to get an anonymous SVN checkout just to look at the code and such.
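
The std::min cap could look something like this. It is only a sketch built on the "position + (time since last CGame::SimFrame()) * speed" description above; `Vec3`, `drawPosition` and `kMaxExtrapolationSec` are hypothetical names, and 0.2 s is just the "200 ms or so" figure, not a tuned value.

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };

// Cap taken from the "200 ms or so" suggested above; the exact value is a guess.
constexpr float kMaxExtrapolationSec = 0.2f;

// Hypothetical helper: where to draw a unit between simulation frames.
// Beyond the cap the unit is drawn standing still instead of drifting away.
inline Vec3 drawPosition(const Vec3& simPos, const Vec3& speed, float secondsSinceSimFrame) {
    assert(secondsSinceSimFrame >= 0.0f);
    const float dt = std::min(secondsSinceSimFrame, kMaxExtrapolationSec);
    return Vec3{simPos.x + speed.x * dt, simPos.y + speed.y * dt, simPos.z + speed.z * dt};
}
```

With the cap in place, a three-second stall moves a unit on screen by at most 0.2 s worth of travel, so the jump back when the simulation catches up stays small.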

~0001091 tvo (reporter) 2007-08-13 08:19	It doesn't send an explicit "I have no orders for frame NNN" message, no. But there is some stuff that is sent every frame (for example, the client sends NETMSG_NEWFRAME back to the server as soon as it has executed CGame::SimFrame()), so that effectively the network buffer is flushed every frame (and the server knows there will be no orders coming for frame X if a NETMSG_NEWFRAME reply for frame X has been received from the client). I agree about the interpolation: removing it probably wouldn't be noticed by users, especially if there is a better mechanism for keeping lag constant (not fluctuating), and as a bonus it saves some CPU cycles.
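
The implicit "no orders for frame X" signal can be pictured as the server remembering the latest frame each client has acknowledged. The sketch below is an assumption about how such bookkeeping might be written, not Spring's server code; `FrameAckTracker` and its methods are hypothetical names, and it relies on the reliable in-order delivery described earlier.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical server-side bookkeeping; not Spring's actual classes.
class FrameAckTracker {
public:
    void addPlayer(int player) { lastAcked_[player] = kNone; }

    // Call when a client's NETMSG_NEWFRAME reply for `frame` arrives.
    void onNewFrameReply(int player, std::int64_t frame) {
        const auto it = lastAcked_.find(player);
        assert(it != lastAcked_.end());  // replies are only expected from known players
        if (frame > it->second) it->second = frame;
    }

    // True once `player` has acknowledged `frame`: with in-order delivery,
    // anything the client sent before that reply has already arrived.
    bool noMoreOrdersFrom(int player, std::int64_t frame) const {
        const auto it = lastAcked_.find(player);
        return it != lastAcked_.end() && it->second >= frame;
    }

    // True once every known player has acknowledged `frame`.
    bool frameClosed(std::int64_t frame) const {
        for (const auto& entry : lastAcked_) {
            if (entry.second < frame) return false;
        }
        return !lastAcked_.empty();
    }

private:
    static constexpr std::int64_t kNone = -1;  // no frame acknowledged yet
    std::map<int, std::int64_t> lastAcked_;
};
```

In this picture the per-frame reply plays the role that the explicit "no orders" message plays in the lockstep proposal: it tells the receiver that nothing more is coming for that frame.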

~0001092 tvo (reporter) 2007-08-13 08:21	Oh, but of course commands still don't store a frame number, so it doesn't really matter anyway, apart from Spring needing to take care of commands given by the net code to units that have died between the time the command was given and the end of the network roundtrip.
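
One simple way to handle that case is to drop the command when its target no longer exists at the moment it comes back from the network. The following is a hedged sketch with made-up names (`UnitRegistry`, `Unit`, `giveCommand`), not a description of how Spring actually does it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Unit {
    std::vector<std::string> commandQueue;
};

// Hypothetical registry of live units, keyed by unit id.
class UnitRegistry {
public:
    void spawn(int unitId) {
        assert(units_.count(unitId) == 0);  // ids are assumed to be unique
        units_.emplace(unitId, Unit{});
    }

    void kill(int unitId) { units_.erase(unitId); }

    // Returns false (and does nothing) if the unit died during the roundtrip.
    bool giveCommand(int unitId, const std::string& command) {
        const auto it = units_.find(unitId);
        if (it == units_.end()) return false;
        it->second.commandQueue.push_back(command);
        return true;
    }

    std::size_t queuedFor(int unitId) const {
        const auto it = units_.find(unitId);
        return it == units_.end() ? 0 : it->second.commandQueue.size();
    }

private:
    std::unordered_map<int, Unit> units_;
};
```

Because the check depends only on simulation state, and every client processes the same stream, all clients drop the same commands and stay in sync.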

~0007288 abma (administrator) 2011-08-25 03:39	I see no reason to do that... Improvements are still welcome, but imo it works fine as it is, and afaik it mostly works the way the initial report described it.

Issue History
Date Modified	Username	Field	Change
2007-08-12 10:11	genixpro	New Issue
2007-08-12 12:55	KDR_11k	Note Added: 0001082
2007-08-12 13:53	tvo	Note Added: 0001083
2007-08-12 22:19	genixpro	Note Added: 0001087
2007-08-13 08:19	tvo	Note Added: 0001091
2007-08-13 08:21	tvo	Note Added: 0001092
2011-08-25 03:39	abma	Note Added: 0007288
2011-08-25 03:39	abma	Status	new => resolved
2011-08-25 03:39	abma	Resolution	open => no change required
2011-08-25 03:39	abma	Assigned To	=> abma
