View Issue Details

IDProjectCategoryView StatusLast Update
0004171Spring engineGeneralpublic2013-12-04 21:02
ReporterBeherith Assigned ToKloot  
PrioritynormalSeveritymajorReproducibilityalways
Status resolvedResolutionfixed 
Product Version95.0 
Target Version96.0 
Summary0004171: CTO assert and jittering
DescriptionSpring stutters (unit movement) when playing online, and the stutters coincide with CTO >1 assertion errors. It happens very frequently, often more than once per second.

Silentwings and I have been doing some very extensive testing on how the frame jitter occurs in online games.

The error occurs iff someone is spectating or playing a live game (not catching up to online game, not watching from replay) and he is not the host, he is just a client. Here are our findings:

Spring 95.0 has heavy random jitter, the frame times are printed in the infolog.
Units seem to stop for a frame then jump forward a bit. Fast fourier tranformation of the delta times between two simframes show a minor peak at 6.66hz, but nothing very apparent.

Spring 94.1 has a periodic jitter, where it seems that every 30th frame (but not nessessarily those that where simframe%30==0) is shorter than the others, and it looks like units 'jump' forward a small amount each frame. Especially visible on planes, nano particles, and other smoothly moving units and projectiles. Fast fourier transformation of the delta times between two sim frames (as measured by the attached widget) show a marked peak at 1hz.

Spring 91 is just like spring 94.1, with periodic (about 1hz) jumps forward.

We do not think that this is a new issue, only its nature has changed in 95.0, and I do not think it is the same as GoogleFrog's issue with 15 missed frames and a DT of 500, I'm sure those can happen from time to time, but this is happening every second, making a game that is running at 100+ fps seem very laggy and slow indeed.
Steps To ReproduceJoin a game as a client (online), and watch as units stutter very frequently, and watch as cto assert errors pile up in debug view. Compare it to running spring locally, or just hosting a game.

Maybe even try the dbg_frame_jitter.lua script to see the delta times line up with cto assert errors.
Additional InformationMy ping is pretty OK, 0000179:0000200 ms as measured by spring, on a fat pipe.

Im not running out of GPU ram, it is not some GL call being very slow, I tested extensively with gDEBugger.

I have plenty of ram, not much besides a browser running in the background. Freshly restarted system.

The stutters are directly associated with CTO assertion fails. Each time there is a stutter, I get a CTO assertion fail (I have logflush off).

/luaui disable, /luarules disable and disabling advmapshading and advmodelshading have no effect on the visual stuttering, as does disabling deferred rendering.

Sim CPU use was below 25% during all tests. It happens from game start, not just heavy endgame.

We tested on 3 computers, my i5-750 +gtx560ti + win7 x64, one i3-3217U + win8.1 x64 laptop, and Silentwings's pc, which I believe is similar to my PC (nvidia). It was reproducible on all 3 pcs. I disabled intel speedstep and turbo boost on mine to make sure it was not the cause.

Using high precision timers has no effect on my pc.

Frame time recordings are obviously flawed because of collection from widget:GameFrame(), but all our tests were carried out at at least 100 FPS

To isolate network lag, I tested this between two PCs on a LAN, and the results are identical.
TagsNo tags attached.
Attached Files
dbg_frame_jitter.lua (Attachment missing)
infolog.2.txt (Attachment missing)
infolog.3.txt (Attachment missing)
infolog.4.txt (Attachment missing)
infolog_95.0.1-99_client.txt (Attachment missing)
infolog_95.0.1-99_host.txt (Attachment missing)
infolog_95_112_host.7z (Attachment missing)
infolog_95_112_client.txt (Attachment missing)
infolog_95_115_host.txt (Attachment missing)
infolog_95_115_client.txt (Attachment missing)
infolog_wierd_run1_client.txt (Attachment missing)
infolog_wierd_run1_host.txt (Attachment missing)
infolog_msvc11_client.txt (Attachment missing)
infolog_msvc11_host.txt (Attachment missing)
infolog_client_147.txt (Attachment missing)
infolog_host_147.7z (Attachment missing)
infolog_151_client_dirty_pc.txt (Attachment missing)
infolog_154_client.txt (Attachment missing)
Checked infolog.txt for Errors

Activities

Beherith

2013-11-27 01:32

reporter   ~0012221

Last edited: 2013-11-27 01:40

infolog.2.txt. is my PC (client) with the widget enabled and /debug on to show CTO assert failures
infolog.3.txt is the pc (laptop) that hosted the game, on same LAN as my pc. /debug and widget on.
infolog.4.txt is silentwing's pc (client) with the widget enabled and /debug on to show CTO assert failures

The test game considered of /cheat, /nocost, each player builds 2 labs on dsd and puts them on repeat to build fighters and kbots or vehicles, and we sent the units towards each other. Orders were not spammed at the units. Also tested with just 100 fighters patrolling in a circle, same results. Replays available on request, didnt want to clutter this report further.

All tests were done with BA 7.84, except on engine 91, where 7.76 was used. Map was DSD.

Kloot

2013-11-27 03:00

developer   ~0012223

"The error occurs iff someone is spectating or playing a live game"

You really should have tested with 95+ which logs additional debugging info precisely to find out what the relation to live games is...

Beherith

2013-11-27 11:49

reporter   ~0012224

Last edited: 2013-11-27 11:53

Sure! I attached the results of testing with 95.0.1-99
I tested on my LAN with two pcs, everything described above for 95.0 is applicable to these results. The host PC is still butter smooth, the client is pretty jittery. Need the replay file too? Need help in setting up a hosted test game? Ill help with anything you ask me to.

Kloot

2013-11-27 18:33

developer   ~0012225

Last edited: 2013-11-27 18:46

Ok, please do the following:

1) grab build 112-g5b4d186
2) on the machine you use as host, open springsettings.cfg and add "ServerLogDebugMessages = 1" (this will FLOOD its infolog, so make sure to set LogFlush = 0)
3) run another LAN game (doesn't need to be long, just enough to catch some instances of serious jitter) and then upload the host's infolog

Beherith

2013-11-27 21:59

reporter   ~0012226

Last edited: 2013-11-27 22:00

Done! Thanks :)

Kloot

2013-11-27 23:06

developer   ~0012229

Last edited: 2013-11-27 23:07

Meh... server timings appear to check out, it consistently pushes simframe messages every 30 milliseconds (3 or 4 Update()'s which are spaced 10ms apart). So the problem is (unfortunately) in the network layer.

Beherith

2013-11-27 23:18

reporter   ~0012230

Should I try hosting and playing on the same PC?

Kloot

2013-11-27 23:39

developer   ~0012231

Last edited: 2013-11-28 00:31

Sure, just know that a client on the same machine as the server will still connect over UDP and latency/packet loss should be comparable to LAN conditions (as in none) so while you should do that test for the sake of completeness I don't expect the results to change.

edit: when you get around to it, please use build 115-g65dc8b4

Beherith

2013-11-28 00:59

reporter   ~0012232

Last edited: 2013-11-28 00:59

You are seriously making my day brighter :) . Ok, ill test with 115. Also, if this the error is in the network layer, then is it completely out of our control?

Beherith

2013-11-28 01:55

reporter   ~0012233

Ok tested, with ServerLogDebugMessages and two pc's (didnt have time to try it on one, ping me again if you still feel it is needed). Infologs attached.

Kloot

2013-11-28 02:03

developer   ~0012234

Last edited: 2013-11-28 02:16

Players indeed can not do anything, unless it turns out to be caused by some obscure network config setting whose value got reset in the change from 94 to 95 (unlikely). The fact that there is no jitter in local games also rules out any other config option.

Logs now show definitively that the client packet queue dries up repeatedly.

Beherith

2013-11-28 13:29

reporter   ~0012244

By the way, are you affected at all by this issue when playing/speccing online?

Kloot

2013-11-28 14:16

developer   ~0012245

Last edited: 2013-11-28 14:19

Personally I have never seen this under normal online conditions no, but there are too many variables involved (differences in OS / hardware / routing / ...) to derive much from that.

The biggest WTF is that the low-level netcode (packet reassembly crap and the like) was barely touched between 94 and 95, so atm I am out of ideas.

Beherith

2013-11-28 19:40

reporter   ~0012257

Last edited: 2013-11-28 19:41

Timeouted so retyping:

This morning I tested with my main PC as host, and laptop as client, and I got the CTO assert errors as the host too! But oddly they were almost all NP=1!
The client laptop had the usual NP=0 errors. 95.0 Infologs attached as infolog_wierd_run1_host.txt and infolog_wierd_run1_client.txt.

After that I tested just on the main PC by launching spring.exe, and I was having jitter errors with CTO assert errors without even playing online. Unfortunately I did not save the infolog. I restarted my computer (i usually only reboot weekly) and I could not reproduce this, nor could I reproduce the host spamming CTO errors. Unfortunately after the reboot I couldnt reproduce with 95.0.1-115.

Many others have CTO error spam, even people on linux (BrainDamage, Kixu)

I will keep trying to reproduce this host side error spam with 95-115.

cleanrock

2013-11-29 08:31

reporter   ~0012260

This is probably a long shot but I wonder if the by default spinning threads can cause this by starvation, try if this make any difference:
WorkerThreadSpinTime=0
and perhaps reduce threads if you have "HT cores":
WorkerThreadCount=<less than the number of your "real cores">

Beherith

2013-11-29 09:22

reporter   ~0012261

My default workerthreadcont was -1, default workerthreadspintime was 5.
A setting of (-1, 0), results in the same jitter. I have 4 real cores, no ht cores. A setting of (4,0) also results in jitter.
Setting the above on the host pc (2 core, 4 thread laptop) to (2, 0) also results in jitter.

Kloot

2013-11-29 16:18

developer   ~0012263

Last edited: 2013-11-29 16:29

For completeness, what about (0, 0)?

Pure host-side jittering would imply netcode isn't a factor after all, so like cleanrock I'm considering if there is possible thread priority inversion going on.

Out of curiousity: do MSVS builds (if you can make them) jitter for you?

Beherith

2013-11-29 16:54

reporter   ~0012264

Same with (0,0)

Ill try to build with msvc over the weekend.

Beherith

2013-11-30 19:44

reporter   ~0012277

Last edited: 2013-11-30 19:53

MSVC 11 build: same results.

EDIT: Also, I was able to get the cto assert error while catching up to a replay (a very heavy 1500 unit FFA game).

What can I do to help test more?

Edit: added infolog_msvc*

Beherith

2013-12-02 13:22

reporter   ~0012293

Last edited: 2013-12-02 17:12

Should I test with singlestep maybe?

EDIT: Tested with singlestepping (Alt+o) held down on host. The game runs relatively smoothly (though jitter is still visible), frame times are as expected (30fps sim), and no assert errors!

EDIT2: note https://github.com/spring/spring/blob/develop/rts/Net/GameServer.cpp#L2412
It says that the server should make a few frames in advance, but based on the numbers in the infolog printed by https://github.com/spring/spring/blob/develop/rts/Net/GameServer.cpp#L2427 it doesnt actually create any! simFrameBehind is always at 0, rarely 1.

Kloot

2013-12-02 17:07

developer   ~0012296

Last edited: 2013-12-02 17:13

That bit of code doesn't play a role except when the host client (if there is one) experiences high CPU-lag.

Would you mind testing 145-g71e2e27?

Beherith

2013-12-02 17:24

reporter   ~0012297

Last edited: 2013-12-02 18:38

Tested with 145-same results as usual. Need infolog?

Also, code says:
// Don't create new frames when localClient (:= host) isn't able to process them fast enough.
// Despite this still allow to create a few in advance to not lag other connected clients.

Can it be that the N jK was talking about in this thread ( http://springrts.com/phpbb/viewtopic.php?f=12&t=30652 ) is just too low? And network jitter just makes it worse?


EDIT: tested 146, same results too. Scream if you need infologs.

Kloot

2013-12-02 18:38

developer   ~0012302

From your LAN test results it's more likely just the case that the server can't / doesn't always feed clients at their expected rate because it sleeps too long (sleep has an inherent inaccuracy) between updates and its thread gets shuffled around a lot. By singlestep-spamming (shoveling frames out the door with every keypress so clients have a backlog) you mask this.

For fun & games, set your keyboard repeat rate to a value smaller than 30Hz and you should experience the same jitter with singlestep.

new build: 147-g2bf5bc0

Kloot

2013-12-02 19:01

developer   ~0012303

use 147, not 146

Beherith

2013-12-02 19:33

reporter   ~0012304

Same with 147. Uploaded infologs (was testing on LAN, earlier today I was testing through internet (but in same country so ping= ~3ms).

It happens also when I play on online hosts, my ping is usually a couple of hundred ms. When we tested with silentwings, I recall our pings were about 200ms.

Kloot

2013-12-02 22:18

developer   ~0012309

ok, I need another infolog of your client LAN machine with build 150-g31f8552.

add UDPConnectionLogDebugMessages=1 in springsettings, then do the usual.

Beherith

2013-12-03 15:32

reporter   ~0012316

Last edited: 2013-12-03 21:12

Did it, infolog attached. Same jitter.

EDIT: did one with 154 too, same jitter, attached infolog.

Issue History

Date Modified Username Field Change
2013-11-27 01:27 Beherith New Issue
2013-11-27 01:27 Beherith File Added: dbg_frame_jitter.lua
2013-11-27 01:28 Beherith File Added: infolog.2.txt
2013-11-27 01:29 Beherith File Added: infolog.3.txt
2013-11-27 01:29 Beherith File Added: infolog.4.txt
2013-11-27 01:32 Beherith Note Added: 0012221
2013-11-27 01:39 Beherith Note Edited: 0012221
2013-11-27 01:40 Beherith Note Edited: 0012221
2013-11-27 03:00 Kloot Note Added: 0012223
2013-11-27 11:49 Beherith Note Added: 0012224
2013-11-27 11:50 Beherith File Added: infolog_95.0.1-99_client.txt
2013-11-27 11:51 Beherith File Added: infolog_95.0.1-99_host.txt
2013-11-27 11:53 Beherith Note Edited: 0012224
2013-11-27 18:33 Kloot Note Added: 0012225
2013-11-27 18:46 Kloot Note Edited: 0012225
2013-11-27 21:58 Beherith File Added: infolog_95_112_host.7z
2013-11-27 21:59 Beherith File Added: infolog_95_112_client.txt
2013-11-27 21:59 Beherith Note Added: 0012226
2013-11-27 22:00 Beherith Note Edited: 0012226
2013-11-27 23:06 Kloot Note Added: 0012229
2013-11-27 23:07 Kloot Note Edited: 0012229
2013-11-27 23:18 Beherith Note Added: 0012230
2013-11-27 23:39 Kloot Note Added: 0012231
2013-11-28 00:31 Kloot Note Edited: 0012231
2013-11-28 00:59 Beherith Note Added: 0012232
2013-11-28 00:59 Beherith Note Edited: 0012232
2013-11-28 01:55 Beherith Note Added: 0012233
2013-11-28 01:56 Beherith File Added: infolog_95_115_host.txt
2013-11-28 01:56 Beherith File Added: infolog_95_115_client.txt
2013-11-28 02:03 Kloot Note Added: 0012234
2013-11-28 02:16 Kloot Note Edited: 0012234
2013-11-28 13:29 Beherith Note Added: 0012244
2013-11-28 14:16 Kloot Note Added: 0012245
2013-11-28 14:16 Kloot Note Edited: 0012245
2013-11-28 14:19 Kloot Note Edited: 0012245
2013-11-28 19:31 Beherith File Added: infolog_wierd_run1_client.txt
2013-11-28 19:36 Beherith File Added: infolog_wierd_run1_host.txt
2013-11-28 19:40 Beherith Note Added: 0012257
2013-11-28 19:41 Beherith Note Edited: 0012257
2013-11-28 22:04 abma Target Version => 96.0
2013-11-29 08:31 cleanrock Note Added: 0012260
2013-11-29 09:22 Beherith Note Added: 0012261
2013-11-29 16:18 Kloot Note Added: 0012263
2013-11-29 16:28 Kloot Note Edited: 0012263
2013-11-29 16:29 Kloot Note Edited: 0012263
2013-11-29 16:54 Beherith Note Added: 0012264
2013-11-30 19:44 Beherith Note Added: 0012277
2013-11-30 19:44 Beherith Note Edited: 0012277
2013-11-30 19:52 Beherith File Added: infolog_msvc11_client.txt
2013-11-30 19:53 Beherith Note Edited: 0012277
2013-12-01 18:40 Beherith File Added: infolog_msvc11_host.txt
2013-12-02 13:22 Beherith Note Added: 0012293
2013-12-02 15:58 Beherith Note Edited: 0012293
2013-12-02 16:40 Beherith Note Edited: 0012293
2013-12-02 16:48 Beherith Note Edited: 0012293
2013-12-02 17:07 Kloot Note Added: 0012296
2013-12-02 17:12 Beherith Note Edited: 0012293
2013-12-02 17:13 Kloot Note Edited: 0012296
2013-12-02 17:24 Beherith Note Added: 0012297
2013-12-02 18:38 Kloot Note Added: 0012302
2013-12-02 18:38 Beherith Note Edited: 0012297
2013-12-02 19:01 Kloot Note Added: 0012303
2013-12-02 19:31 Beherith File Added: infolog_client_147.txt
2013-12-02 19:32 Beherith File Added: infolog_host_147.7z
2013-12-02 19:33 Beherith Note Added: 0012304
2013-12-02 22:18 Kloot Note Added: 0012309
2013-12-03 12:03 Beherith File Added: infolog_151_client_dirty_pc.txt
2013-12-03 15:32 Beherith Note Added: 0012316
2013-12-03 21:12 Beherith Note Edited: 0012316
2013-12-03 21:13 Beherith File Added: infolog_154_client.txt
2013-12-04 21:02 Kloot Changeset attached => spring develop 1b5a0d7a
2013-12-04 21:02 Kloot Assigned To => Kloot
2013-12-04 21:02 Kloot Status new => resolved
2013-12-04 21:02 Kloot Resolution open => fixed