View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0004171 | Spring engine | General | public | 2013-11-27 01:27 | 2013-12-04 21:02 |
| Reporter | Beherith | Assigned To | Kloot | ||
| Priority | normal | Severity | major | Reproducibility | always |
| Status | resolved | Resolution | fixed | ||
| Product Version | 95.0 | ||||
| Target Version | 96.0 | ||||
| Summary | 0004171: CTO assert and jittering | ||||
| Description | Spring stutters (unit movement) when playing online, and the stutters coincide with CTO >1 assertion errors. It happens very frequently, often more than once per second. Silentwings and I have been doing some very extensive testing on how the frame jitter occurs in online games. The error occurs iff someone is spectating or playing a live game (not catching up to online game, not watching from replay) and he is not the host, he is just a client. Here are our findings: Spring 95.0 has heavy random jitter, the frame times are printed in the infolog. Units seem to stop for a frame then jump forward a bit. Fast fourier tranformation of the delta times between two simframes show a minor peak at 6.66hz, but nothing very apparent. Spring 94.1 has a periodic jitter, where it seems that every 30th frame (but not nessessarily those that where simframe%30==0) is shorter than the others, and it looks like units 'jump' forward a small amount each frame. Especially visible on planes, nano particles, and other smoothly moving units and projectiles. Fast fourier transformation of the delta times between two sim frames (as measured by the attached widget) show a marked peak at 1hz. Spring 91 is just like spring 94.1, with periodic (about 1hz) jumps forward. We do not think that this is a new issue, only its nature has changed in 95.0, and I do not think it is the same as GoogleFrog's issue with 15 missed frames and a DT of 500, I'm sure those can happen from time to time, but this is happening every second, making a game that is running at 100+ fps seem very laggy and slow indeed. | ||||
| Steps To Reproduce | Join a game as a client (online), and watch as units stutter very frequently, and watch as cto assert errors pile up in debug view. Compare it to running spring locally, or just hosting a game. Maybe even try the dbg_frame_jitter.lua script to see the delta times line up with cto assert errors. | ||||
| Additional Information | My ping is pretty OK, 0000179:0000200 ms as measured by spring, on a fat pipe. Im not running out of GPU ram, it is not some GL call being very slow, I tested extensively with gDEBugger. I have plenty of ram, not much besides a browser running in the background. Freshly restarted system. The stutters are directly associated with CTO assertion fails. Each time there is a stutter, I get a CTO assertion fail (I have logflush off). /luaui disable, /luarules disable and disabling advmapshading and advmodelshading have no effect on the visual stuttering, as does disabling deferred rendering. Sim CPU use was below 25% during all tests. It happens from game start, not just heavy endgame. We tested on 3 computers, my i5-750 +gtx560ti + win7 x64, one i3-3217U + win8.1 x64 laptop, and Silentwings's pc, which I believe is similar to my PC (nvidia). It was reproducible on all 3 pcs. I disabled intel speedstep and turbo boost on mine to make sure it was not the cause. Using high precision timers has no effect on my pc. Frame time recordings are obviously flawed because of collection from widget:GameFrame(), but all our tests were carried out at at least 100 FPS To isolate network lag, I tested this between two PCs on a LAN, and the results are identical. | ||||
| Tags | No tags attached. | ||||
| Attached Files | |||||
| Checked infolog.txt for Errors | |||||
|
|
infolog.2.txt. is my PC (client) with the widget enabled and /debug on to show CTO assert failures infolog.3.txt is the pc (laptop) that hosted the game, on same LAN as my pc. /debug and widget on. infolog.4.txt is silentwing's pc (client) with the widget enabled and /debug on to show CTO assert failures The test game considered of /cheat, /nocost, each player builds 2 labs on dsd and puts them on repeat to build fighters and kbots or vehicles, and we sent the units towards each other. Orders were not spammed at the units. Also tested with just 100 fighters patrolling in a circle, same results. Replays available on request, didnt want to clutter this report further. All tests were done with BA 7.84, except on engine 91, where 7.76 was used. Map was DSD. |
|
|
"The error occurs iff someone is spectating or playing a live game" You really should have tested with 95+ which logs additional debugging info precisely to find out what the relation to live games is... |
|
|
Sure! I attached the results of testing with 95.0.1-99 I tested on my LAN with two pcs, everything described above for 95.0 is applicable to these results. The host PC is still butter smooth, the client is pretty jittery. Need the replay file too? Need help in setting up a hosted test game? Ill help with anything you ask me to. |
|
|
Ok, please do the following: 1) grab build 112-g5b4d186 2) on the machine you use as host, open springsettings.cfg and add "ServerLogDebugMessages = 1" (this will FLOOD its infolog, so make sure to set LogFlush = 0) 3) run another LAN game (doesn't need to be long, just enough to catch some instances of serious jitter) and then upload the host's infolog |
|
|
Done! Thanks :) |
|
|
Meh... server timings appear to check out, it consistently pushes simframe messages every 30 milliseconds (3 or 4 Update()'s which are spaced 10ms apart). So the problem is (unfortunately) in the network layer. |
|
|
Should I try hosting and playing on the same PC? |
|
|
Sure, just know that a client on the same machine as the server will still connect over UDP and latency/packet loss should be comparable to LAN conditions (as in none) so while you should do that test for the sake of completeness I don't expect the results to change. edit: when you get around to it, please use build 115-g65dc8b4 |
|
|
You are seriously making my day brighter :) . Ok, ill test with 115. Also, if this the error is in the network layer, then is it completely out of our control? |
|
|
Ok tested, with ServerLogDebugMessages and two pc's (didnt have time to try it on one, ping me again if you still feel it is needed). Infologs attached. |
|
|
Players indeed can not do anything, unless it turns out to be caused by some obscure network config setting whose value got reset in the change from 94 to 95 (unlikely). The fact that there is no jitter in local games also rules out any other config option. Logs now show definitively that the client packet queue dries up repeatedly. |
|
|
By the way, are you affected at all by this issue when playing/speccing online? |
|
|
Personally I have never seen this under normal online conditions no, but there are too many variables involved (differences in OS / hardware / routing / ...) to derive much from that. The biggest WTF is that the low-level netcode (packet reassembly crap and the like) was barely touched between 94 and 95, so atm I am out of ideas. |
|
|
Timeouted so retyping: This morning I tested with my main PC as host, and laptop as client, and I got the CTO assert errors as the host too! But oddly they were almost all NP=1! The client laptop had the usual NP=0 errors. 95.0 Infologs attached as infolog_wierd_run1_host.txt and infolog_wierd_run1_client.txt. After that I tested just on the main PC by launching spring.exe, and I was having jitter errors with CTO assert errors without even playing online. Unfortunately I did not save the infolog. I restarted my computer (i usually only reboot weekly) and I could not reproduce this, nor could I reproduce the host spamming CTO errors. Unfortunately after the reboot I couldnt reproduce with 95.0.1-115. Many others have CTO error spam, even people on linux (BrainDamage, Kixu) I will keep trying to reproduce this host side error spam with 95-115. |
|
|
This is probably a long shot but I wonder if the by default spinning threads can cause this by starvation, try if this make any difference: WorkerThreadSpinTime=0 and perhaps reduce threads if you have "HT cores": WorkerThreadCount=<less than the number of your "real cores"> |
|
|
My default workerthreadcont was -1, default workerthreadspintime was 5. A setting of (-1, 0), results in the same jitter. I have 4 real cores, no ht cores. A setting of (4,0) also results in jitter. Setting the above on the host pc (2 core, 4 thread laptop) to (2, 0) also results in jitter. |
|
|
For completeness, what about (0, 0)? Pure host-side jittering would imply netcode isn't a factor after all, so like cleanrock I'm considering if there is possible thread priority inversion going on. Out of curiousity: do MSVS builds (if you can make them) jitter for you? |
|
|
Same with (0,0) Ill try to build with msvc over the weekend. |
|
|
MSVC 11 build: same results. EDIT: Also, I was able to get the cto assert error while catching up to a replay (a very heavy 1500 unit FFA game). What can I do to help test more? Edit: added infolog_msvc* |
|
|
Should I test with singlestep maybe? EDIT: Tested with singlestepping (Alt+o) held down on host. The game runs relatively smoothly (though jitter is still visible), frame times are as expected (30fps sim), and no assert errors! EDIT2: note https://github.com/spring/spring/blob/develop/rts/Net/GameServer.cpp#L2412 It says that the server should make a few frames in advance, but based on the numbers in the infolog printed by https://github.com/spring/spring/blob/develop/rts/Net/GameServer.cpp#L2427 it doesnt actually create any! simFrameBehind is always at 0, rarely 1. |
|
|
That bit of code doesn't play a role except when the host client (if there is one) experiences high CPU-lag. Would you mind testing 145-g71e2e27? |
|
|
Tested with 145-same results as usual. Need infolog? Also, code says: // Don't create new frames when localClient (:= host) isn't able to process them fast enough. // Despite this still allow to create a few in advance to not lag other connected clients. Can it be that the N jK was talking about in this thread ( http://springrts.com/phpbb/viewtopic.php?f=12&t=30652 ) is just too low? And network jitter just makes it worse? EDIT: tested 146, same results too. Scream if you need infologs. |
|
|
From your LAN test results it's more likely just the case that the server can't / doesn't always feed clients at their expected rate because it sleeps too long (sleep has an inherent inaccuracy) between updates and its thread gets shuffled around a lot. By singlestep-spamming (shoveling frames out the door with every keypress so clients have a backlog) you mask this. For fun & games, set your keyboard repeat rate to a value smaller than 30Hz and you should experience the same jitter with singlestep. new build: 147-g2bf5bc0 |
|
|
use 147, not 146 |
|
|
Same with 147. Uploaded infologs (was testing on LAN, earlier today I was testing through internet (but in same country so ping= ~3ms). It happens also when I play on online hosts, my ping is usually a couple of hundred ms. When we tested with silentwings, I recall our pings were about 200ms. |
|
|
ok, I need another infolog of your client LAN machine with build 150-g31f8552. add UDPConnectionLogDebugMessages=1 in springsettings, then do the usual. |
|
|
Did it, infolog attached. Same jitter. EDIT: did one with 154 too, same jitter, attached infolog. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2013-11-27 01:27 | Beherith | New Issue | |
| 2013-11-27 01:27 | Beherith | File Added: dbg_frame_jitter.lua | |
| 2013-11-27 01:28 | Beherith | File Added: infolog.2.txt | |
| 2013-11-27 01:29 | Beherith | File Added: infolog.3.txt | |
| 2013-11-27 01:29 | Beherith | File Added: infolog.4.txt | |
| 2013-11-27 01:32 | Beherith | Note Added: 0012221 | |
| 2013-11-27 01:39 | Beherith | Note Edited: 0012221 | |
| 2013-11-27 01:40 | Beherith | Note Edited: 0012221 | |
| 2013-11-27 03:00 | Kloot | Note Added: 0012223 | |
| 2013-11-27 11:49 | Beherith | Note Added: 0012224 | |
| 2013-11-27 11:50 | Beherith | File Added: infolog_95.0.1-99_client.txt | |
| 2013-11-27 11:51 | Beherith | File Added: infolog_95.0.1-99_host.txt | |
| 2013-11-27 11:53 | Beherith | Note Edited: 0012224 | |
| 2013-11-27 18:33 | Kloot | Note Added: 0012225 | |
| 2013-11-27 18:46 | Kloot | Note Edited: 0012225 | |
| 2013-11-27 21:58 | Beherith | File Added: infolog_95_112_host.7z | |
| 2013-11-27 21:59 | Beherith | File Added: infolog_95_112_client.txt | |
| 2013-11-27 21:59 | Beherith | Note Added: 0012226 | |
| 2013-11-27 22:00 | Beherith | Note Edited: 0012226 | |
| 2013-11-27 23:06 | Kloot | Note Added: 0012229 | |
| 2013-11-27 23:07 | Kloot | Note Edited: 0012229 | |
| 2013-11-27 23:18 | Beherith | Note Added: 0012230 | |
| 2013-11-27 23:39 | Kloot | Note Added: 0012231 | |
| 2013-11-28 00:31 | Kloot | Note Edited: 0012231 | |
| 2013-11-28 00:59 | Beherith | Note Added: 0012232 | |
| 2013-11-28 00:59 | Beherith | Note Edited: 0012232 | |
| 2013-11-28 01:55 | Beherith | Note Added: 0012233 | |
| 2013-11-28 01:56 | Beherith | File Added: infolog_95_115_host.txt | |
| 2013-11-28 01:56 | Beherith | File Added: infolog_95_115_client.txt | |
| 2013-11-28 02:03 | Kloot | Note Added: 0012234 | |
| 2013-11-28 02:16 | Kloot | Note Edited: 0012234 | |
| 2013-11-28 13:29 | Beherith | Note Added: 0012244 | |
| 2013-11-28 14:16 | Kloot | Note Added: 0012245 | |
| 2013-11-28 14:16 | Kloot | Note Edited: 0012245 | |
| 2013-11-28 14:19 | Kloot | Note Edited: 0012245 | |
| 2013-11-28 19:31 | Beherith | File Added: infolog_wierd_run1_client.txt | |
| 2013-11-28 19:36 | Beherith | File Added: infolog_wierd_run1_host.txt | |
| 2013-11-28 19:40 | Beherith | Note Added: 0012257 | |
| 2013-11-28 19:41 | Beherith | Note Edited: 0012257 | |
| 2013-11-28 22:04 | abma | Target Version | => 96.0 |
| 2013-11-29 08:31 | cleanrock | Note Added: 0012260 | |
| 2013-11-29 09:22 | Beherith | Note Added: 0012261 | |
| 2013-11-29 16:18 | Kloot | Note Added: 0012263 | |
| 2013-11-29 16:28 | Kloot | Note Edited: 0012263 | |
| 2013-11-29 16:29 | Kloot | Note Edited: 0012263 | |
| 2013-11-29 16:54 | Beherith | Note Added: 0012264 | |
| 2013-11-30 19:44 | Beherith | Note Added: 0012277 | |
| 2013-11-30 19:44 | Beherith | Note Edited: 0012277 | |
| 2013-11-30 19:52 | Beherith | File Added: infolog_msvc11_client.txt | |
| 2013-11-30 19:53 | Beherith | Note Edited: 0012277 | |
| 2013-12-01 18:40 | Beherith | File Added: infolog_msvc11_host.txt | |
| 2013-12-02 13:22 | Beherith | Note Added: 0012293 | |
| 2013-12-02 15:58 | Beherith | Note Edited: 0012293 | |
| 2013-12-02 16:40 | Beherith | Note Edited: 0012293 | |
| 2013-12-02 16:48 | Beherith | Note Edited: 0012293 | |
| 2013-12-02 17:07 | Kloot | Note Added: 0012296 | |
| 2013-12-02 17:12 | Beherith | Note Edited: 0012293 | |
| 2013-12-02 17:13 | Kloot | Note Edited: 0012296 | |
| 2013-12-02 17:24 | Beherith | Note Added: 0012297 | |
| 2013-12-02 18:38 | Kloot | Note Added: 0012302 | |
| 2013-12-02 18:38 | Beherith | Note Edited: 0012297 | |
| 2013-12-02 19:01 | Kloot | Note Added: 0012303 | |
| 2013-12-02 19:31 | Beherith | File Added: infolog_client_147.txt | |
| 2013-12-02 19:32 | Beherith | File Added: infolog_host_147.7z | |
| 2013-12-02 19:33 | Beherith | Note Added: 0012304 | |
| 2013-12-02 22:18 | Kloot | Note Added: 0012309 | |
| 2013-12-03 12:03 | Beherith | File Added: infolog_151_client_dirty_pc.txt | |
| 2013-12-03 15:32 | Beherith | Note Added: 0012316 | |
| 2013-12-03 21:12 | Beherith | Note Edited: 0012316 | |
| 2013-12-03 21:13 | Beherith | File Added: infolog_154_client.txt | |
| 2013-12-04 21:02 | Kloot | Changeset attached | => spring develop 1b5a0d7a |
| 2013-12-04 21:02 | Kloot | Assigned To | => Kloot |
| 2013-12-04 21:02 | Kloot | Status | new => resolved |
| 2013-12-04 21:02 | Kloot | Resolution | open => fixed |