View Issue Details

IDProjectCategoryView StatusLast Update
0000757Spring engineGeneralpublic2008-01-14 20:25
Reporterimbaczek Assigned ToAuswaschbar  
PrioritynormalSeveritymajorReproducibilityalways
Status resolvedResolutionfixed 
Summary0000757: spectator timeout causes host to crash
Descriptionhttp://spring.clan-sy.com/phpbb/viewtopic.php?p=247012#p247012

the issue also came out during today's testing with Awaker "Finalizing" taking much too long.
TagsNo tags attached.
Attached Files
replays.zip (Attachment missing)
infolog.txt (Attachment missing)
Checked infolog.txt for Errors

Relationships

has duplicate 0000756 resolvedtvo If player/spec disconnects during loading, host crashes (R5195) 

Activities

Awaker

2008-01-01 23:00

reporter   ~0001629

The infolog is somewhat messy because I wanted to leave but the game was started before I could. I think the error occured another time but I'm not sure. The 3 tries before I couldn't see anything which was chatted. Sadly I don't know whether it appeared in infolog.txt, cause it was overritten accidently.

Awaker

2008-01-01 23:04

reporter   ~0001630

The first three times finalizing took very long (about 40 s).

Kloot

2008-01-02 04:46

developer   ~0001638

Last edited: 2008-01-02 08:48

FWIW, tried reproducing this with only local
connections (by running two Spring instances
and having the client connect to 127.0.0.1)
and could not get the server to crash, neither
by killing the client process during loading
nor by adding this...

    if (!gameServer) {
        // simulate client timeout
        while (1);
    }

to CGame::CGame() and removing the net-update
thread, so drops are handled OK for those.

KDR_11k

2008-01-02 08:55

reporter   ~0001640

If we can get another crash to happen with Awaker I'd suggest a packet sniffer, I can't shake the feeling that it's related to the network traffic, like corrupted packets or something.

imbaczek

2008-01-02 09:38

reporter   ~0001641

everything's possible here. wireshark could use spring protocol definitions.

tvo

2008-01-02 12:24

reporter   ~0001642

Same for me, unable to reproduce when doing laptop vs desktop MSVC build.

Did fix some other 3 bugs along the way though, one of which has a minor chance of being related: CNetProtocol::loading was a non-volatile variable, but accessed by the CNetProtocol::UpdateLoop() thread, AND by CGame::CGame(), so in theory the compiler may have optimized UpdateLoop() to a while(true), but I've never before seen it do such aggressive optimization, so I highly doubt it matters... (and if it did thread.join() should have blocked indefinitely)

tvo

2008-01-02 13:05

reporter   ~0001643

Last edited: 2008-01-02 13:06

Found something:

<-- Here I made client time out using a breakpoint and waiting 30 sec :-)

(gdb) cont
Continuing.
Client connected on slot 1
Lost connection to [RoX]Tobi2
Lost connection to player 1 (timeout)

Program received signal SIGINT, Interrupt.
0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb77d6e96 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7ee3018 in SDL_Delay () from /usr/lib/libSDL-1.2.so.0
0000003 0x081a6227 in CGame::Draw (this=0x9585dc8) at rts/Game/Game.cpp:2067
0000004 0x080f70f2 in SpringApp::Update (this=0xbf84ef14) at rts/System/Main.cpp:872
0000005 0x080fc4a5 in SpringApp::Run (this=0xbf84ef14, argc=2, argv=0xbf84f004) at rts/System/Main.cpp:1055
#6 0x080fc81a in Run (argc=2, argv=0xbf84f004) at rts/System/Main.cpp:1115
#7 0x080fc972 in main (argc=0, argv=0x834970, envp=0x0) at rts/System/Main.cpp:1168
(gdb) call m_validateAllAllocUnits()
$4 = false

<-- here I continued client for one net->Update() call

(gdb) cont
Continuing.
terminate called after throwing an instance of 'boost::bad_ptr_container_operation'
  what(): 'pop_front()' on empty container

Program received signal SIGABRT, Aborted.
[Switching to Thread -1261085808 (LWP 7925)]
0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb776e875 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7770201 in abort () from /lib/tls/i686/cmov/libc.so.6
0000003 0xb79796e0 in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
0000004 0xb7976f65 in ?? () from /usr/lib/libstdc++.so.6
0000005 0xb7976fa2 in std::terminate () from /usr/lib/libstdc++.so.6
#6 0xb79770ca in __cxa_throw () from /usr/lib/libstdc++.so.6
#7 0x0817b04f in netcode::UDPConnection::AckPackets (this=0x114ece18, nextAck=12) at /usr/include/boost/ptr_container/ptr_sequence_adapter.hpp:215
#8 0x0817b0b8 in netcode::UDPConnection::ProcessRawPacket (this=0x114ece18, packet=0x114da450) at rts/System/Net/UDPConnection.cpp:133
#9 0x081854ff in netcode::UDPListener::Update (this=0x8f7bb78, waitingQueue=@0x9561f08) at rts/System/Net/UDPListener.cpp:64
0000010 0x08186511 in netcode::CNet::Update (this=0x9561ef8) at rts/System/Net/Net.cpp:217
#11 0x081c4f98 in CGameServer::Update (this=0x9584620) at rts/Game/GameServer.cpp:333
0000012 0x081c52a7 in CGameServer::UpdateLoop (this=0x9584620) at rts/Game/GameServer.cpp:852
0000013 0xb7d9d41a in boost::function0<void, std::allocator<boost::function_base> >::operator() () from /usr/lib/libboost_thread-gcc41-mt-1_34_1.so.1.34.1
0000014 0xb7d9d047 in ?? () from /usr/lib/libboost_thread-gcc41-mt-1_34_1.so.1.34.1
#15 0xb750846b in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#16 0xb78176de in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb)

Auswaschbar

2008-01-02 13:41

reporter   ~0001644

I don't think it's a server crash because there was a NETMSG_QUIT recieved in the attached infolog ("Server exited" is triggered by this message).

Auswaschbar

2008-01-02 13:48

reporter   ~0001645

Last edited: 2008-01-02 14:05

Good news is I was able to reproduce this error using the network tester, and fixed the pop_front_on_empty_container. Now the connection becomes invalid and it will show a timeout.

Bad news is I have no Idea why the client want to acknowledge packets which are never sent.
edit: The reason is: the connection is closed on server side, while the client still uses the old connection. Because the this client side ack's packets sent from the old server connection.

Another bug still remains: why are there no keepalivepackets sent from the client side?

tvo

2008-01-02 16:00

reporter   ~0001646

maybe it was just swapping so badly or something that the update thread didn't get a chance to run for 30 seconds...

seems weird though, but not blocking anymore

Auswaschbar

2008-01-04 16:17

reporter   ~0001703

I'm not able to reproduce this any more. Anyone else?

tvo

2008-01-14 19:24

reporter   ~0001821

I didn't see it in game (0.76b1) at all yet, so I'd assume it's solved.

Auswaschbar

2008-01-14 20:25

reporter   ~0001826

Noone is able to reproduce this, so I'm closing it.

Issue History

Date Modified Username Field Change
2008-01-01 22:28 imbaczek New Issue
2008-01-01 22:28 imbaczek Severity minor => block
2008-01-01 22:28 imbaczek Relationship added has duplicate 0000756
2008-01-01 22:47 Awaker File Added: replays.zip
2008-01-01 22:53 Awaker File Added: infolog.txt
2008-01-01 23:00 Awaker Note Added: 0001629
2008-01-01 23:04 Awaker Note Added: 0001630
2008-01-02 04:46 Kloot Note Added: 0001638
2008-01-02 08:48 Kloot Note Edited: 0001638
2008-01-02 08:55 KDR_11k Note Added: 0001640
2008-01-02 09:38 imbaczek Note Added: 0001641
2008-01-02 12:24 tvo Note Added: 0001642
2008-01-02 13:05 tvo Note Added: 0001643
2008-01-02 13:06 tvo Note Edited: 0001643
2008-01-02 13:41 Auswaschbar Note Added: 0001644
2008-01-02 13:48 Auswaschbar Note Added: 0001645
2008-01-02 14:05 Auswaschbar Note Edited: 0001645
2008-01-02 16:00 tvo Note Added: 0001646
2008-01-02 16:00 tvo Severity block => major
2008-01-04 16:17 Auswaschbar Note Added: 0001703
2008-01-14 19:24 tvo Note Added: 0001821
2008-01-14 20:25 Auswaschbar Status new => resolved
2008-01-14 20:25 Auswaschbar Resolution open => fixed
2008-01-14 20:25 Auswaschbar Assigned To => Auswaschbar
2008-01-14 20:25 Auswaschbar Note Added: 0001826