Debugging and stack traces

Debugging and stack traces

Discuss the source code and development of Spring Engine in general from a technical point of view. Patches go here too.

Moderator: Moderators

Post Reply
MajBoredom
Posts: 17
Joined: 24 Nov 2013, 09:14

Debugging and stack traces

Post by MajBoredom »

Before I post a bug on mantis, I was wondering if one of the developers can give instructions on how to read/interpret the stack traces in infolog.txt. I was told by Kloot that these are for the most part meaningless. Is there a way to compile the engine in order to get more information out of a replay re: stack traces and exceptions?

For example, in some games there is a ~3 second stall that accompanies a stack trace that looks like the following infolog.txt excerpt:

[f=0068671] Error: [safe_element const] index 1 out of bounds! (size 1)
[f=0068671] Error: Stacktrace for Spring 96.0:
[f=0068671] Error: <0> /home/joya/dev/spring/rts/Game/UnsyncedActionExecutor.h:42
[f=0068671] Error: <1> /home/joya/dev/spring/rts/Game/ChatMessage.cpp:16
[f=0068671] Error: <2> /usr/include/c++/4.8.2/bits/basic_ios.h:276
[f=0068671] Error: <3> /usr/include/c++/4.8.2/bits/stl_tree.h:1323 (discriminator 2)
[f=0068671] Error: <4> /home/joya/dev/spring/rts/Game/GUI/PlayerRoster.cpp:303
[f=0068671] Error: <5> /usr/include/c++/4.8.2/bits/basic_string.h:583
[f=0068671] Error: <6> /home/joya/dev/spring/rts/Game/Game.cpp:1628
[f=0068671] Error: <7> /home/joya/dev/spring/rts/Net/NetCommands.cpp:517
[f=0068671] Error: <8> /home/joya/dev/spring/rts/Game/Game.cpp:996
[f=0068671] Error: <9> /usr/include/c++/4.8.2/bits/basic_string.h:583
[f=0068671] Error: <10> /home/joya/dev/spring/rts/Game/UnsyncedGameCommands.cpp:1946 (discriminator 58)
[f=0068671] Error: <11> /home/joya/dev/spring/rts/Game/WaitCommandsAI.cpp:67
[f=0068671] Error: <12> /usr/include/boost/exception/exception.hpp:215
[f=0068671] Error: <13> /home/joya/dev/spring/rts/System/Main.cpp:126
[f=0068671] Error: <14> ??:?
[f=0068671] Error: <15> ??:?

This will occur about 1-4 times in a row, causing the engine to freeze momentarily. I think (just guessing) that the server is not encountering the problem since packets continue to arrive smoothly during the freeze; the engine goes to 10x game speed afterward in order to replay / catch up with the simulation state.

The stack traces happen deterministically in the same points when you watch the replay of a game that caused them, so I'm wondering if there is a way for me to compile with some debugging switches that will reveal better / more accurate information.

Thanks,
Mjr.Boredom
User avatar
smoth
Posts: 22309
Joined: 13 Jan 2005, 00:46

Re: Debugging and stack traces

Post by smoth »

it would help if you posted your full infolog.
User avatar
zwzsg
Kernel Panic Co-Developer
Posts: 7049
Joined: 16 Nov 2004, 13:08

Re: Debugging and stack traces

Post by zwzsg »

MajBoredom wrote:these are for the most part meaningless
It tells you what happened (trying to to access a cell outside an array), and where it happened (the file and line number where it occured, and who called it). What else would you need to know?
User avatar
Silentwings
Posts: 3720
Joined: 25 Oct 2008, 00:23

Re: Debugging and stack traces

Post by Silentwings »

I was told by Kloot that these are for the most part meaningless.
Afaik infolog stacktraces are only meaningless when the bug is connected to Spring's stacktrace system itself - as it was on the mantis report where I assume you heard that comment.
abma
Spring Developer
Posts: 3798
Joined: 01 Jun 2009, 00:08

Re: Debugging and stack traces

Post by abma »

looks similar to: http://springrts.com/mantis/view.php?id=4309

also a demo would be good as such bugs are mostly reproduceable with demo.
MajBoredom
Posts: 17
Joined: 24 Nov 2013, 09:14

Re: Debugging and stack traces

Post by MajBoredom »

zwzsg wrote:
MajBoredom wrote:these are for the most part meaningless
It tells you what happened (trying to to access a cell outside an array), and where it happened (the file and line number where it occured, and who called it). What else would you need to know?
Thanks. Let's assume I understand the basic premise of a stack trace as presented in an undergraduate CS class. ;)

The function stack shown in the above message is invalid. For example, there is no call to any UnsyncedActionExecutor function from a ChatMessage function on line 16 of ChatMessage.cpp, nor any other line. Similarly, none of the code in PlayerRoster.cpp calls any stl_tree methods. So it looks to me like I have the wrong symbols or that the stack trace functionality is once again misbehaving.

To confirm, I recompiled the engine and re-ran the replay. I get the same error messages with the same line numbers as before. Any ideas?? The engine's stack trace mechanism -- is there any way to interact with / configure it? Use it with gdb?

abma: It looks similar in the NetCommands.cpp:675 line. However, this line is inside the handler for NETMSG_AICOMMAND_TRACKED. Is this called during normal (multiplayer) settings? This is also what prompts me to think that the stack trace is misleading. However, I also get this:

[f=0032807] Error: [safe_element const] index 1 out of bounds! (size 1)

I think I should file as a separate bug until there is better information. Infolog + replay to follow on Mantis.

Tanks for your support!
MajBoredom
Posts: 17
Joined: 24 Nov 2013, 09:14

Re: Debugging and stack traces

Post by MajBoredom »

After some playing with libunwind and posix thread signals, I managed to create a stub program that creates a worker thread, runs it, then allows you to suspend that thread from the main thread (sort of Java-like). The suspended thread is just a signal handler for the worker that waits on a mutex previously locked by the main thread.

On top of that, the signal handler fills out a stack context object for the worker that can be interpreted by the main thread. Using libunwind, you can get a backtrace. The output looks as follows:

Code: Select all

[Main] Starting worker thread.
[Worker] installing signal handler...
[Worker] Doing important work [1/12]...
[Worker] Doing important work [2/12]...
[Worker] Doing important work [3/12]...
[Worker] Doing important work [4/12]...
[Worker] Doing important work [5/12]...
[Worker] Doing important work [6/12]...
[Main] The worker has been suspended.
[Handler] Caught SIGUSR1!
[Main] Simple backtrace using libunwind:
[Main]    ip = 0x3deac0ef90, sp = 0x7f656f5e6800,  __restore_rt()
[Main]    ip = 0x3deac0e7fd, sp = 0x7f656f5e6da0,  __nanosleep_nocancel()
[Main]    ip = 0x000040104e, sp = 0x7f656f5e6db0,  do_important_work()
[Main]    ip = 0x0000400f39, sp = 0x7f656f5e6de0,  worker_func()
[Main]    ip = 0x3deac07c53, sp = 0x7f656f5e6f20,  start_thread()
[Main]    ip = 0x3dea4f5dbd, sp = 0x7f656f5e6fc0,  clone()
[Main] Decoding stack ip values using backtrace_symbols():
[Main]    /lib64/libpthread.so.0() [0x3deac0ef90]
[Main]    /lib64/libpthread.so.0(__nanosleep+0x2d) [0x3deac0e7fd]
[Main]    ./signaltest() [0x40104e]
[Main]    ./signaltest() [0x400f39]
[Main]    /lib64/libpthread.so.0() [0x3deac07c53]
[Main]    /lib64/libc.so.6(clone+0x6d) [0x3dea4f5dbd]
[Main] Now resuming the worker.
[Worker] Doing important work [7/12]...
[Worker] Doing important work [8/12]...
[Worker] Doing important work [9/12]...
[Worker] Doing important work [10/12]...
[Worker] Doing important work [11/12]...
[Worker] Doing important work [12/12]...
[Main] Finished.
Notice that libunwind gives different text descriptions (the name of the function rather than just the module). Alternatively, you could use external calls to the addr2line program to make it look like our current setup.

So, if we alter the watchdog thread to use this technique, it means we can get precise stack traces from foreign threads by having the watchdog suspend them just long enough to do the stack trace.

If any BSD experts would like to jump in and tell me if there is an analogous way of doing this on a Mac, please do so. :)
Attachments
signaltest.c
C code, compiles with: gcc -g -o signaltest signaltest.c -lpthread -lunwind -lunwind-x86_64
(6.16 KiB) Downloaded 20 times
abma
Spring Developer
Posts: 3798
Joined: 01 Jun 2009, 00:08

Re: Debugging and stack traces

Post by abma »

looks very promising!
MajBoredom wrote:Notice that libunwind gives different text descriptions (the name of the function rather than just the module). Alternatively, you could use external calls to the addr2line program to make it look like our current setup.
code for that already exists, shouldn't be hard to integrate i guess:

https://github.com/spring/spring/blob/d ... andler.cpp

for osx... not sure if it will work out of the box. libunwind seems to be used there as well.

but i'm unsure if it will work with springs threadpool thing, jk knows this i think.

feeding the addresses to addr2line seems to work as well:

[Main] ip = 0x0000400f97, sp = 0x7f25efcd0d60, do_important_work()
[Main] ip = 0x0000400f35, sp = 0x7f25efcd0d90, worker_func()

addr2line -e signaltest
0x0000400f97
/tmp/9849178/signaltest.c:77
0x0000400f35
/tmp/9849178/signaltest.c:62 (discriminator 2)
Post Reply

Return to “Engine”