Engine Profiling
Linux
Intro
There are several ways to profile an application.
gprof
The compiler injects instrumentation code into the binary, so the program profiles itself while it runs. This is what cmake -DCMAKE_BUILD_TYPE=PROFILING is for. We are not going to use it. Although it does not interfere with the program's normal workflow, it requires special compile options, and with spring it gets so slow that it is unusable for profiling anything in the late game (which is where profiling is needed the most).
oprof
OProfile uses statistical sampling. It is a kernel module that, every n clock ticks, records which function is currently executing. It is lightweight and can run with almost no performance impact. This method always profiles your whole system; the relevant information has to be filtered out after the profiling run has finished. The downside of statistical profiling is that it needs a sufficiently large sample size to achieve good accuracy. Also remember that oprof still requires debug symbols to resolve addresses into function names. It can be turned on and off at any time, so you can, for instance, skip spring's loading phase and profile only in-game runtime. You can also merge multiple profiling sessions.
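To get a feeling for what a sufficient sample size means, the following rough estimate may help. It is only a sketch: it assumes that samples are independent, so the number of hits in a function follows a binomial distribution, and the function name samples_needed and the example numbers are illustrative, not part of oprof.

```shell
#!/bin/bash
# Rough estimate under an assumed binomial model (independent samples):
# a function that accounts for fraction p of the run time is measured with
# relative standard error e after about n = (1 - p) / (p * e^2) samples.
samples_needed() {
    local p=$1 e=$2
    awk -v p="$p" -v e="$e" 'BEGIN { printf "%.0f\n", (1 - p) / (p * e * e) }'
}

# A function taking 1% of the time, measured to within about 10% of its value:
samples_needed 0.01 0.1
# A function taking 20% of the time, same relative error:
samples_needed 0.20 0.1
```

Under these assumptions, small functions need far more samples than hot ones for the same relative accuracy, which is why short profiling runs are mostly useful for finding the biggest hotspots.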
perf
A powerful and lightweight Linux performance profiler based on perf events. It has been part of the Linux kernel since version 2.6.31.
Setting up oprof
Required stuff:
- oprof kernel module
- oprof package
Using oprof
Note: Modern CPUs have special statistical sampling timing functions that make profiling with software like oprof fast.
You will have to load the kernel module. The profiling daemon is controlled using opcontrol as root. As a regular user, you can do the post-processing using opreport.
Recommended settings
- use a call-graph depth of size 16
- disable kernel profiling & enable user-space profiling
- separate each application in its own profile
- separate each thread in its own profile; you can always merge them later, using opreport
A typical profiling session
# We need to be root
su -
# In case the kernel module is not already loaded ...
modprobe -v oprofile
# This has to be set only once; it will be saved
opcontrol --event=CPU_CLK_UNHALTED:1000000:0x0:0:1 \
--callgraph=16 \
--separate=thread,library \
--no-vmlinux
# -- Bring spring into a state where you want to start profiling --
# start the profiler
opcontrol --start
# -- In spring, do whatever you want to profile --
# stop the profiler
opcontrol --stop
# in order to make the profile data available to user-space,
# you have to dump the data
opcontrol --dump
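If you want to profile a fixed time window instead of starting and stopping by hand, the start, stop and dump steps above can be wrapped in a small helper. This is a minimal sketch: the function name profile_window and the OPCONTROL variable are hypothetical, and only the opcontrol options already used in the session above appear in it.

```shell
#!/bin/bash
# Hypothetical helper: profile for a fixed number of seconds using the
# opcontrol commands from the session above. Must be run as root.
# OPCONTROL can be overridden, e.g. with "echo opcontrol", for a dry run
# that only prints the commands instead of executing them.
OPCONTROL=${OPCONTROL:-opcontrol}

profile_window() {
    local seconds=${1:-30}
    ${OPCONTROL} --start || return 1   # start sampling
    sleep "${seconds}"                 # let spring run the scenario
    ${OPCONTROL} --stop                # stop sampling
    ${OPCONTROL} --dump                # make the data available to opreport
}

# Example (as root, once spring is in the state you want to profile):
#   profile_window 60
# Dry run, printing the commands only:
#   OPCONTROL="echo opcontrol" profile_window 0
```

A fixed window makes separate runs easier to compare, because each run collects samples over the same amount of time.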
Post processing oprof data
We will first explain the post processing steps, and then give a script that should work for most general spring profiling tasks.
Extract the data
The data is now stored in OProfile's cache, and we need to extract the parts we are interested in and write them to a file. With the settings we used above, OProfile indexes the samples by thread ID and process ID. In order to generate a report, we have to tell it which program, which threads and which sampling sessions to include. If you do not care much about the distinction, you can simply merge the thread data and the sampling data, as we do here.
Converting the data file to a dot-graph file
gprof2dot takes oprof data and outputs a dot graph. We use it to generate the dot graph from OProfile's --callgraph report.
Finalizing (Creating an image out of the graph file)
Then you can either use XDot to view it directly, or create an image out of the graph file. We will use GraphViz's dot utility to render to an image.
Post Processing Script
#!/bin/bash
SPRING_INSTALL_PATH=$(dirname $(which spring))
BIN_ENGINE=${SPRING_INSTALL_PATH}/spring
BIN_AIS=$(ls ${SPRING_INSTALL_PATH}/AI/Skirmish/*/*/libSkirmishAI.so)
SETTINGS_CREATE_TABLE=0
SETTINGS_CREATE_CALLGRAPH=1
SETTINGS_CREATE_PNG=1
SETTINGS_CREATE_SVG=1
SETTINGS_CLEANUP=1
function profileBins() {
    BASE_OUT_FILE_PATH=$1
    MY_BINARIES="$2"
    # extract relevant oprof data as table
    if [ ${SETTINGS_CREATE_TABLE} == 1 ]; then
        opreport \
            --long-filenames \
            --demangle=smart \
            --merge=tid,tgid \
            --symbols ${MY_BINARIES} \
            --output-file ${BASE_OUT_FILE_PATH}.txt
    fi
    if [ ${SETTINGS_CREATE_CALLGRAPH} == 1 ]; then
        # extract relevant oprof data as callgraph
        opreport \
            --long-filenames \
            --callgraph \
            --demangle=smart \
            --merge=tid,tgid \
            --symbols ${MY_BINARIES} \
            --output-file ${BASE_OUT_FILE_PATH}.oprof
        # create graph file
        gprof2dot --format=oprofile --output=${BASE_OUT_FILE_PATH}.dot ${BASE_OUT_FILE_PATH}.oprof
        # create images
        if [ ${SETTINGS_CREATE_PNG} == 1 ]; then
            dot -Tpng ${BASE_OUT_FILE_PATH}.dot > ${BASE_OUT_FILE_PATH}.png
        fi
        if [ ${SETTINGS_CREATE_SVG} == 1 ]; then
            dot -Tsvg ${BASE_OUT_FILE_PATH}.dot > ${BASE_OUT_FILE_PATH}.svg
        fi
        # ... likewise for other formats
        if [ ${SETTINGS_CLEANUP} == 1 ]; then
            rm ${BASE_OUT_FILE_PATH}.oprof
            rm ${BASE_OUT_FILE_PATH}.dot
        fi
    fi
}
#profileBins ./profiling_engine "${BIN_ENGINE}"
profileBins ./profiling_ais "${BIN_AIS}"
profileBins ./profiling_engine_and_ais "${BIN_ENGINE} ${BIN_AIS}"
This will leave you with these graph files:
./profiling_ais.png
./profiling_ais.svg
./profiling_engine_and_ais.png
./profiling_engine_and_ais.svg
Using perf
Setup
On Ubuntu, install linux-tools-common using apt-get. On Fedora, the package is named perf.
Make sure /proc/sys/kernel/kptr_restrict contains the value 0.
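The kptr_restrict setting can be checked before starting a session. The following is a minimal sketch; the function name check_kptr_restrict is hypothetical, and the optional file argument exists only so the check can be tried against another file.

```shell
#!/bin/bash
# Sketch: report whether kptr_restrict has the value this page asks for.
# The optional argument (a file path) is only there for trying the check out.
check_kptr_restrict() {
    local file=${1:-/proc/sys/kernel/kptr_restrict}
    local value
    value=$(cat "${file}" 2>/dev/null) || { echo "cannot read ${file}"; return 2; }
    if [ "${value}" = "0" ]; then
        echo "ok: kptr_restrict is 0"
    else
        echo "kptr_restrict is ${value}; as root, run: sysctl -w kernel.kptr_restrict=0"
        return 1
    fi
}

# Example:
#   check_kptr_restrict
```

Note that a value set with sysctl -w lasts only until the next reboot.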
Useful commands
- perf top - show statistics in real time
- perf record - save statistics to a file for later analysis
- perf report - read previously recorded data
- perf diff a.data b.data - show the differences between two recorded data files
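perf record and perf diff can be combined into a before/after comparison of a change. The sketch below only prints the commands rather than running them. The helper name perf_record_cmd and the file names before.data and after.data are hypothetical, the process name "spring" is an assumption, and the perf options are the ones used in the example on this page.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) a perf record command for one process.
# "--" separates the perf options from the command whose run time bounds
# the recording (here: sleep).
perf_record_cmd() {
    local pid=$1 out=$2 seconds=${3:-10}
    echo "perf record -F 99 -p ${pid} -o ${out} --call-graph dwarf -- sleep ${seconds}"
}

# Assumption: the engine process is called exactly "spring".
# Falls back to the placeholder PID_NUMBER when no such process is running.
pid=$(pgrep -x spring 2>/dev/null | head -n 1) || true
pid=${pid:-PID_NUMBER}

perf_record_cmd "${pid}" before.data   # record before the change
perf_record_cmd "${pid}" after.data    # record after the change
echo "perf diff before.data after.data"
```

For the comparison to be meaningful, both recordings should cover the same game situation and the same duration.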
Example
- Record data from the running spring process for 10 seconds. Pay attention to the sampling frequency parameter -F: at this rate the output takes about 17 MB of space. The --call-graph dwarf option unwinds the stack even when the binary was compiled with frame pointers omitted.
perf record -F 99 -p PID_NUMBER -o my_output1.data --call-graph dwarf sleep 10
- Read the report. Press / and type part of a symbol name to narrow the list, e.g. "air".
perf report -i my_output1.data -g
Children Self Command Shared Object Symbol
+ 132.33% 29.15% unknown spring [.] CStrafeAirMoveType::Update
+ 71.36% 3.35% unknown spring [.] CStrafeAirMoveType::UpdateFlying
+ 52.01% 32.48% unknown spring [.] CStrafeAirMoveType::UpdateAirPhysics
+ 36.42% 0.00% unknown spring [.] CAirCAI::ExecuteFight
34.50% 12.55% unknown spring [.] CommandDrawer::DrawAirCAICommands
+ 32.11% 0.00% unknown spring [.] CStrafeAirMoveType::UpdateLanding
26.65% 12.90% unknown spring [.] CGameHelper::GetClosestEnemyAircraft
16.73% 0.00% unknown spring [.] CAirCAI::GiveCommandReal
15.91% 0.00% unknown spring [.] CStrafeAirMoveType::FindLandingPos
15.91% 6.42% unknown spring [.] CStrafeAirMoveType::BrakingDistance
9.82% 0.00% unknown spring [.] CAirCAI::AirAutoGenerateTarget
9.82% 0.00% unknown spring [.] CAirCAI::SlowUpdate
3.28% 0.00% unknown spring [.] CAirCAI::ExecuteMove
3.26% 0.00% unknown spring [.] CStrafeAirMoveType::HandleCollisions
3.17% 3.17% unknown spring [.] AAirMoveType::UseSmoothMesh
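In this listing, Children is the time spent in a function including everything it calls, while Self is the time spent in the function's own code. To find the functions that burn the most time themselves, the rows can be ranked by the Self column. The following is a minimal sketch that works on rows shaped like the listing above, for instance copied into a text file; the function name rank_by_self is hypothetical, and symbols containing spaces are not handled.

```shell
#!/bin/bash
# Sketch: rank report rows by the Self column (second percentage).
# Reads rows shaped like the listing above on stdin and prints
# "<self%> <symbol>", highest Self first.
rank_by_self() {
    sed 's/^[[:space:]]*+//' |
        awk '$1 ~ /^[0-9.]+%$/ && $2 ~ /^[0-9.]+%$/ { print $2, $NF }' |
        LC_ALL=C sort -rn
}

# A few rows taken from the listing above:
rank_by_self <<'EOF'
+ 132.33% 29.15% unknown spring [.] CStrafeAirMoveType::Update
+ 52.01% 32.48% unknown spring [.] CStrafeAirMoveType::UpdateAirPhysics
+ 36.42% 0.00% unknown spring [.] CAirCAI::ExecuteFight
26.65% 12.90% unknown spring [.] CGameHelper::GetClosestEnemyAircraft
EOF
```

Functions with a high Children value but a Self value near zero, such as CAirCAI::ExecuteFight, mostly delegate to other functions, so the cost is better looked for in their callees.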
Flamegraph
perf script -i a.data | ./stackcollapse-perf.pl > out.perf-folded
- convert perf stack data to a text representation
./flamegraph.pl out.perf-folded > my_idle_air.svg
- create a flamegraph from the perf data
or
grep -i air out.perf-folded | ./flamegraph.pl > my_idle_air.svg
- same as above, but it ignores stacks that do not contain "air" in their symbol names
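Before rendering a filtered flamegraph, it can be useful to know how large a share of all samples the filter keeps. The following is a minimal sketch; it assumes the folded format written by stackcollapse-perf.pl (one line per unique stack, frames joined by ";", followed by a space and the sample count). The function name folded_share is hypothetical, and the sample data is hand-written for illustration, not real profiler output.

```shell
#!/bin/bash
# Sketch: share of samples whose folded stack mentions a pattern
# (matched case-insensitively, like grep -i).
folded_share() {
    local pattern=$1 file=$2
    awk -v pat="${pattern}" '
        { total += $NF }                                # last field is the sample count
        tolower($0) ~ tolower(pat) { hit += $NF }
        END {
            if (total > 0)
                printf "%d of %d samples (%.1f%%)\n", hit, total, 100 * hit / total
        }
    ' "${file}"
}

# Hand-written sample data in the assumed folded format.
sample=$(mktemp)
cat > "${sample}" <<'EOF'
spring;CGame::Update;CAirCAI::SlowUpdate 30
spring;CGame::Update;CStrafeAirMoveType::Update 50
spring;CGame::Draw 20
EOF

folded_share air "${sample}"
rm -f "${sample}"
```

With real data, the call would be folded_share air out.perf-folded. Keep in mind that the pattern is matched against the whole line, so it also matches if the process name at the start of the stack contains it.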
Links
perf examples by Brendan Gregg
Windows
<Please add>