matrix multiplication code optimization

Bobcatben
Posts: 120
Joined: 10 Mar 2006, 17:01

Post by Bobcatben »

At first I was going to ask if you guys were interested in a matrix multiplication function in inline assembly, but...

I just tried tweaking the formula in C++, and I got it to run almost as fast as the assembler version by exposing the m variable of the matrix and doing m2.m[] instead of m2[], thus avoiding a call to the [] operator on every access.

The times for 1 million multiplications:
my assembly version: 337 ms
existing taspring c++ version: 1420 ms
taspring version after the change: 350 ms

Just changing that made the matrix multiplication code in taspring about 4 times faster.
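The change described in this post can be sketched roughly as follows. This is only an illustration, not the actual TASpring code: the `Matrix44` struct, its column-major layout, and the function names `multiplyViaOperator` and `multiplyDirect` are assumptions made for the example. Both functions perform the same arithmetic in the same order; the only difference is whether elements are read through the overloaded `[]` operator or straight from the exposed `m` array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the engine's 4x4 matrix class (not the real TASpring type).
struct Matrix44 {
    float m[16];  // column-major elements, exposed publicly as the post describes

    // Overloaded accessor. Defined in-class, so it is implicitly inline, but a
    // compiler may still emit real calls when optimizations are switched off.
    float& operator[](std::size_t i) {
        assert(i < 16);  // compiled out when NDEBUG is defined
        return m[i];
    }
    const float& operator[](std::size_t i) const {
        assert(i < 16);
        return m[i];
    }
};

// Multiply through the overloaded operator: every element access is a
// function call unless the compiler inlines it.
Matrix44 multiplyViaOperator(const Matrix44& a, const Matrix44& b) {
    Matrix44 r{};
    for (std::size_t col = 0; col < 4; ++col) {
        for (std::size_t row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < 4; ++k) {
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            r[col * 4 + row] = sum;
        }
    }
    return r;
}

// Same arithmetic in the same order, but indexing the raw array directly.
Matrix44 multiplyDirect(const Matrix44& a, const Matrix44& b) {
    Matrix44 r{};
    for (std::size_t col = 0; col < 4; ++col) {
        for (std::size_t row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < 4; ++k) {
                sum += a.m[k * 4 + row] * b.m[col * 4 + k];
            }
            r.m[col * 4 + row] = sum;
        }
    }
    return r;
}
```

With optimizations enabled, a compiler would normally be expected to inline the accessor and generate much the same code for both functions, so any gap between them is worth checking against the build settings.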
jcnossen
Former Engine Dev
Posts: 2440
Joined: 05 Jun 2005, 19:13

Post by jcnossen »

I doubt that's going to make more than a 0.001 difference in fps though ;)
Besides, shouldn't the optimizer optimize the [] operator away? At least that's what everyone thinks... :?
Bobcatben
Posts: 120
Joined: 10 Mar 2006, 17:01

Post by Bobcatben »

The strangest thing is, the operator is declared inline, but when I look at the disassembly it's not inlined; it gets called every time.
10053r
Posts: 297
Joined: 28 Feb 2005, 19:19

Post by 10053r »

What is matrix multiplication used for in the spring engine? Because if it is used as part of the sim loop, it could make a HUGE difference to large (more than 500 total units) games...
submarine
AI Developer
Posts: 834
Joined: 31 Jan 2005, 20:04

Post by submarine »

Anyway, as long as the code is not totally unreadable or uncommented, any speedup is welcome.
Bobcatben
Posts: 120
Joined: 10 Mar 2006, 17:01

Post by Bobcatben »

Well, the assembler version is 290 or so lines, not including comments. I got almost the same performance when I exposed the matrix's m variable and accessed it without the overloaded [] operator.
Licho
Zero-K Developer
Posts: 3803
Joined: 19 May 2006, 19:13

Post by Licho »

Is it using SSE?
I remember I made an implementation of matrix multiply using SSE (originally implemented for Xbox), and it was many times faster than the DirectX implementation.
Also, using SSE it was just about 10-20 lines, if I remember correctly.
jcnossen
Former Engine Dev
Posts: 2440
Joined: 05 Jun 2005, 19:13

Post by jcnossen »

The way the COB/3do/s3o code calculates the matrices is suboptimal anyway; you could get a much bigger speedup by caching the matrices per COB piece every frame. IIRC, every time a piece matrix is requested (which happens a lot), all the matrices of the parents are also calculated again. BOS scripts changing positions/rotations of pieces would have to be considered too, though.

SSE can probably not be used due to floating point sync problems.
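The per-piece caching idea in this post could look something like the sketch below. Everything in it is hypothetical and not taken from the Spring source: `Mat4`, `Piece`, `PieceTree`, the revision counter, and the `recomputeCount` instrumentation are names invented for the example. Each piece stores its last computed world matrix together with the revision it was computed for. Any change to a local transform (as a BOS script would cause) bumps the revision, which conservatively invalidates every cached matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Mat4 { float m[16]; };  // column-major 4x4 matrix

Mat4 identity() {
    Mat4 r{};
    r.m[0] = r.m[5] = r.m[10] = r.m[15] = 1.0f;
    return r;
}

Mat4 translation(float x, float y, float z) {
    Mat4 r = identity();
    r.m[12] = x; r.m[13] = y; r.m[14] = z;
    return r;
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r.m[col * 4 + row] += a.m[k * 4 + row] * b.m[col * 4 + k];
    return r;
}

struct Piece {
    int parent;               // index of the parent piece, -1 for the root
    Mat4 local;               // transform relative to the parent
    Mat4 world;               // cached parent-chain product
    unsigned cachedRevision;  // revision the cache was computed for
};

class PieceTree {
public:
    int addPiece(int parent, const Mat4& local) {
        assert(parent < static_cast<int>(pieces_.size()));
        pieces_.push_back(Piece{parent, local, identity(), 0});
        return static_cast<int>(pieces_.size()) - 1;
    }

    // Called whenever a script moves or rotates a piece; invalidates all caches.
    void setLocal(int index, const Mat4& local) {
        pieces_[static_cast<std::size_t>(index)].local = local;
        ++revision_;
    }

    // Returns the cached world matrix, recomputing it only when it is stale.
    const Mat4& world(int index) {
        Piece& p = pieces_[static_cast<std::size_t>(index)];
        if (p.cachedRevision != revision_) {
            p.world = (p.parent < 0) ? p.local : multiply(world(p.parent), p.local);
            p.cachedRevision = revision_;
            ++recomputeCount;
        }
        return p.world;
    }

    int recomputeCount = 0;  // instrumentation for the example only

private:
    std::vector<Piece> pieces_;
    unsigned revision_ = 1;  // starts above 0 so fresh pieces count as stale
};
```

Invalidating the whole tree on any change is the simplest correct choice. A finer-grained scheme would mark only the changed piece and its descendants as dirty, at the cost of more bookkeeping.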
Bobcatben
Posts: 120
Joined: 10 Mar 2006, 17:01

Post by Bobcatben »

When I looked at other math things done in SSE it was too complicated for me to understand, so I wrote it in generic x86 assembly. But just changing how the existing C++ code in taspring works will boost it to almost as fast as the assembly.
trepan
Former Engine Dev
Posts: 1200
Joined: 17 Nov 2005, 00:52

Post by trepan »

Did you have debugging enabled during compilation? That can disable inlining.
I was also wondering if you are using Windows, and if so, are you using the MinGW32 compiler? (That being the one used to produce the Windows release binaries.)
Bobcatben
Posts: 120
Joined: 10 Mar 2006, 17:01

Post by Bobcatben »

I did, because I couldn't get it to compile in release with the libraries I was using. Now I got it to run in release and turned on the optimizations in VC++.

Here are my results this time.

in release with optimizations enabled
4'273'504 per second in assembler in release

4'694'835 per second in c++ in release without operator overload
1'358'695 per second in c++ in release with operator overload

3'952'569 per second in c++ taspring style without operator overload in release
2'314'814 per second in c++ taspring style with operator overload in release


The C++ formula I wrote out turns out to be the fastest of all of them when not using the overloaded operator.
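Figures like the "per second" numbers above could be gathered with a small timing harness along these lines. This is not the harness used in the post, which was not shown; `Mat4`, `multiply`, and `multipliesPerSecond` are assumed names, and the sketch relies only on `std::chrono::steady_clock` from the standard library. The input is read through a `volatile` and the result is written to a `volatile` sink so that an optimizing compiler cannot fold the loop away.

```cpp
#include <cassert>
#include <chrono>

struct Mat4 { float m[16]; };  // column-major 4x4 matrix (hypothetical type)

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r.m[col * 4 + row] += a.m[k * 4 + row] * b.m[col * 4 + k];
    return r;
}

// Times `iterations` chained multiplications and returns multiplications per second.
double multipliesPerSecond(long iterations) {
    assert(iterations > 0);

    volatile float one = 1.0f;  // read at run time, so the values are not compile-time constants
    Mat4 acc{};
    Mat4 ident{};
    for (int i = 0; i < 4; ++i) {
        ident.m[i * 5] = one;   // diagonal of an identity matrix
        acc.m[i * 5] = 2.0f;
    }

    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        acc = multiply(acc, ident);  // each step depends on the previous result
    }
    volatile float sink = acc.m[0];  // keep the result observable
    (void)sink;
    const auto stop = std::chrono::steady_clock::now();

    const double seconds = std::chrono::duration<double>(stop - start).count();
    return seconds > 0.0 ? static_cast<double>(iterations) / seconds : 0.0;
}
```

Calling `multipliesPerSecond(1000000)` once in a debug build and once in an optimized release build is a quick way to see how much of a measured difference comes from the build settings rather than from the code.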
Tobi
Spring Developer
Posts: 4598
Joined: 01 Jun 2005, 11:36

Post by Tobi »

TBH we really shouldn't use any assembler in Spring; the advantages (speed) just don't outweigh the disadvantages (bad maintainability and readability).

As for the compiler not inlining the operator[], that sounds strange. Usually stuff like std::vector::size() and std::vector::operator[] does get inlined (at least I usually can't call them in the debugger because they don't exist if I compiled with optimization and debugging enabled).

SSE will indeed most probably desync, plus we'd need a 387 version anyway for PCs without SSE. (Plus just enabling SSE math in the compiler would be a lot easier... you don't get good vectorization then, though.)
10053r
Posts: 297
Joined: 28 Feb 2005, 19:19

Post by 10053r »

SSE has existed since the Pentium III (and SSE2 since the Pentium 4). Now that Macs use Intel chips, I don't think it is unreasonable to require it. You would be locking out the PowerPC users, but Apple doesn't sell PPC Macs anymore, and hasn't for almost a year. Considering I expect it will be a while before the MacOS version syncs anyway, I think SSE is totally reasonable to demand.

Not that I think we should be using assembly in the code anyway, for the reasons Tobi mentioned.

BTW, anyone who manages to speed up the sim loop or the unithandler loop will get lots of gratitude from me. I miss huge games. OTA used to support thousands of units in one game, and Spring can't handle 500 without slowing down.
Tobi
Spring Developer
Posts: 4598
Joined: 01 Jun 2005, 11:36

Post by Tobi »

VIA C3 etc. don't have SSE.
Bobcatben
Posts: 120
Joined: 10 Mar 2006, 17:01

Post by Bobcatben »

I've been wanting to try tweaking other parts, but frankly I can't get taspring to compile. I spent a couple of hours today trying to get all the libraries it wants, and when I added the WTL library for the crash handler, it started spitting out hundreds of errors, saying things like float3 and frustum don't exist.
jcnossen
Former Engine Dev
Posts: 2440
Joined: 05 Jun 2005, 19:13

Post by jcnossen »

Getting the libraries should be as simple as downloading the library package... did you follow the compiling thread?
Dragon45
Posts: 2883
Joined: 16 Aug 2004, 04:36

Post by Dragon45 »

Could we compile our own TASpring variations using these modifications and still be able to play against non-optimized clients?
Tobi
Spring Developer
Posts: 4598
Joined: 01 Jun 2005, 11:36

Post by Tobi »

Only if you can guarantee your code gives equal results in all possible combinations. Otherwise you'd just desync...
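Why "equal results" is such a strict requirement can be shown with a tiny example. It assumes IEEE 754 single-precision arithmetic with no extended-precision intermediates (for instance SSE scalar math rather than 387 math) and no fast-math style compiler options. The function names are made up for the illustration. Floating-point addition is not associative, so an optimized client that merely reorders operations can produce different bits from an unmodified one.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The same three values added in two different orders.
float sumLeftToRight(float a, float b, float c) { return (a + b) + c; }
float sumRightToLeft(float a, float b, float c) { return a + (b + c); }

// Compares the raw bit patterns, which is what staying in sync effectively requires.
bool bitwiseEqual(float x, float y) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");
    std::uint32_t xb = 0;
    std::uint32_t yb = 0;
    std::memcpy(&xb, &x, sizeof xb);
    std::memcpy(&yb, &y, sizeof yb);
    return xb == yb;
}

// Hypothetical sync check: aborts (in builds without NDEBUG) if two clients'
// values differ in any bit.
void requireInSync(float mine, float theirs) {
    assert(bitwiseEqual(mine, theirs));
}
```

Under those assumptions, with a = 1e8f, b = -1e8f and c = 1.0f, `sumLeftToRight` gives 1 while `sumRightToLeft` gives 0, because 1 is lost when it is added to -1e8 first. A modified client would have to match the reference build bit for bit on every input, not just be close.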
MadRat
Posts: 532
Joined: 24 Oct 2006, 13:45

Post by MadRat »

Going purely from web statistics, could someone deduce approximately what share of the community is on machines that are not SSE capable? Cyrix/G3 compatibility might be a non-issue.
Bobcatben
Posts: 120
Joined: 10 Mar 2006, 17:01

Post by Bobcatben »

Well, as I said in my first post, I gave up on the assembly thing, even in my own program, because unless I use SSE or 3DNow!, optimized C++ is faster if you don't use the overloaded operator. That's what I've been trying to say: the method used in taspring right now is as fast as my assembly version if you access the m variable directly instead of using the overloaded operator.

And no, I didn't know about any compiling guide; I just loaded the VC7 project and started trying to fix the errors.

Edit: I just put in the library pack. I was surprised that it had the newer stuff you guys added, like the crash handler, but its Boost library was too old to compile 74b3 with, so I had to update it.

Edit again: ah, it doesn't have the crash handler, and that's the killer. When I give it those libraries the whole thing explodes; or rather, crashrpt wants the WTL libraries, and once I include them, every class in the whole game loses its definition.