Serious discussion regarding SSE/SSE2

YokoZar
Posts: 883
Joined: 15 Jul 2007, 22:02

Serious discussion regarding SSE/SSE2

Post by YokoZar »

As I understand it, SSE optimization is still not enabled in all builds. SSE needs to be either always on or always off for everyone to prevent sync errors.

If we do require it, we gain substantial performance for everyone at the cost of preventing a few old computers from playing at all.

Old computers without SSE are getting older and less common, so the incentive to switch is greater now than it was a year ago. At some point, it's simply not worth caring. The same goes for SSE2 and, eventually, SSE3.


Eventually, we'll need to make this decision. So let's talk about enabling it as a default build option in the next release and see what that might give us.

How much performance do we gain from SSE, SSE2, and SSE3 relative to the current builds? It would help to have some numbers from someone willing to do a couple of recompiles.

How many users do we lose by requiring SSE, SSE2, and SSE3? I myself don't even have an SSE3 machine yet, but these days even an EeePC is SSE3-enabled.
Last edited by YokoZar on 16 Oct 2008, 00:13, edited 1 time in total.
imbaczek
Posts: 3629
Joined: 22 Aug 2006, 16:19

Re: Serious discussion regarding SSE/SSE2

Post by imbaczek »

I planned to enable SSE by default after 0.77 stabilizes. I think most of the dev team agrees; at least, nobody has opposed the move. This switch will require sync testing, so having the improved lobby supporting multiple versions would be helpful.
AF
AI Developer
Posts: 20687
Joined: 14 Sep 2004, 11:32

Re: Serious discussion regarding SSE/SSE2

Post by AF »

Is it feasible to let performance-critical code use SSE when it's detected that all clients support it, and fall back to a non-SSE path otherwise, like TA3D does?
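A minimal sketch of the detection step in this idea, not anything taken from Spring or TA3D. It assumes a GCC or Clang build on x86, where `__builtin_cpu_supports` is available; `cpuHasSse2`, `MathPath` and `choosePath` are hypothetical names. Because every client has to run the same math for the game to stay in sync, the sketch picks one path for the whole game and only picks SSE2 when every client reports support.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: does the CPU running this process have SSE2?
// __builtin_cpu_supports is a GCC/Clang builtin for x86 targets; any other
// compiler or architecture takes the conservative "no" branch.
inline bool cpuHasSse2() {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

// Hypothetical game-wide setting: one code path for all clients.
enum class MathPath { Scalar, Sse2 };

// Chooses SSE2 only if the list is non-empty and every client reported
// SSE2 support; a single client without it forces the scalar path.
inline MathPath choosePath(const std::vector<bool>& clientHasSse2) {
    const bool all = !clientHasSse2.empty() &&
                     std::all_of(clientHasSse2.begin(), clientHasSse2.end(),
                                 [](bool supported) { return supported; });
    return all ? MathPath::Sse2 : MathPath::Scalar;
}
```

Each client would send the result of `cpuHasSse2()` to the host, and the host would broadcast the result of `choosePath` before the game starts. How that information travels between lobby and engine is left open here.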
Peet
Malcontent
Posts: 4384
Joined: 27 Feb 2006, 22:04

Re: Serious discussion regarding SSE/SSE2

Post by Peet »

SSE is supported by all Intel processors from the P3 onward and all AMD processors from the Athlon XP onward, so any processor released after 1999 for Intel and 2001 for AMD. Since processors of that age and performance are far from suitable for Spring in the first place, I'd definitely agree with adding SSE optimization.

SSE2 is included on all Intel processors from the P4 onward and all AMD processors starting with the Athlon 64, so 2003 or so. I'm fairly sure that a not insignificant portion of Spring's users are on a Socket A AMD CPU (all of which are pre-SSE2), so perhaps SSE2 should wait for a year or two.
Auswaschbar
Spring Developer
Posts: 1254
Joined: 24 Jun 2007, 08:34

Re: Serious discussion regarding SSE/SSE2

Post by Auswaschbar »

AF wrote: Is it feasible to let performance-critical code use SSE when it's detected that all clients support it, and fall back to a non-SSE path otherwise, like TA3D does?
While this makes sense for TA3D, it isn't possible to run Spring in a playable way on a non-SSE CPU.
Let's check:
  • AMD supported it fully with the Athlon XP (would you try to run Spring on a Duron?)
  • Intel supported SSE with the Pentium III (Spring at 500 MHz, wtf?)
Edit: Peet was faster.
Remarks: on 64-bit, gcc uses SSE2 by default and I haven't had any desyncs so far. As long as all builds keep the same flag (STREFLOP_X87 or STREFLOP_SSE) it should be fine, even with different compiler optimizations applied (SSE vs. SSE3, whatever).
Tobi
Spring Developer
Posts: 4598
Joined: 01 Jun 2005, 11:36

Re: Serious discussion regarding SSE/SSE2

Post by Tobi »

Auswaschbar wrote: Remarks: on 64-bit, gcc uses SSE2 by default and I haven't had any desyncs so far. As long as all builds keep the same flag (STREFLOP_X87 or STREFLOP_SSE) it should be fine, even with different compiler optimizations applied (SSE vs. SSE3, whatever).
I think you may be disappointed at how little SSE the compiler actually uses when Spring is compiled with -mfpmath=387, as it is on 64-bit too. (Unless this was changed in CMake?)

Also, I think you may be disappointed at how little performance SSE actually gives unless we make float3 a 4-component vector that is always aligned on a 16-byte boundary, and hand-optimize the vector operations to use packed instructions. (Maybe icc does a better job at automatic vectorization? Or has gcc improved with newer versions?)
Tobi
Spring Developer
Posts: 4598
Joined: 01 Jun 2005, 11:36

Re: Serious discussion regarding SSE/SSE2

Post by Tobi »

AF wrote: Is it feasible to let performance-critical code use SSE when it's detected that all clients support it, and fall back to a non-SSE path otherwise, like TA3D does?
Nope, I really don't think this is worth the enormous amount of effort. (In particular when compared to simply changing a compiler option.)
shaddam
Posts: 14
Joined: 23 Aug 2005, 15:49

Re: Serious discussion regarding SSE/SSE2

Post by shaddam »

Tobi wrote:
Auswaschbar wrote: Remarks: on 64-bit, gcc uses SSE2 by default and I haven't had any desyncs so far. As long as all builds keep the same flag (STREFLOP_X87 or STREFLOP_SSE) it should be fine, even with different compiler optimizations applied (SSE vs. SSE3, whatever).
I think you may be disappointed at how little SSE the compiler actually uses when Spring is compiled with -mfpmath=387, as it is on 64-bit too. (Unless this was changed in CMake?)

Also, I think you may be disappointed at how little performance SSE actually gives unless we make float3 a 4-component vector that is always aligned on a 16-byte boundary, and hand-optimize the vector operations to use packed instructions. (Maybe icc does a better job at automatic vectorization? Or has gcc improved with newer versions?)
... Nice idea about padding a vector3 to a vector4: trading 25% wasted memory for roughly a 100% speed-up (for a single vector3; for an array it might be solved without padding). An acceptable deal, I think.

Besides, I know the idea of inline assembly was dropped for Spring, but just to illustrate the potential of inline SSE instructions, here are some rough performance tests I made for TA3D on some (array-based) vector3 operations:
http://ta3d.darkstars.co.uk/forums/viewtopic.php?t=868
Tobi
Spring Developer
Posts: 4598
Joined: 01 Jun 2005, 11:36

Re: Serious discussion regarding SSE/SSE2

Post by Tobi »

While the padding is easy, I don't think the alignment is, particularly for (temporary) float3s created on the stack. Maybe there's some __attribute__ for it in GCC, though?
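For illustration, a sketch of a padded and aligned vector type. It uses the C++11 `alignas` specifier; GCC's `__attribute__((aligned(16)))` on the struct expresses the same request. `Float4`, `add` and `isAligned16` are hypothetical names and not Spring's float3. The SSE branch is only compiled when the compiler defines `__SSE__`, otherwise a scalar fallback is used.

```cpp
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>  // _mm_load_ps, _mm_add_ps, _mm_store_ps
#endif

// Hypothetical padded replacement for a 3-component vector. v[3] is unused
// padding that brings the size to 16 bytes, and alignas(16) asks the
// compiler to keep every instance, stack temporaries included, on a
// 16-byte boundary.
struct alignas(16) Float4 {
    float v[4];  // x, y, z, padding
};

inline Float4 add(const Float4& a, const Float4& b) {
    Float4 r;
#if defined(__SSE__)
    // Aligned loads and store are valid because Float4 is 16-byte aligned.
    _mm_store_ps(r.v, _mm_add_ps(_mm_load_ps(a.v), _mm_load_ps(b.v)));
#else
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}

// Helper to check the alignment of any object at run time.
inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

With a type like this the alignment of temporaries becomes the compiler's job. Whether the compilers in use at the time honoured such a request for stack objects on every platform is a separate question that would need testing.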
el_matarife
Posts: 933
Joined: 27 Feb 2006, 02:04

Re: Serious discussion regarding SSE/SSE2

Post by el_matarife »

Peet wrote: SSE2 is included on all Intel processors from the P4 onward and all AMD processors starting with the Athlon 64, so 2003 or so. I'm fairly sure that a not insignificant portion of Spring's users are on a Socket A AMD CPU (all of which are pre-SSE2), so perhaps SSE2 should wait for a year or two.
SSE2 could be a big performance boost on P4 systems though, since their floating point performance was kind of anemic. I wonder if we could use TASClient or SpringLobby to pull some sort of "CPU Report" on what percent of people using Spring have SSE, SSE2, and SSE3 so far.
Agon
Posts: 527
Joined: 16 May 2007, 18:33

Re: Serious discussion regarding SSE/SSE2

Post by Agon »

My current CPU (AMD x64 3800+) has SSE2 and it is more than two years old.
My new CPU has SSE4a support. And as Auswaschbar wrote, CPUs without SSE support are so old that they can't run Spring at more than maybe 10 fps -> unplayable!?!

But to really make sure that no one gets hurt, el_matarife made a good suggestion: collect statistics from players through the lobby clients. Players should be asked before any information is collected, though :wink: .

What about 3DNow! support for AMD CPUs?
el_matarife
Posts: 933
Joined: 27 Feb 2006, 02:04

Re: Serious discussion regarding SSE/SSE2

Post by el_matarife »

Agon wrote: What about 3DNow! support for AMD CPUs?
Probably not possible to sync 3DNow! on AMD with Intel chips that don't support it, and regardless 3DNow! was more of an MMX / SSE competitor anyway.

By the way, I forgot to mention that we need a good, 100% repeatable benchmark script for performance testing of different compiles, since replays apparently won't work. Most of the testing scripts I know about don't run exactly the same way each time, though I guess we could run them 10 times or so and take an average. A Lua script that could monitor the in-depth performance statistics from the B hotkey would be pretty nice, along with FPS and CPU usage stats.
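The "run it 10 times and average" part could look roughly like the sketch below. It is a generic timing harness and not an existing Spring script: `BenchStats`, `summarize` and `runBenchmark` are hypothetical names, and `scenario` stands in for whatever launches one benchmark run. Reporting the standard deviation next to the mean gives an idea of how noisy the runs are.

```cpp
#include <chrono>
#include <cmath>
#include <functional>
#include <vector>

struct BenchStats {
    double meanMs;    // arithmetic mean of the samples
    double stddevMs;  // sample standard deviation (n - 1 in the denominator)
};

// Summarizes a list of wall-clock samples given in milliseconds.
inline BenchStats summarize(const std::vector<double>& samplesMs) {
    if (samplesMs.empty()) return {0.0, 0.0};
    double sum = 0.0;
    for (double s : samplesMs) sum += s;
    const double mean = sum / samplesMs.size();
    double squares = 0.0;
    for (double s : samplesMs) squares += (s - mean) * (s - mean);
    const double stddev =
        samplesMs.size() > 1 ? std::sqrt(squares / (samplesMs.size() - 1)) : 0.0;
    return {mean, stddev};
}

// Runs `scenario` `runs` times and times each run with a monotonic clock.
inline BenchStats runBenchmark(const std::function<void()>& scenario, int runs) {
    std::vector<double> samplesMs;
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        scenario();
        const auto stop = std::chrono::steady_clock::now();
        samplesMs.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return summarize(samplesMs);
}
```

If two builds differ by less than the standard deviation, averaging ten runs is probably not enough to tell them apart.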
AF
AI Developer
Posts: 20687
Joined: 14 Sep 2004, 11:32

Re: Serious discussion regarding SSE/SSE2

Post by AF »

How about we make two AIs fight each other while a Lua gadget records the commands they issue? We then write a third AI that loads the gadget's output and issues the commands at the recorded frames, thus removing the AI calculations from the equation.

The output might need to be regenerated if the game content changes, since that could lead to a different game outcome and make the test pointless, but otherwise it would be a good benchmark given the right choices.

Such a gadget could also be used to record human unit commands.
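A rough sketch of the record-and-replay data structure behind this idea, written in C++ rather than as a Lua gadget. `RecordedCommand` and `CommandLog` are hypothetical, the command is reduced to a unit id plus an order string, and the log assumes commands are recorded in non-decreasing frame order. Nothing here uses Spring's actual AI or gadget interfaces.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One recorded order: which unit received which order on which frame.
struct RecordedCommand {
    int frame;
    int unitId;
    std::string order;
};

class CommandLog {
public:
    // Recording side: called whenever an AI (or a human) issues a command.
    void record(int frame, int unitId, std::string order) {
        commands_.push_back({frame, unitId, std::move(order)});
    }

    // Replay side: returns every not-yet-replayed command whose frame is
    // at or before `frame`, in recording order, and marks it as replayed.
    std::vector<RecordedCommand> popDue(int frame) {
        std::vector<RecordedCommand> due;
        while (next_ < commands_.size() && commands_[next_].frame <= frame) {
            due.push_back(commands_[next_]);
            ++next_;
        }
        return due;
    }

private:
    std::vector<RecordedCommand> commands_;
    std::size_t next_ = 0;  // index of the first command not yet replayed
};
```

The replaying AI would call `popDue` once per simulation frame and hand the returned orders to its units, so no decision-making code runs during the benchmark.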
Tobi
Spring Developer
Posts: 4598
Joined: 01 Jun 2005, 11:36

Re: Serious discussion regarding SSE/SSE2

Post by Tobi »

If it's just about checking sync then that's needlessly complex. Just compile two Spring versions and set them to play against each other with whichever AI. Repeat 100 times and voilà, you have some idea of whether it desyncs or not. (Scripts to help with this are in SVN, because I did the same to test a number of GCC versions against each other.)
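The comparison step of such a test can be sketched as follows. This is not one of the scripts in SVN. It assumes, hypothetically, that each build can dump one checksum of its game state per simulation frame, and `firstDesyncFrame` is an illustrative name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Compares two per-frame checksum traces from the same game.
// Returns the index of the first frame where they differ, the length of the
// shorter trace if one simply ends early, or -1 if they are identical.
inline long firstDesyncFrame(const std::vector<std::uint32_t>& buildA,
                             const std::vector<std::uint32_t>& buildB) {
    const std::size_t common =
        buildA.size() < buildB.size() ? buildA.size() : buildB.size();
    for (std::size_t i = 0; i < common; ++i) {
        if (buildA[i] != buildB[i]) return static_cast<long>(i);
    }
    if (buildA.size() != buildB.size()) return static_cast<long>(common);
    return -1;
}
```

Running this over each of the 100 games and counting the results other than -1 gives the desync rate for that pair of builds.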
AF
AI Developer
Posts: 20687
Joined: 14 Sep 2004, 11:32

Re: Serious discussion regarding SSE/SSE2

Post by AF »

Indeed, but it would also provide a decent game against which to benchmark performance. It would be a 'reliable' springmark of sorts.

It would also work as a technical unit test, since new options and rendering settings could be applied and run through a short battery of tests with it. A glorified epic unit test of sorts.
shaddam
Posts: 14
Joined: 23 Aug 2005, 15:49

Re: Serious discussion regarding SSE/SSE2

Post by shaddam »

Tobi wrote: While the padding is easy, I don't think the alignment is, particularly for (temporary) float3s created on the stack. Maybe there's some __attribute__ for it in GCC, though?
... I've never heard of a possibility to force stack memory to be aligned (I think it would be too complicated), but you can always align it yourself by allocating 15 extra bytes and then advancing to the next aligned address.
Sure, that's some overhead...
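The manual approach amounts to rounding a pointer up to the next multiple of 16. A minimal sketch, with `alignUp16` as a hypothetical helper name:

```cpp
#include <cstdint>

// Rounds `raw` up to the next 16-byte boundary and returns it unchanged if
// it is already aligned. The caller must have allocated at least 15 bytes
// more than it needs, so that the rounded pointer still leaves enough room.
inline float* alignUp16(void* raw) {
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(raw);
    const std::uintptr_t aligned =
        (address + 15) & ~static_cast<std::uintptr_t>(15);
    return reinterpret_cast<float*>(aligned);
}
```

For one 4-float vector on the stack that means declaring `unsigned char storage[4 * sizeof(float) + 15];` and using `alignUp16(storage)` as the vector's address. The overhead mentioned above is those 15 spare bytes plus the address arithmetic for every temporary.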
el_matarife
Posts: 933
Joined: 27 Feb 2006, 02:04

Re: Serious discussion regarding SSE/SSE2

Post by el_matarife »

AF wrote:Indeed but it would also provide a decent game against which to benchmark performance. It would be a 'reliable' springmark of sorts.

And a technical unit test since new options and renderings in the settings could be applied and run through a short battery of tests with this. Glorified epic unit test of sorts.
Right, the point is to have a reliable, repeatable benchmark that we can apply not only to testing different compiles of Spring, but also to testing the performance of SVN versions, or to comparing different PCs against each other (to see if ATI's performance sucks versus nVidia, for instance). It could even be an easy script people run to find out "Hey, I really shouldn't be playing in 8-player games on metal maps with my current system, because I'll just lag everyone out".

There's a ton of potential uses for such a script, but for right now it would just be nice to have one we could use to test out a SSE or SSE2 build for performance.
Rafal99
Posts: 162
Joined: 14 Jan 2006, 04:09

Re: Serious discussion regarding SSE/SSE2

Post by Rafal99 »

Auswaschbar wrote: (...)
Lets check:
  • AMD supported it fully with the Athlon XP (would you try to run spring on a Duron?)
(...)
:?: :?: :?:

I play Spring on a Duron, an AMD Duron 1800 MHz to be exact, and have no problems except in really big/long/spammy games. Also, in almost every game I see someone with higher CPU usage than me, so people play Spring on even older processors.
smoth
Posts: 22309
Joined: 13 Jan 2005, 00:46

Re: Serious discussion regarding SSE/SSE2

Post by smoth »

How the F*** do you play with that machine!?!?
Peet
Malcontent
Posts: 4384
Joined: 27 Feb 2006, 22:04

Re: Serious discussion regarding SSE/SSE2

Post by Peet »

Rafal99 wrote: AMD Duron 1800 MHz
Applebred core? According to the Duron Wikipedia entry it does support SSE, contrary to the source article for my earlier post.