Serious discussion regarding SSE/SSE2
As I understand it, SSE optimization is still not enabled in all builds. SSE needs to be either always on or always off for everyone to prevent sync errors.
If we do require it, we gain substantial performance for everyone at the cost of preventing a few old computers from playing at all.
Computers without SSE keep getting older and less common, so the incentives for switching are greater now than they were a year ago. At some point, it's simply not worth caring. The same goes for SSE2 and, eventually, SSE3.
Eventually, we'll need to make this decision. So let's talk about enabling it as a default build option in the next release and see what that might give us.
How much performance do we get from SSE, SSE2, and SSE3 relative to current? It would help to have some sort of numbers from someone willing to do a couple recompiles.
How many users do we lose by requiring SSE, SSE2, and SSE3? I myself don't even have an SSE3 machine yet, but these days even an EeePC is SSE3-enabled.
Last edited by YokoZar on 16 Oct 2008, 00:13, edited 1 time in total.
Re: Serious discussion regarding SSE/SSE2
I plan to enable SSE by default after 0.77 stabilizes. I think most of the dev team agrees; at least no one has voiced opposition to this move. The switch will require sync testing, so having the improved lobby supporting multiple versions would be helpful.
Re: Serious discussion regarding SSE/SSE2
Is it feasible to let performance-critical code use SSE when it's detected that all clients support it, and fall back otherwise, like TA3D does?
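To make the question concrete, here is a rough sketch of what "detect and fall back" could look like. It assumes GCC or Clang on x86 (the `__builtin_cpu_supports` builtin); `SimdLevel`, `detectLocalSimdLevel` and `commonSimdLevel` are hypothetical names for illustration, not Spring or TA3D code. Each client would report its level, and the game would use the lowest one so that everybody runs the same code path.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical names for illustration; not taken from Spring or TA3D.
enum class SimdLevel { None = 0, SSE = 1, SSE2 = 2, SSE3 = 3 };

// Query the local CPU. __builtin_cpu_supports is a GCC/Clang builtin on x86;
// on other compilers or architectures this sketch simply reports None.
SimdLevel detectLocalSimdLevel() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse3")) return SimdLevel::SSE3;
    if (__builtin_cpu_supports("sse2")) return SimdLevel::SSE2;
    if (__builtin_cpu_supports("sse"))  return SimdLevel::SSE;
#endif
    return SimdLevel::None;
}

// The level every client can run: the minimum over all reported levels.
// An empty list falls back to None (plain x87 code).
SimdLevel commonSimdLevel(const std::vector<SimdLevel>& clientLevels) {
    if (clientLevels.empty()) return SimdLevel::None;
    return *std::min_element(clientLevels.begin(), clientLevels.end());
}
```

Picking the level is the easy part; the engine would still need a synced code path for every level it allows.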
Re: Serious discussion regarding SSE/SSE2
SSE is supported by all Intel processors as of the P3 and all AMD processors as of the Athlon XP, so any processor released after 1999 for Intel and 2001 for AMD. Since processors of that age and performance are far from suitable for Spring in the first place, I'd definitely agree with adding SSE optimization.
SSE2 is included on all Intel processors from the P4 onward and all AMD processors starting with the Athlon 64, so 2003 or so. I'm fairly sure that a not insignificant portion of Spring's users are on a socket-A AMD CPU (all of which are pre-SSE2), so perhaps SSE2 should wait for a year or two.
Re: Serious discussion regarding SSE/SSE2
AF wrote: is it feasible to allow performance critical code to use SSE if its detected that all clients can use it otherwise fallback otherwise like TA3D does?
While this makes sense for TA3D, it isn't possible to run Spring in a playable way on a non-SSE CPU.
Let's check:
- AMD supported it fully with the Athlon XP (would you try to run spring on a Duron?)
- Intel supported SSE with the Pentium III (spring with 500 MHz wtf?)
Remarks: on 64-bit, gcc uses SSE2 by default and I haven't had any desyncs so far. As long as all builds keep the same flag (STREFLOP_X87 or STREFLOP_SSE) it should be fine, even with different compiler optimizations applied (SSE vs. SSE3.whatever).
Re: Serious discussion regarding SSE/SSE2
Auswaschbar wrote: Remarks: on 64bit, gcc uses SSE2 by default and I didn't got any desyncs until now. As long as all build keep the same flag (STREFLOP_X87 or STREFLOP_SSE) it should be fine, even with different compiler optimizations applied (SSE vs. SSE3.whatever).
I think you may be disappointed at how (not) much SSE the compiler actually uses when Spring is compiled with -mfpmath=387, as it also is on 64-bit. (Unless this was changed in cmake?)
Also, I think you may be disappointed at how much performance improvement SSE actually gives unless we make float3 a 4-component vector that is always aligned on a 16-byte boundary, and hand-optimize the vector operations so they are actually vectorized. (Maybe icc does a better job on automatic vectorization? Or has gcc improved with newer versions?)
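To illustrate what I mean by a padded, aligned vector, here is a minimal sketch. It assumes a C++11 compiler (for `alignas`) and the standard `<xmmintrin.h>` SSE intrinsics; `Float4` and `add` are made-up names, not Spring's float3. With 16 bytes and 16-byte alignment, one add is a single packed instruction instead of three scalar ones. When SSE is not enabled at compile time, the sketch falls back to plain scalar code.

```cpp
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical padded vector: x, y, z plus one unused lane (w).
struct alignas(16) Float4 {
    float x, y, z, w;
};

static_assert(sizeof(Float4) == 16, "padded vector should be exactly 16 bytes");
static_assert(alignof(Float4) == 16, "padded vector should be 16-byte aligned");

// Component-wise add. With SSE this is one aligned load per operand,
// one packed add, and one aligned store.
inline Float4 add(const Float4& a, const Float4& b) {
#if defined(__SSE__)
    Float4 r;
    _mm_store_ps(&r.x, _mm_add_ps(_mm_load_ps(&a.x), _mm_load_ps(&b.x)));
    return r;
#else
    // Scalar fallback when the compiler is not targeting SSE.
    return Float4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
#endif
}
```

Whether this pays off in practice depends on how much of the hot code can be rewritten around it, which is the hand-optimizing part.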
Re: Serious discussion regarding SSE/SSE2
AF wrote: is it feasible to allow performance critical code to use SSE if its detected that all clients can use it otherwise fallback otherwise like TA3D does?
Nope, I really don't think this is worth the enormous amount of effort. (In particular when compared to simply changing a compiler option.)
Re: Serious discussion regarding SSE/SSE2
Tobi wrote: I think you may be disappointed at how (not) much SSE the compiler actually uses when Spring is compiled with -mfpmath=387, as it also is on 64 bit. (unless this was changed in cmake?)
Also I think you may be disappointed at how much performance improvement SSE actually gives unless we actually make float3 a 4 component vector which is always aligned on 16 byte boundary, and hand optimize the vector instructions to be vectorized. (Maybe icc does a better job on automatic vectorization? Or has gcc improved with newer versions?)
Nice idea about padding a vector3 to a vector4: trading 25% wasted memory for approx. 100% speed-up (for a single vector3; for an array it might be solved without padding). An acceptable deal, I think.
Besides, I know the idea of inline assembly was dropped for Spring, but just to illustrate the potential of inline SSE instructions, here are some rough performance tests I made for TA3D on some (array'ed) vector3 operations:
http://ta3d.darkstars.co.uk/forums/viewtopic.php?t=868
Re: Serious discussion regarding SSE/SSE2
While the padding is easy, I don't think the alignment is. In particular not for (temp) float3's created on stack. Maybe there's some __attribute__ for it tho, in GCC?
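For reference, GCC does have an `aligned` attribute that can be put on a type, and it then applies to stack temporaries of that type as well. A minimal sketch, assuming GCC or Clang; `AlignedFloat3`, `isAligned16` and `stackTempIsAligned` are made-up names. On 32-bit x86, whether the stack itself is kept 16-byte aligned can also depend on the compiler version and ABI flags, so treat this as something to verify, not a guarantee.

```cpp
#include <cstdint>

// GCC/Clang attribute syntax; the type is padded to 16 bytes as a side effect.
struct __attribute__((aligned(16))) AlignedFloat3 {
    float x, y, z;
};

// True if the address is a multiple of 16.
inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Creates a temporary on the stack and reports whether it is 16-byte aligned.
bool stackTempIsAligned() {
    AlignedFloat3 tmp = {1.0f, 2.0f, 3.0f};
    return isAligned16(&tmp);
}
```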
Re: Serious discussion regarding SSE/SSE2
Peet wrote: SSE2 is included on all intel processors on from the P4 and all AMD processors starting with the Athlon 64 - 2003 or so. I'm fairly sure that a non-insignificant portion of Spring's users are on a socket-A AMD cpu (all of which are pre-SSE2), so perhaps SSE2 should wait for a year or two.
SSE2 could be a big performance boost on P4 systems though, since their floating point performance was kind of anemic. I wonder if we could use TASClient or SpringLobby to pull some sort of "CPU Report" on what percent of people using Spring have SSE, SSE2, and SSE3 so far.
Re: Serious discussion regarding SSE/SSE2
My current cpu (AMD x64 3800+) has SSE2 and it is more than two years old.
My new CPU has SSE4a support. And as Auswaschbar wrote, CPUs without SSE support are so old that they can't run Spring at more than maybe 10 fps -> unplayable!?!
But to make really sure that no one gets hurt, el_matarife made a good suggestion: collect statistics from players through the lobby clients. Players should be asked before any information is collected, though.
What about 3DNow! support for AMD cpus?
Re: Serious discussion regarding SSE/SSE2
Agon wrote: What about 3DNow! support for AMD cpus?
Probably not possible to sync 3DNow! on AMD with Intel chips that don't support it, and regardless, 3DNow! was more of an MMX / SSE competitor anyway.
By the way, I forgot to mention that we need a good, 100% repeatable benchmark script for performance testing on different compiles, since replays apparently won't work. Most of the testing scripts I know about don't run exactly the same way each time, though I guess we could run them like 10 times and come up with an average. A Lua script that could monitor the in-depth performance statistics from the B hotkey would be pretty nice, along with FPS and CPU use stats too.
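The "run it 10 times and average" part could be sketched roughly like this. This is generic C++ and not tied to Spring; `BenchStats`, `summarize` and `runRepeated` are hypothetical names, and the workload is whatever callable starts one benchmark run. Reporting the standard deviation next to the mean gives some idea of how noisy the runs are.

```cpp
#include <chrono>
#include <cmath>
#include <functional>
#include <vector>

struct BenchStats {
    double meanMs;    // average run time in milliseconds
    double stddevMs;  // sample standard deviation (0 with fewer than 2 runs)
};

// Mean and sample standard deviation of a set of timings.
BenchStats summarize(const std::vector<double>& samplesMs) {
    BenchStats stats = {0.0, 0.0};
    if (samplesMs.empty()) return stats;
    double sum = 0.0;
    for (double s : samplesMs) sum += s;
    stats.meanMs = sum / samplesMs.size();
    if (samplesMs.size() > 1) {
        double sq = 0.0;
        for (double s : samplesMs) sq += (s - stats.meanMs) * (s - stats.meanMs);
        stats.stddevMs = std::sqrt(sq / (samplesMs.size() - 1));
    }
    return stats;
}

// Run the workload `runs` times, timing each run with a monotonic clock.
BenchStats runRepeated(const std::function<void()>& workload, int runs) {
    std::vector<double> samplesMs;
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        workload();
        const auto stop = std::chrono::steady_clock::now();
        samplesMs.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return summarize(samplesMs);
}
```

If the standard deviation is large compared to the difference between two builds, ten runs are not enough to tell them apart.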
Re: Serious discussion regarding SSE/SSE2
How about we make two AIs fight each other while a lua gadget records the commands they issue. We then write a third AI that loads the output of the lua gadget and feeds the commands out at the specified frames, thus removing the AI calculations from the equation.
The output might need to be regenerated whenever the content used changes, since the recorded commands could then lead to a different game outcome and make the test pointless. Otherwise it would be a good benchmark, given the right choices.
Such a gadget could also be used to record human unit commands.
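The playback side of this idea could look something like the sketch below. It is only an illustration under assumptions: the recorded log is taken to be a list of (frame, unit, command) entries, and `RecordedCommand` and `CommandReplayer` are invented names that do not correspond to Spring's AI interface or any existing gadget output format.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One entry of the hypothetical recorded log.
struct RecordedCommand {
    int frame;            // game frame at which the command was issued
    int unitId;           // unit that received it
    std::string command;  // opaque command description
};

// Feeds recorded commands back out at the frames they were recorded on.
class CommandReplayer {
public:
    explicit CommandReplayer(std::vector<RecordedCommand> log) : log_(std::move(log)) {
        // Sort by frame; stable so same-frame commands keep their recorded order.
        std::stable_sort(log_.begin(), log_.end(),
                         [](const RecordedCommand& a, const RecordedCommand& b) {
                             return a.frame < b.frame;
                         });
    }

    // Returns every not-yet-issued command recorded at or before `frame`.
    std::vector<RecordedCommand> due(int frame) {
        std::vector<RecordedCommand> out;
        while (next_ < log_.size() && log_[next_].frame <= frame) {
            out.push_back(log_[next_++]);
        }
        return out;
    }

private:
    std::vector<RecordedCommand> log_;
    std::size_t next_ = 0;
};
```

The replaying AI would call `due(currentFrame)` once per frame and hand the result to the engine, so no AI decision making runs during the benchmark.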
Re: Serious discussion regarding SSE/SSE2
If it's just about checking sync then that's uselessly complex. Just compile 2 spring versions and set them to play against each other with whichever AI. Repeat 100 times and voila, you have some idea on whether it desyncs or not. (Scripts to help with this are in SVN, because I did this too to test a number of GCC versions against each other.)
Re: Serious discussion regarding SSE/SSE2
Indeed, but it would also provide a decent game against which to benchmark performance. It would be a 'reliable' springmark of sorts.
And a technical unit test, since new options and renderings in the settings could be applied and run through a short battery of tests with this. A glorified epic unit test of sorts.
Re: Serious discussion regarding SSE/SSE2
Tobi wrote: While the padding is easy, I don't think the alignment is. In particular not for (temp) float3's created on stack. Maybe there's some __attribute__ for it tho, in GCC?
I've never heard of a possibility to force stack memory to be aligned (I think it would be too complicated), but you can always align it yourself by allocating 15 extra bytes and then advancing to the next aligned address.
Sure, there's some overhead...
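The "+15 bytes" trick in code, as a small sketch (`alignUp16` is a made-up helper name): over-allocate by 15 bytes, then round the start address up to the next multiple of 16. The aligned pointer is at most 15 bytes past the raw one, so 16 usable bytes always fit in a buffer of 16 + 15.

```cpp
#include <cstdint>

// Round an address up to the next multiple of 16.
// The caller must have allocated at least 15 bytes more than it needs.
inline unsigned char* alignUp16(unsigned char* raw) {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
    addr = (addr + 15) & ~static_cast<std::uintptr_t>(15);
    return reinterpret_cast<unsigned char*>(addr);
}
```

Usage would be along the lines of declaring `unsigned char raw[16 + 15];` on the stack and working through `alignUp16(raw)`.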
Re: Serious discussion regarding SSE/SSE2
AF wrote: Indeed but it would also provide a decent game against which to benchmark performance. It would be a 'reliable' springmark of sorts.
And a technical unit test since new options and renderings in the settings could be applied and ran through a short battery of tests with this. Glorified epic unit test of sorts.
Right, the point is to have a reliable, repeatable benchmark that we can apply not only to testing different compiles of Spring, but also to testing performance on SVN versions, or to comparing different PCs against each other, for instance to see if ATI's performance sucks versus nVidia. It could even be an easy script people run to determine "Hey, I really shouldn't be playing in 8 player games on metal maps with my current system cause I'll just lag everyone out".
There's a ton of potential uses for such a script, but for right now it would just be nice to have one we could use to test out an SSE or SSE2 build for performance.
Re: Serious discussion regarding SSE/SSE2
Auswaschbar wrote: (...)
Lets check: (...)
- AMD supported it fully with the Athlon XP (would you try to run spring on a Duron?)
I play Spring on a Duron, an AMD Duron 1800 MHz to be exact, and have no problems playing Spring except in really big/long/spammy games. Also, in almost every game I see someone with higher CPU usage than me, so people play Spring on even older processors.
Re: Serious discussion regarding SSE/SSE2
How the F*** do you play with that machine!?!?
Re: Serious discussion regarding SSE/SSE2
Rafal99 wrote: AMD Duron 1800Mhz
Applebred core? According to the Duron Wikipedia entry it does support SSE, contrary to the source article for my earlier post.