lobby server disconnects / high load at springrts.com server


For the discussion of infrastructure improvements and changes.

Moderator: Moderators

abma
Spring Developer
Posts: 3544
Joined: 01 Jun 2009, 00:08

lobby server disconnects / high load at springrts.com server

Post by abma » 14 Oct 2014, 09:53

i guess most of you already noticed that users are disconnected by the lobby server several times a day.

this happens because the server is overloaded: when it happens, the load average on the server is maybe >20, and because of that the lobby server doesn't respond fast enough to clients. sadly i have no clue how to fix this, as it happens outside the vm i have access to.
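For anyone who wants to put a number on "overloaded": a rough sketch that compares the load average with the core count. The threshold of 1.0 per core and the helper names are assumptions for illustration, not something the server actually uses, and the value is measured inside the vm, so it only hints at what the host is doing.

```python
import os


def load_per_core(load_avg: float, cpu_count: int) -> float:
    """Return the load average divided by the number of CPU cores."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def is_overloaded(load_avg: float, cpu_count: int, threshold: float = 1.0) -> bool:
    """True if the per-core load exceeds the (hypothetical) threshold."""
    return load_per_core(load_avg, cpu_count) > threshold


if __name__ == "__main__":
    # os.getloadavg() is only available on Unix-like systems.
    one_min, five_min, fifteen_min = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"1-minute load {one_min:.2f} on {cpus} cores, "
          f"overloaded: {is_overloaded(one_min, cpus)}")
```

On Linux the load average also counts tasks waiting on disk I/O, so a slow disk can push it up even when the cpu is mostly idle.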

As this problem started when the windows vm was moved to this server, it is very likely that the windows vm causes a lot of IOPS and/or high cpu load, which blocks the linux vm on the machine. Also, A LOT of stuff that isn't related to spring is running on the linux vm. This made it pretty hard to set up a backup which excludes this stuff. I've requested several times that this stuff be moved away from the vm where all spring stuff runs, and also offered to move the spring stuff into a new vm once such a vm is created. Sadly all my requests were delayed and nothing seems to be happening on this front.

I'm not sure if it's good to stay on this server, as the load peaks have existed for a few months and nobody seems to care about them. Ideas / help welcome, as i'm incapable of fixing this.

It should be noted that the server is currently (afaik) overfunded and a lot of donations are made to run this server.

TL;DR:

it's a mess that donations have a balance of ~ +2000€ while this server is overloaded & non-spring related stuff is running on the same machine (which very likely produces some income, too).

Silentwings
Moderator
Posts: 3582
Joined: 25 Oct 2008, 00:23

Re: lobby server disconnects / high load at springrts.com se

Post by Silentwings » 14 Oct 2014, 11:53

I don't want to sound ungrateful for what we have - but two things that I think are quite bad:

Since about a year ago, there is no active page on springrts.com for making donations to Spring, or to be thanked for making donations. On the only active page, anywhere, from which it is possible to donate, there is no clear way offered to donate to Spring in its own right.

What abma said above - people (possibly) timing out because of non-Spring stuff running on a server that is funded through Spring. (edit: see abma's comment below)
Last edited by Silentwings on 14 Oct 2014, 14:28, edited 2 times in total.

PicassoCT
Journeywar Developer & Mapper
Posts: 10213
Joined: 24 Jan 2006, 21:12

Re: lobby server disconnects / high load at springrts.com se

Post by PicassoCT » 14 Oct 2014, 13:10

Soo much Leechporn.. Licho, come on..

abma
Spring Developer
Posts: 3544
Joined: 01 Jun 2009, 00:08

Re: lobby server disconnects / high load at springrts.com se

Post by abma » 14 Oct 2014, 13:59

Silentwings wrote:people timing out because of non-Spring stuff running on a server that is funded through Spring
idk if it's because of the non-spring stuff, but i'm sure it doesn't improve the situation.

dansan
Server Owner & Developer
Posts: 1190
Joined: 29 May 2010, 23:40

Re: lobby server disconnects / high load at springrts.com se

Post by dansan » 14 Oct 2014, 17:11

You can have (and manage yourself) a KVM-VM on the same server that runs the replays site. The host's cores (i7-4770 @ 3.40GHz) are basically idle, and it has lots of RAM and HDD space free... and even a free IP :)

abma
Spring Developer
Posts: 3544
Joined: 01 Jun 2009, 00:08

Re: lobby server disconnects / high load at springrts.com se

Post by abma » 20 Oct 2014, 14:19

dansan wrote: You can have (and manage yourself) a KVM-VM on the same server that runs the replays site. The host's cores (i7-4770 @ 3.40GHz) are basically idle, and it has lots of RAM and HDD space free... and even a free IP :)
thanks, i wrote you a mail with some questions.


regarding the current problems (as i fear licho won't react to my pm's again, i'm writing it in public as well):

i'm not sure if this is the reason for the lags, but some program seems to spam a lot of these requests to the mysql server:

SELECT kniha.text, kniha.thread_id, kniha.jmeno, kniha_like.id_post, count(*) AS pocet
FROM kniha_like
RIGHT JOIN kniha ON kniha_like.id_post = kniha.id
WHERE kniha_like.id_post IN (
    SELECT id_post FROM kniha_like
    WHERE `datum` > (SELECT DATE_SUB(curdate(), INTERVAL 500 DAY))
    GROUP BY id_post
)
GROUP BY kniha_like.id_post
ORDER BY pocet DESC


the value in "INTERVAL 500 DAY" is iterated from 1 up to at least 1080. this could explain the high iowait times / high cpu load for mysql (which blocks uberserver). this code / database is clearly not from springrts.com. This question goes directly to

@Licho:
when will you move the non-spring stuff from the springrts.com vm away?


i hate it when i waste my spare time on other people's stuff.
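A sketch for confirming that kind of query spam from a query log: replace every numeric literal with a placeholder and count how often each resulting "shape" shows up. The function names are made up for this example, and getting the statements out of mysql (general log, process list, etc.) is left out.

```python
import re
from collections import Counter

_NUMBER = re.compile(r"\b\d+\b")
_WHITESPACE = re.compile(r"\s+")


def query_shape(sql: str) -> str:
    """Collapse whitespace and replace standalone numbers with '?'."""
    collapsed = _WHITESPACE.sub(" ", sql.strip())
    return _NUMBER.sub("?", collapsed)


def count_shapes(queries):
    """Count how many logged statements share the same shape."""
    return Counter(query_shape(q) for q in queries)


if __name__ == "__main__":
    # Illustrative statements modelled on the subquery quoted in this post.
    logged = [
        "SELECT id_post FROM kniha_like "
        f"WHERE datum > DATE_SUB(curdate(), INTERVAL {days} DAY)"
        for days in range(1, 6)
    ]
    for shape, hits in count_shapes(logged).most_common(3):
        print(hits, shape)
```

A shape that shows up hundreds of times within a short window, differing only in the INTERVAL value, would match the pattern described here.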

abma
Spring Developer
Posts: 3544
Joined: 01 Jun 2009, 00:08

Re: lobby server disconnects / high load at springrts.com se

Post by abma » 20 Oct 2014, 17:46

some mysql statistics of the production databases:

size of spring db stuff:

spring          1200    MB
lobby           130     MiB
etherpad-lite   36.6    MiB

size of non-spring stuff:

kto             1500    MB
majkluvsvet     157.7   MB
tgchan          865.6   MiB


that's spring ~1400 MB vs non-spring ~2500 MB.
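The totals can be re-checked with a few lines. This sketch takes the sizes from the lists above and makes two assumptions: MB and MiB are treated as the same unit, and kto is read as 1500, since only that value matches the ~2500 MB total.

```python
# Sizes as listed in this post; MB and MiB are treated as equal (assumption).
SPRING = {"spring": 1200.0, "lobby": 130.0, "etherpad-lite": 36.6}
NON_SPRING = {"kto": 1500.0, "majkluvsvet": 157.7, "tgchan": 865.6}


def total(sizes: dict) -> float:
    """Sum the database sizes of one group."""
    return sum(sizes.values())


def share(part: float, whole: float) -> float:
    """Return part / whole as a percentage."""
    return 100.0 * part / whole


if __name__ == "__main__":
    spring_total = total(SPRING)
    other_total = total(NON_SPRING)
    combined = spring_total + other_total
    print(f"spring:     {spring_total:.1f} ({share(spring_total, combined):.0f}%)")
    print(f"non-spring: {other_total:.1f} ({share(other_total, combined):.0f}%)")
```

Database size alone says nothing about how much load each database causes; it only shows how the storage is split.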

i still see zero arguments for not splitting non-spring stuff away from spring stuff. :-(

i'd rather not give usage statistics, as this would take a lot more time (which i see as a waste of time), but when looking at the usual cpu/io stats, the non-spring stuff takes a similar amount of resources, too.

(dev databases are excluded as they are mostly unused).

abma
Spring Developer
Posts: 3544
Joined: 01 Jun 2009, 00:08

Re: lobby server disconnects / high load at springrts.com se

Post by abma » 22 Oct 2014, 10:10

does anyone here have experience with esxi?

to me it seems both disks are (partly) broken?
# esxcli storage core device stats get
t10.ATA_____TOSHIBA_DT01ACA200_________________________________53SB5JXGS
Device: t10.ATA_____TOSHIBA_DT01ACA200_________________________________53SB5JXGS
Successful Commands: 77384
Blocks Read: 44
Blocks Written: 0
Read Operations: 44
Write Operations: 0
Reserve Operations: 0
Reservation Conflicts: 0
Failed Commands: 15445
Failed Blocks Read: 0
Failed Blocks Written: 0
Failed Read Operations: 0
Failed Write Operations: 0
Failed Reserve Operations: 0

t10.ATA_____TOSHIBA_DT01ACA200_________________________________53SDAH2AS
Device: t10.ATA_____TOSHIBA_DT01ACA200_________________________________53SDAH2AS
Successful Commands: 209459425
Blocks Read: 4633982441
Blocks Written: 4124955635
Read Operations: 107315482
Write Operations: 102110167
Reserve Operations: 6780
Reservation Conflicts: 0
Failed Commands: 19865
Failed Blocks Read: 0
Failed Blocks Written: 0
Failed Read Operations: 0
Failed Write Operations: 0
Failed Reserve Operations: 0
right?
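To compare the two disks, a sketch that parses output shaped like the listing above and computes the share of failed commands per device. It assumes "Successful Commands" and "Failed Commands" are separate counters that add up to the total, which the output itself does not confirm; the device names in the sample are placeholders.

```python
def parse_device_stats(text: str) -> dict:
    """Parse 'Key: value' lines into {device name: {counter: int}}."""
    devices = {}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # lines without a colon, e.g. the bare device id
        key, value = key.strip(), value.strip()
        if key == "Device":
            current = devices.setdefault(value, {})
        elif current is not None and value.isdigit():
            current[key] = int(value)
    return devices


def failed_ratio(stats: dict) -> float:
    """Failed commands as a fraction of all commands (assumed disjoint)."""
    failed = stats["Failed Commands"]
    total = failed + stats["Successful Commands"]
    return failed / total if total else 0.0


SAMPLE = """\
Device: disk-1
Successful Commands: 77384
Failed Commands: 15445
Device: disk-2
Successful Commands: 209459425
Failed Commands: 19865
"""

if __name__ == "__main__":
    for name, stats in parse_device_stats(SAMPLE).items():
        print(f"{name}: {failed_ratio(stats):.4%} failed")
```

With the numbers from this post, the first disk has a far higher share of failed commands, but it also shows almost no reads or writes, so the ratio alone doesn't prove which disk is bad.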

sadly esxi 5.0 is running, so no access to smart data :-(


abma
Spring Developer
Posts: 3544
Joined: 01 Jun 2009, 00:08

Re: lobby server disconnects / high load at springrts.com se

Post by abma » 22 Oct 2014, 11:48

abma wrote:sadly esxi 5.0 is running, so no access to smart data :-(
sadly no! :-(

Ligthert
Posts: 20
Joined: 15 Jul 2007, 22:52

Re: lobby server disconnects / high load at springrts.com se

Post by Ligthert » 22 Oct 2014, 12:24

Can't you do a SMART test from the BIOS?

Assuming the server is remote, can you access the OOB, enter the BIOS or RAID BIOS and check what's happening on a hardware level?

In either case, if you think your server is going tits-up one of these days: backup everything

abma
Spring Developer
Posts: 3544
Joined: 01 Jun 2009, 00:08

Re: lobby server disconnects / high load at springrts.com se

Post by abma » 22 Oct 2014, 14:02

Ligthert wrote:Can't you do a SMART test from the BIOS?

Assuming the server is remote, can you access the OOB, enter the BIOS or RAID BIOS and check what's happening on a hardware level?
this doesn't help, as it would take the server down + if a disk is broken i can't repair/replace it. licho is responsible / the server owner. i'll try to migrate all springrts.com stuff to dansan's machine, that's the current plan. no clue about zero-k.info.

MetalSucker
Posts: 98
Joined: 22 Sep 2014, 20:29

Re: lobby server disconnects / high load at springrts.com se

Post by MetalSucker » 22 Oct 2014, 15:25

The last time I had failed ATA commands was when I had a faulty data cable connection to the HDD, so something on the hardware side is probably bad.

muckl
Posts: 151
Joined: 30 Aug 2010, 07:18

Re: lobby server disconnects / high load at springrts.com se

Post by muckl » 22 Oct 2014, 16:11

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, [no address given] and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.
Apache/2.2.22 (Ubuntu) Server at springrts.com Port 80

at ~15:40 German time

+
[15:41:58] Connecting to lobby.springrts.com ...
[15:41:58] Connection established to lobby.springrts.com
[15:42:29] Timeout assumed. Disconnecting ...
[15:42:29] Connection to server closed!
[15:42:32] Connecting to lobby.springrts.com ...
[15:42:32] Connection established to lobby.springrts.com
[15:43:02] Timeout assumed. Disconnecting ...
[15:43:02] Connection to server closed!
[15:54:46] Connecting to lobby.springrts.com ...
[15:54:46] Connection established to lobby.springrts.com
[15:55:16] Timeout assumed. Disconnecting ...
[15:55:16] Connection to server closed!
[15:58:46] Connecting to lobby.springrts.com ...
[15:58:46] Connection established to lobby.springrts.com
[15:59:16] Connection to server closed!
[16:05:06] Connecting to lobby.springrts.com ...
[16:05:07] Connection established to lobby.springrts.com
[16:05:07] Login successful!
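To see how long each attempt stayed open before it was dropped, a sketch that pairs "Connection established" with the next "Connection to server closed" line in a log formatted like the one above. The function name is made up, and it assumes all timestamps fall on the same day.

```python
import re
from datetime import datetime

_LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] (.*)$")


def session_durations(log_text: str) -> list:
    """Return seconds between each established connection and its close."""
    durations = []
    started = None
    for raw in log_text.splitlines():
        match = _LINE.match(raw.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(2)
        if message.startswith("Connection established"):
            started = stamp
        elif message.startswith("Connection to server closed") and started:
            durations.append(int((stamp - started).total_seconds()))
            started = None  # wait for the next established connection
    return durations


if __name__ == "__main__":
    sample = (
        "[15:41:58] Connection established to lobby.springrts.com\n"
        "[15:42:29] Timeout assumed. Disconnecting ...\n"
        "[15:42:29] Connection to server closed!\n"
    )
    print(session_durations(sample))
```

If every failed attempt lasts about the same number of seconds, that points at a fixed client-side timeout being hit while the server fails to answer the login.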

did you fix something?

+1 to "shift stuff away from spring server/vm"

+ thanks to dansan for the hardware support - i would offer support too, but i think this is enough


abma
Spring Developer
Posts: 3544
Joined: 01 Jun 2009, 00:08

Re: lobby server disconnects / high load at springrts.com se

Post by abma » 22 Oct 2014, 16:17

yeah, seems the recent changes to the server didn't improve the situation :-|

very, very likely the disks or the raid controller are broken/dying.

(-> will try to migrate at least springrts.com to dansan's server)

Ligthert
Posts: 20
Joined: 15 Jul 2007, 22:52

Re: lobby server disconnects / high load at springrts.com se

Post by Ligthert » 22 Oct 2014, 16:30

After reading this and looking around, I have to conclude the storage could very well be dying

abma
Spring Developer
Posts: 3544
Joined: 01 Jun 2009, 00:08

Re: lobby server disconnects / high load at springrts.com se

Post by abma » 22 Oct 2014, 22:56

ok, plan slightly adjusted:

licho contacted / will contact ovh about the possibly broken storage, and we'll hopefully get a new machine. meanwhile i'll try to create a script which copies the current (linux) springrts.com (website + lobbyserver + buildbot, etc) to dansan's machine so it can be used as a fallback.

Ligthert
Posts: 20
Joined: 15 Jul 2007, 22:52

Re: lobby server disconnects / high load at springrts.com se

Post by Ligthert » 22 Oct 2014, 23:53

As a system administrator I approve this message! ;-)

abma
Spring Developer
Posts: 3544
Joined: 01 Jun 2009, 00:08

Re: lobby server disconnects / high load at springrts.com se

Post by abma » 27 Oct 2014, 23:14

ovh offered sth. like:

- replace the disk, which imo won't solve the random hangs, as the private stuff is causing a big part of them
- replace the server, if something "more expensive" is ordered (if i understood it right), which is no option either, as imo way too much money is being wasted already.

as dansan offered a fast server for free and the migration script now exists, i/we will try this first. he has been running replays.springrts.com for a long time.

conclusion: migrate springrts.com to dansan's server: http://springrts.com/phpbb/viewtopic.php?f=71&t=32692

idk what to do with the windows machine, but the current server without springrts.com should be fast enough. imo get rid of it and move the services to linux, but that's sth. the zero-k devs have to decide / do.
