lobby server disconnects / high load at springrts.com server
I guess most of you have already noticed that users get disconnected by the lobby server several times a day.
This happens because the server is overloaded: when it happens, the load average on the server is maybe >20, and because of that the lobby server doesn't respond fast enough to clients. Sadly I have no clue how to fix this, as it happens outside the VM I have access to.
As this problem started when the Windows VM was moved to this server, it is very likely that the Windows VM causes a lot of IOPS and/or high CPU load, which blocks the Linux VM on the machine. Also, A LOT of stuff that isn't related to Spring is running on the Linux VM. This made it pretty hard to set up a backup that excludes this stuff. I've requested several times that this stuff be moved away from the VM where all the Spring stuff runs, and also offered to move the Spring stuff into a new VM once that VM is created. Sadly all my requests were delayed and nothing seems to be happening on this front.
I'm not sure it's good to stay on this server, as the load peaks have existed for a few months and nobody seems to care about them. Ideas / help welcome, as I'm incapable of fixing this.
It should be noted that the server is currently (afaik) overfunded and a lot of donations are made to run this server.
TL;DR:
It's a mess that donations have a balance of ~ +2000€ while this server is overloaded and non-Spring-related stuff is running on the same machine (which very likely produces some income, too).
- Silentwings
- Posts: 3720
- Joined: 25 Oct 2008, 00:23
Re: lobby server disconnects / high load at springrts.com se
I don't want to sound ungrateful for what we have - but two things that I think are quite bad:
Since about a year ago, there is no active page on springrts.com for making donations to Spring, or to be thanked for making donations. On the only active page, anywhere, from which it is possible to donate, there is no clear way offered to donate to Spring in its own right.
What abma said above - people (possibly) timing out because of non-Spring stuff running on a server that is funded through Spring. (edit: see abma's comment below)
Last edited by Silentwings on 14 Oct 2014, 14:28, edited 2 times in total.
Re: lobby server disconnects / high load at springrts.com se
Soo much Leechporn.. Licho, come on..
Re: lobby server disconnects / high load at springrts.com se
Silentwings wrote: people timing out because of non-Spring stuff running on a server that is funded through Spring
idk if it's because of the non-spring stuff, but i'm sure it doesn't improve the situation.
Re: lobby server disconnects / high load at springrts.com se
You can have (and manage yourself) a KVM-VM on the same server that runs the replays site. The host's cores (i7-4770 @ 3.40GHz) are basically idle and it has lots of RAM and HDD free... and even a free IP :)
Re: lobby server disconnects / high load at springrts.com se
dansan wrote: You can have (and manage yourself) a KVM-VM on the same server that runs the replays site. The host's cores (i7-4770 @ 3.40GHz) are basically idle and it has lots of RAM and HDD free... and even a free IP :)
thanks, wrote you a mail with some questions.
regarding the current problems (as i fear licho won't react to my PMs again, i'm writing it in public as well):
i'm not sure if this is the reason for the lags, but some program seems to spam the mysql server with a lot of requests like this:
SELECT kniha.text,kniha.thread_id, kniha.jmeno, kniha_like.id_post, count(*) as pocet FROM kniha_like RIGHT JOIN kniha on kniha_like.id_post
= kniha.id WHERE kniha_like.id_post in (SELECT id_post FROM kniha_like WHERE `datum` > (SELECT DATE_SUB( curdate( ) , INTERVAL 500 DAY )) group by id_post) GROUP BY kniha_like.id_post order by pocet desc
the value in "INTERVAL 500 DAY" is iterated from 1 to at least 1080. this could explain the high iowait times / high cpu load for mysql (which blocks uberserver). this code / database is clearly not from springrts.com. so this question goes directly to @Licho:
when will you move the non-spring stuff away from the springrts.com vm?
i hate it when i waste my spare time on other people's stuff.
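One way to check that a flood like the one described above is really a single statement with only the interval changing is to reduce every logged query to a fingerprint and count the shapes. A minimal sketch in Python; the `fingerprint` helper and the generated sample are hypothetical stand-ins (a shortened form of the subquery above), not the server's real query log:

```python
import re
from collections import Counter


def fingerprint(sql: str) -> str:
    """Collapse whitespace and replace standalone numeric literals with '?'."""
    collapsed = " ".join(sql.split())
    return re.sub(r"\b\d+\b", "?", collapsed)


# Hypothetical sample: the inner subquery with the interval varied from 1 to 1080.
TEMPLATE = (
    "SELECT id_post FROM kniha_like WHERE `datum` > "
    "(SELECT DATE_SUB( curdate( ) , INTERVAL {days} DAY )) group by id_post"
)
queries = [TEMPLATE.format(days=d) for d in range(1, 1081)]

# Count how many logged statements share each shape.
counts = Counter(fingerprint(q) for q in queries)
for shape, n in counts.most_common(3):
    print(n, shape)
```

If one shape dominates the count, the load comes from a single caller looping over a parameter, which is easier to track down than many unrelated queries.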
Re: lobby server disconnects / high load at springrts.com se
some mysql statistics for the production databases:
size of spring db stuff:
spring 1200 MB
lobby 130 MiB
etherpad-lite 36.6 MiB
size of non-spring stuff:
kto 1500 MB
majkluvsvet 157.7 MB
tgchan 865.6 MiB
that's spring ~1400 MB vs non-spring ~2500 MB.
i still see zero arguments for not splitting the non-spring stuff away from the spring stuff.
i don't like to give usage statistics as this would take a lot more time (which i see as a waste of time), but looking at the usual cpu/io stats, the non-spring stuff takes a similar amount of resources, too.
(dev databases are excluded as they are mostly unused).
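The rounded totals can be checked from the individual figures. A small sketch in Python, assuming "1.500" means 1500 and "36,6" means 36.6, and treating MB and MiB as the same unit, which is only good enough for a rough comparison:

```python
# Sizes as posted, in MB. MB and MiB are treated alike here (an assumption
# made only to get a rough total, the real units differ by about 5%).
spring = {"spring": 1200.0, "lobby": 130.0, "etherpad-lite": 36.6}
non_spring = {"kto": 1500.0, "majkluvsvet": 157.7, "tgchan": 865.6}

spring_total = sum(spring.values())
non_spring_total = sum(non_spring.values())
share = non_spring_total / (spring_total + non_spring_total)

print(f"spring:     {spring_total:.1f} MB")
print(f"non-spring: {non_spring_total:.1f} MB")
print(f"non-spring share of all data: {share:.0%}")
```

Under those assumptions the non-spring databases hold roughly two thirds of the data on the shared mysql server.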
Re: lobby server disconnects / high load at springrts.com se
anyone here who has experience with esxi?
for me it seems both disks are (partly) broken, right?
# esxcli storage core device stats get
t10.ATA_____TOSHIBA_DT01ACA200_________________________________53SB5JXGS
Device: t10.ATA_____TOSHIBA_DT01ACA200_________________________________53SB5JXGS
Successful Commands: 77384
Blocks Read: 44
Blocks Written: 0
Read Operations: 44
Write Operations: 0
Reserve Operations: 0
Reservation Conflicts: 0
Failed Commands: 15445
Failed Blocks Read: 0
Failed Blocks Written: 0
Failed Read Operations: 0
Failed Write Operations: 0
Failed Reserve Operations: 0
t10.ATA_____TOSHIBA_DT01ACA200_________________________________53SDAH2AS
Device: t10.ATA_____TOSHIBA_DT01ACA200_________________________________53SDAH2AS
Successful Commands: 209459425
Blocks Read: 4633982441
Blocks Written: 4124955635
Read Operations: 107315482
Write Operations: 102110167
Reserve Operations: 6780
Reservation Conflicts: 0
Failed Commands: 19865
Failed Blocks Read: 0
Failed Blocks Written: 0
Failed Read Operations: 0
Failed Write Operations: 0
Failed Reserve Operations: 0
sadly esxi 5.0 is running, so no access to smart data
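To put the failed-command counters above in proportion, they can be compared with the number of commands issued per disk. A rough sketch in Python; it assumes "Successful Commands" and "Failed Commands" are separate counters that add up to the total, which the output itself does not state, and the helper names are hypothetical:

```python
def parse_device_stats(text: str) -> dict[str, dict[str, int]]:
    """Parse 'Key: value' lines grouped under 'Device: <name>' headers."""
    devices: dict[str, dict[str, int]] = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        value = value.strip()
        if not sep:
            continue  # bare device-name lines carry no counter
        if key == "Device":
            current = devices.setdefault(value, {})
        elif current is not None and value.isdigit():
            current[key] = int(value)
    return devices


def failure_ratio(stats: dict[str, int]) -> float:
    """Failed commands as a fraction of all commands (assumed disjoint counters)."""
    failed = stats["Failed Commands"]
    return failed / (failed + stats["Successful Commands"])


# Excerpt of the counters posted above.
SAMPLE = """\
Device: t10.ATA_____TOSHIBA_DT01ACA200_________________________________53SB5JXGS
   Successful Commands: 77384
   Failed Commands: 15445
Device: t10.ATA_____TOSHIBA_DT01ACA200_________________________________53SDAH2AS
   Successful Commands: 209459425
   Failed Commands: 19865
"""

ratios = {
    name.rsplit("_", 1)[-1]: failure_ratio(stats)
    for name, stats in parse_device_stats(SAMPLE).items()
}
for serial, ratio in ratios.items():
    print(f"{serial}: {ratio:.4%} of commands failed")
```

Read this way, the first disk fails a large share of the few commands it receives, while the second fails only a tiny fraction of a very large number. That may point to the first disk barely being used at all, but the counters alone cannot confirm it.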
Re: lobby server disconnects / high load at springrts.com se
abma wrote: sadly esxi 5.0 is running, so no access to smart data
sadly no!
Re: lobby server disconnects / high load at springrts.com se
Can't you do a SMART test from the BIOS?
Assuming the server is remote, can you access the OOB, enter the BIOS or RAID BIOS and check what's happening on a hardware level?
In either case, if you think your server is going tits-up one of these days: back up everything
Re: lobby server disconnects / high load at springrts.com se
Ligthert wrote: Can't you do a SMART test from the BIOS?
Assuming the server is remote, can you access the OOB, enter the BIOS or RAID BIOS and check whats happening on a hardware level?
this doesn't help, as it would take the server down, and if a disk is broken i can't repair/replace it. licho is responsible / the server owner. i'll try to migrate all springrts.com stuff to dansan's machine, that's the current plan. no clue about zero-k.info.
-
- Posts: 98
- Joined: 22 Sep 2014, 20:29
Re: lobby server disconnects / high load at springrts.com se
The last time I had failed ATA commands it was when I had a faulty data cable connection to the HDD, so it's probably bad.
Re: lobby server disconnects / high load at springrts.com se
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, [no address given] and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
Apache/2.2.22 (Ubuntu) Server at springrts.com Port 80
@ ~ 15:40 german time
+
[15:41:58] Connecting to lobby.springrts.com ...
[15:41:58] Connection established to lobby.springrts.com
[15:42:29] Timeout assumed. Disconnecting ...
[15:42:29] Connection to server closed!
[15:42:32] Connecting to lobby.springrts.com ...
[15:42:32] Connection established to lobby.springrts.com
[15:43:02] Timeout assumed. Disconnecting ...
[15:43:02] Connection to server closed!
[15:54:46] Connecting to lobby.springrts.com ...
[15:54:46] Connection established to lobby.springrts.com
[15:55:16] Timeout assumed. Disconnecting ...
[15:55:16] Connection to server closed!
[15:58:46] Connecting to lobby.springrts.com ...
[15:58:46] Connection established to lobby.springrts.com
[15:59:16] Connection to server closed!
[16:05:06] Connecting to lobby.springrts.com ...
[16:05:07] Connection established to lobby.springrts.com
[16:05:07] Login successful!
did u fix something?
+1 to "shift stuff away from spring server/vm"
+ thanks to dansan for the hardware support - i would support too but i think this is enough
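The lobby log above can be summarised by measuring how long each attempt lasted before it was closed or logged in. A minimal sketch in Python; the `session_lengths` helper is hypothetical, and it assumes all timestamps fall on the same day:

```python
import re
from datetime import datetime

LINE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] (.*)")

LOG = """\
[15:41:58] Connecting to lobby.springrts.com ...
[15:41:58] Connection established to lobby.springrts.com
[15:42:29] Timeout assumed. Disconnecting ...
[15:42:29] Connection to server closed!
[15:42:32] Connecting to lobby.springrts.com ...
[15:42:32] Connection established to lobby.springrts.com
[15:43:02] Timeout assumed. Disconnecting ...
[15:43:02] Connection to server closed!
[15:54:46] Connecting to lobby.springrts.com ...
[15:54:46] Connection established to lobby.springrts.com
[15:55:16] Timeout assumed. Disconnecting ...
[15:55:16] Connection to server closed!
[15:58:46] Connecting to lobby.springrts.com ...
[15:58:46] Connection established to lobby.springrts.com
[15:59:16] Connection to server closed!
[16:05:06] Connecting to lobby.springrts.com ...
[16:05:07] Connection established to lobby.springrts.com
[16:05:07] Login successful!
"""


def session_lengths(log: str) -> list[tuple[str, float, bool]]:
    """Return (start time, seconds until close or login, logged_in) per attempt."""
    sessions = []
    start = None
    for raw in log.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(2)
        if message.startswith("Connecting to"):
            start = stamp
        elif start is not None and message.startswith(
            ("Connection to server closed", "Login successful")
        ):
            seconds = (stamp - start).total_seconds()
            sessions.append(
                (start.strftime("%H:%M:%S"), seconds, message.startswith("Login"))
            )
            start = None
    return sessions


sessions = session_lengths(LOG)
for begin, seconds, logged_in in sessions:
    print(begin, f"{seconds:.0f}s", "login ok" if logged_in else "closed without login")
```

Every failed attempt ends about 30 seconds after the TCP connection is established, which looks like a fixed client-side timeout while waiting for the server to answer. That is an inference from the timestamps, not something the log states.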
Re: lobby server disconnects / high load at springrts.com se
yeah, seems the recent changes to the server didn't improve the situation
very, very likely the disks or the raid controller are broken/dying.
(-> will try to migrate at least springrts.com to dansan's server)
Re: lobby server disconnects / high load at springrts.com se
After reading this and looking around, I have to conclude the storage could very well be dying
Re: lobby server disconnects / high load at springrts.com se
ok, plan slightly adjusted:
licho contacted / will contact ovh about the possibly broken storage and we'll hopefully get a new machine. meanwhile i'll try to create a script which copies the current (linux) springrts.com (website + lobbyserver + buildbot, etc.) to dansan's machine so it can be used as a fallback.
Re: lobby server disconnects / high load at springrts.com se
As a system administrator I approve this message!
Re: lobby server disconnects / high load at springrts.com se
ovh offered sth. like:
- replace the disk, which imo won't solve the random hangs, as the private stuff is causing a big part of them
- replace the server, if something "more expensive" is ordered (if i understood it right), which is no option either, as imo way too much money is wasted atm already.
as dansan offered a fast server for free and the migration script now exists, i/we will try this first; he has been running replays.springrts.com for a long time.
conclusion: migrate springrts.com to dansan's server: http://springrts.com/phpbb/viewtopic.php?f=71&t=32692
idk what to do with the windows machine, but the current server without springrts.com should be fast enough. imo get rid of it and move the services to linux, but that's sth. the zero-k devs have to decide / do.