Super High Server Load

I’m having some trouble with a site and I can’t seem to figure out what is going on.



For some reason the server is getting badly overloaded; the load average will climb over 100. This is a dedicated box on ServInt and shouldn’t be having these types of issues.



When I look at the top running processes, I see a ton of:



/usr/bin/php /home/username/public_html/index.php



and:



/usr/local/apache/bin/httpd -k start -DSSL



The server gets so overloaded that even accessing WHM or SSH is super slow.



Anyone have any ideas?



Thanks,



Brandon

When I've experienced this type of issue in the past, it was due to a “caching proxy server” accessing the site. Generally, I've seen these coming from Government sites like Navy Intelligence (oxymoron in this case).



The behavior is that when a user accesses your site from behind one of these proxy servers, the proxy server will try to follow every link and cache the pages to better serve its users on subsequent page accesses. However, if the store is multi-language, it grabs pages for all the languages times the number of links.



I identified the culprit by using netstat to identify the IP address(es) associated with all the processes and then blocked the C-class subnet for that IP.
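A rough sketch of that netstat step, not necessarily the exact commands used above. The function names `count_remote_ips` and `to_slash24` are made up for illustration, the sketch assumes Linux `netstat -tn` output with IPv4 peers, and the address in the comments is only a documentation placeholder.

```shell
# Count ESTABLISHED TCP connections per remote address.
# Reads `netstat -tn` style output on stdin and prints "<count> <ip>",
# busiest address first.
count_remote_ips() {
    awk '$1 ~ /^tcp/ && $6 == "ESTABLISHED" {
             sub(/:[0-9]+$/, "", $5)   # strip the remote port
             print $5
         }' | sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $1, $2 }'
}

# Turn an IPv4 address into its /24 ("class C") network.
to_slash24() {
    printf '%s\n' "$1" | awk -F. '{ printf "%s.%s.%s.0/24\n", $1, $2, $3 }'
}

# Typical use, run as root on the loaded server:
#   netstat -tn | count_remote_ips | head
#   to_slash24 203.0.113.7
# Blocking is then up to your firewall. With iptables (an assumption, the
# post does not name the tool) it would look like:
#   iptables -I INPUT -s 203.0.113.0/24 -j DROP
```

Blocking a whole /24 can also lock out legitimate visitors on the same network, so it is worth checking who owns the range first.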



If you use 'file' as the caching type, all of these processes (the page reads and all their associated AJAX requests) will probably block on file contention, especially if the cache is being rewritten as visitors switch to things like languages not previously accessed.



Sorry for the long winded answer, but that's what I found when I used to see a server for one of my clients who sold specialty Mazda parts spike to the 200 level. The miracle is that the server actually survived and did recover at some point.



Netstat is your friend.

Wow Brandon. Your description almost sounded like you were working on one of our sites! The difference is we are on Future Hosting and also, the high server load has gotten so bad that it shut down our server multiple times over the past week and a half. The strange part is that the major problems with the server load seem to happen only between 11pm and 4am. These are times when we turn off all advertising and the traffic to the server is very low. At first we thought it was caused by the server's automatic backup, so we turned it off one night, and the load still shut the server down. This is a dedicated server with 8 GB RAM (with 2 more on a swap drive) that has run pretty smoothly for the past year+.



Here is what we see and what we've done so far:


  1. We too see a ton of the following…

    /usr/bin/php /home/newts/public_html/index.php

    …but we have not figured out why or what. I am going to try what Tony suggested and, “block the C-class subnet for that IP”. I've just got to figure out those IP addresses.


  2. Right about 10pm we see more and more attempted logins with strange user names like “clammassY”, “Casque Beats Pas Cher”, “carpinteyroffs” and more. So we've started blocking their IPs via the firewall and .htaccess. We also ended up blocking whole nations that we do not sell to but that appear to be very actively trying to log in to our site. The strange part is that if we block Morocco, we then start getting these attempted logins from Korea.


  3. Around 1-2am something appears to come to our site on a nightly basis, or do something on the site, that causes our server to contact UPS and USPS about 50 times within a couple of minutes. I have not quite figured out who or what this is; I just see tons of requests in the CS-Cart logs.


  4. At some point when we did an upgrade or something, it must have zapped the instructions that tell the Google and Yahoo bots to scan the site a little slower. So we ended up adding the following text back to robots.txt, as tbirnseth suggested in another post…


Crawl-delay: 2.0
Request-rate: 30




  5. We thought that an add-on for a shipping program may have caused an issue, since this appeared to be some type of problem with the shipping. Our site did not crash one night, but last night, without that software installed, the server crashed again at 2 am.



  6. (Added) I forgot to mention that my host says the database is what is consuming most of the resources and causing it to use about 7 GB of the 8 GB of RAM quite frequently. Their solution here is to add more RAM or make the swap drive larger. I would rather try to figure out what the heck is causing this and then add more RAM if needed. Again, the sad part is these queries seem to be in the evening or overnight most of the time.



I've spent quite a few late, late nights trying to figure out what the heck is hitting us hard like this. We do see the hits in the day time, but again, not near like it is in the evening. Quite honestly, it almost feels like as soon as I log out of CS admin our server starts getting hit. Probably just a coincidence, but it is strange.

Here's a bash function that I use to look at netstat info relative to this type of problem.

With no arguments it shows the ESTABLISHED TCP connections. Otherwise the first argument filters by some value and the second filters by a hostname (actually by a site user when suPHP is used).



Note also that many “crawlers” are somewhat malicious, looking for malware entry points and have no regard for any load they might be inducing. Hence they probably ignore any robots.txt directives for speed or access.



# Show the load average plus matching TCP connections from netstat.
#   $1  optional pattern to filter on (default: ESTABLISHED, minus IMAP)
#   $2  optional host or site user name to filter on
# Matching on Send-Q as well keeps the netstat column header in the output.
net_info() {
    local grep_on="$1"
    local host="$2"
    local conns

    uptime
    conns=$(netstat -tp 2>/dev/null)
    if [ -n "$host" ]; then
        conns=$(printf '%s\n' "$conns" | grep -e "$host" -e 'Send-Q')
    fi
    if [ -n "$grep_on" ]; then
        printf '%s\n' "$conns" | grep -e "$grep_on" -e 'Send-Q'
    else
        printf '%s\n' "$conns" | grep -e 'ESTABLISHED' -e 'Send-Q' | grep -v ':imap'
    fi
}

Our host has found 1 IP that is doing as you suggested. They are the worst so we are going to block it. There are 2 more that don't seem to be as bad, but I believe our host is going to block them too. It will be interesting to see if we can make it through an evening without crashing.



I'm going to have to figure out the whole “netstat” thing as it is new to me. For now Future Hosting is helping, which is nice. The “bash function” that you listed, is this put in to the netstat or done some other way?

netstat is a great tool for determining who is connected to your site and to what service/port.

Well worth reading up on if you ever want to get a clear (or clearer) picture of what's happening on your server.



Between netstat, top, ps, mysqladmin proc, httpd status and a few others, you can get a pretty good idea of what's normal and then a pretty good idea of when things get out of whack. In fact, “mysqladmin proc” is one of my favorites since the DB is often one of the biggest culprits. That coupled with iostat can help you home in on key file areas that are overloaded. Good disk partitioning can solve a lot of contention problems. But you have to have enough disks (spindles) to work with. One just doesn't cut it.
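As a small illustration of the “mysqladmin proc” check, here is a sketch that tallies the process list by command, which makes a pile-up of sleeping connections easy to spot. The function name `summarize_processlist` is hypothetical, and the sketch assumes the usual table layout printed by `mysqladmin processlist` (Id, User, Host, db, Command, Time, State, Info) with credentials already configured, for example in `~/.my.cnf`.

```shell
# Tally `mysqladmin processlist` rows by their Command column.
# Reads the table on stdin and prints "<count> <command>", largest first.
summarize_processlist() {
    awk -F'|' '
        NF >= 9 && $2 !~ /Id/ {        # data rows only: skip borders and header
            cmd = $6
            gsub(/^ +| +$/, "", cmd)   # trim the column padding
            count[cmd]++
        }
        END { for (c in count) print count[c], c }
    ' | sort -k1,1nr -k2,2
}

# Typical use:
#   mysqladmin processlist | summarize_processlist
```

Running it a few times during a quiet period and again during a spike gives a baseline to compare against.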



Just remember that performance tuning is trading off resources. If you don't have enough to trade, you can't buy. Having too much is a much better position to be in. It gives you more choices.

Don't have open-ended product filters!

Sorry, I'm not sure what “open-ended” product filters means. We do have product filters that use something like “publisher” or “price range”. Do you mean that we should always have “categories” listed in the filter? Right now, since all the products on the site have a publisher and price, we left it generic with no specific categories. Is this maybe what you mean?

Here are our latest attempts to try to squash the high server load:

  1. Upgraded to 12 GB RAM and expanded the swap drive. All went well for 1 day.
  2. We had a lot of entries about “cscart_cs_js_banners”. While the JS Banner add-on was installed, it was not running. So we ended up uninstalling the add-on. All went well for another day.
  3. Now we ended up having over 600 “sleeping queries” which we thought may be slowing down the server. I'm not sure if CS always keeps a lot of these “sleeping queries” but it is strange that once again they are being “activated” or doing more in the late night hours. This time at about midnight, which is just after we turn off the advertising so we have very little traffic.
  4. Now we noticed some “segmentation faults” that looked like this…


[Sun Sep 30 00:17:34 2012] [notice] child pid 2529 exit signal Segmentation fault (11)
[Sun Sep 30 00:18:16 2012] [notice] child pid 32417 exit signal Segmentation fault (11)


We have tried debugging the segfaults, but no luck. We have now enabled core dumps, which should help in tracing this.



The latest stable versions of PHP and MySQL supported by cPanel are 5.3.17 and 5.5.x respectively. Upgrading them might help with the PHP segmentation faults. We know that there are lots of differences between those versions. According to the software requirements at…

CS-Cart System Requirements — CS-Cart 4.15.x documentation

Here are the versions we can use…

--------------------

PHP version 5.1 to 5.3 (PHP 5.4 is not yet supported);

MySQL version 4.1 or greater.

--------------------

We also know that some functions of the PHP 5.2.x series are not available in the 5.3.x series. So I am researching to make sure CS 2.2.5 will run fine on PHP 5.3.x before we take that dive.



Brandon, have you had any luck with your troubleshooting? We still have not been able to solve our issue. I would love to hear what else you have tried or if you have solved the issue.

[quote name='clips' timestamp='1348582687' post='145713']

Sorry, I'm not sure what “open-ended” product filters means. We do have product filters that use something like “publisher” or “price range”. Do you mean that we should always have “categories” listed in the filter? Right now, since all the products on the site have a publisher and price, we left it generic with no specific categories. Is this maybe what you mean?

[/quote]



The client in question had approximately 30-50 filters set to 'Homepage' and 'All Categories' with complex breakdowns.

For instance there were waterproof features attributed to products that were irrelevant to the filter/feature itself.

If you have modified filters and visit every page in the store it can create a compounding effect against products in different categories.




I resolved the issue after quite a few hours. This situation is probably unique to Brandon's client, however.



When upgrading between versions of CS-Cart, especially for stores that started with very old databases, it's imperative to complete a 'structure comparison' between MySQL tables.



In this case the client had failed add-on installations plus additional table indexes, columns and keys. Removing all of this junk reduced the overhead for CS-Cart and stopped it from running queries that resulted in an error (at MySQL's end) that was not being reported back to CS-Cart, which in turn produced a looping script as far as PHP/MySQL was concerned.
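One possible way to do such a structure comparison with standard tools is sketched below. It is not necessarily how it was done here. It assumes you have a structure-only dump of the live database and another from a fresh install of the same CS-Cart version to use as a reference; the function names `normalize_schema` and `compare_schemas` and the file and database names in the comments are all hypothetical.

```shell
# Strip noise from a `mysqldump --no-data` dump read on stdin, so that only
# the table definitions themselves are compared.
normalize_schema() {
    sed -E -e 's/ AUTO_INCREMENT=[0-9]+//' \
           -e '/^--/d' -e '/^\/\*!/d' -e '/^$/d'
}

# Print the lines that differ between two schema dump files.
# Returns 0 if the structures match and 1 if they differ, as diff does.
compare_schemas() {
    local a b status=0
    a=$(mktemp) && b=$(mktemp) || return 2
    normalize_schema < "$1" > "$a"
    normalize_schema < "$2" > "$b"
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Typical use (database names are placeholders):
#   mysqldump --no-data fresh_install_db > reference.sql
#   mysqldump --no-data live_store_db    > live.sql
#   compare_schemas reference.sql live.sql
```

Lines that appear only in the live dump point at leftover columns, keys or add-on tables to review. Back up the database before dropping anything.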



If you would like me to take a quick look, clips, send me an email or Skype.

I should mention that Brandon upgraded the client to 3.0.3, however the issue recurred in 3.0.3 prior to my database repair/fix.



Regards,

J.

Jim, by the symptoms you describe, it sounds like more of a MySQL configuration issue, i.e. holding lots of “persistent” connections (sleeping connections in mysqladmin speak). Something you might try is changing from mysqli to mysql (or the reverse if you have it set to mysql). I've found the fewest issues with PHP 5.2.x and MySQL 5.5.x. But of course, your mileage may vary.

Jim,



Jesse really saved my bacon and was able to get the site up and running. He showed me what was wrong and went over stuff with me so that I knew what the problem was. Of course Jesse went a bit fast, but I'll have to hit him up again to go over things a bit slower.



I'd really recommend contacting Jesse to go over things on your site. It did take a while to get everything figured out, but now the site works perfectly.



Thanks,



Brandon

I had the same problem on my site; I fixed it by configuring the VPS server.



My site does not have many users at one time, so I played around with the Apache conf file a bit and changed some settings to this.



apache2.conf:



Timeout 300

KeepAlive On

MaxKeepAliveRequests 300

KeepAliveTimeout 5



StartServers 2

MinSpareServers 2

MaxSpareServers 5

MaxClients 10

MaxRequestsPerChild 600



MySQL via Webmin:

Query cache size in bytes = 102400 kb

Maximum packet size = 16 mb



Depending on your server, you can try to play a bit with these numbers and read up on how they work and what they do.

It might sound too futuristic, but I would recommend setting up Splunk to analyze your Apache logs. It is extremely helpful for detecting robots, crawlers, or individuals who are abusing httpd or mirroring your website.
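If setting up Splunk is more than you want to take on right away, a much cruder first pass over the same logs can be done in the shell. This is only a stand-in, not a replacement for what Splunk does. The sketch assumes Apache's common or combined log format, and the function names `top_clients` and `requests_per_hour` and the log path in the comments are assumptions.

```shell
# Print the busiest client addresses in an access log read on stdin,
# as "<requests> <ip>". The optional first argument limits the number of
# rows (default 10).
top_clients() {
    awk '{ print $1 }' | sort | uniq -c | sort -k1,1nr -k2,2 |
        awk '{ print $1, $2 }' | head -n "${1:-10}"
}

# Print request counts per hour as "<dd/Mon/yyyy:hh> <requests>".
# Rows are sorted as text, which is chronological only within one month.
requests_per_hour() {
    awk '{ print substr($4, 2, 14) }' | sort | uniq -c |
        awk '{ print $2, $1 }'
}

# Typical use (the path varies by server):
#   top_clients 20    < /usr/local/apache/logs/access_log
#   requests_per_hour < /usr/local/apache/logs/access_log
```

Comparing the hourly counts for the late-night window against the daytime ones shows quickly whether the spike is coming from web requests at all.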