
Possible fix for CPU usage issues

 
  • tdonj
  • Junior Member
  • Members
  • Join Date: 23-Apr 07
  • 19 posts

Posted 16 May 2013 - 09:08 PM #1

My server has been experiencing intermittent CPU-related performance issues for a while now, and I (may have) finally found the problem and solution. (My site runs version 2.2.4 on a VPS with a 3.4 GHz CPU and 2.3 GB of RAM.)

THE PROBLEM: After identifying periods of excessive load (100% CPU usage) and reviewing access logs for those times, a pattern emerged. During each of those times, robots (crawlers) were indexing URLs containing "features_hash" (due to use of CS-Cart's "Filters" feature).
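
(If you want to check your own access logs for the same pattern, here's a rough Python sketch. The log path "access.log" and the combined log format are assumptions — adjust both for your server. It buckets features_hash requests by hour so you can line them up against your CPU spikes.)

import re
from collections import Counter

# Assumes Apache/nginx combined log format, e.g.:
# 1.2.3.4 - - [16/May/2013:21:08:00 -0500] "GET /path HTTP/1.1" 200 ...
line_re = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}):\d{2}:\d{2}[^\]]*\] "(?:GET|HEAD) (\S+)')

hits = Counter()
with open("access.log") as log:              # hypothetical path -- use yours
    for line in log:
        m = line_re.search(line)
        if m and "features_hash" in m.group(2):
            hits[m.group(1)] += 1            # bucket requests by hour

# Hours with the most features_hash requests; compare to your load spikes
for hour, count in hits.most_common(10):
    print(count, hour)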

As many of you know, having several Filters, each with several options, produces a very large number of possible URL combinations. Even with caching turned on, these "features_hash" pages seem to be more processor-intensive than standard category and product URLs. (Maybe someone could test this theory and do some benchmarking?) Even if they're not more processor-intensive, the potential volume of pages is huge, and the value of having them indexed is (for me at least) very small.
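
As a rough illustration of the volume (the filter and option counts below are made-up, not from my store): if each filter option can independently be selected or not, the combinations multiply per filter, so even a modest set of filters yields tens of millions of possible features_hash URLs.

# Hypothetical example: 5 filters with 6, 4, 8, 3 and 5 options each.
# Each option can be on or off, so each filter contributes 2**options
# combinations, and the filters multiply together.
filters = [6, 4, 8, 3, 5]

combinations = 1
for options in filters:
    combinations *= 2 ** options

print(combinations)  # 2**26 = 67,108,864 possible features_hash URLs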

THE SOLUTION: Change the robots.txt file to include the following:
User-agent: *
Disallow: /*?

or

User-agent: *
Disallow: /*features_hash

The first option disallows indexing of all URLs which contain a question mark. Because my site has SEO-friendly links, I've chosen that option, as I'm happy to avoid indexing all non-SEO-friendly URLs. The second option is more specific, and only disallows indexing of "features_hash" pages.
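
One caveat: the * wildcard in Disallow isn't part of the original robots.txt standard; it's an extension honored by the major crawlers (Googlebot, Bingbot, etc.), so minor robots may ignore it. If you want to sanity-check which paths a wildcard rule covers, here's a small Python sketch (rule_to_regex is just my own throwaway helper; as far as I know, Python's built-in urllib.robotparser does plain prefix matching and won't interpret the * here):

import re

def rule_to_regex(disallow_path):
    # Hypothetical helper: escape the rule, then turn each * back into
    # "match anything". Wildcard-aware crawlers treat the rule as a
    # prefix match, so only the start of the path is anchored.
    pattern = re.escape(disallow_path).replace(r"\*", ".*")
    return re.compile("^" + pattern)

rule = rule_to_regex("/*?")                  # try "/*features_hash" too

for path in ["/index.php?dispatch=products.view",
             "/computers/?features_hash=13-56",
             "/computers/notebooks/"]:
    print(path, "->", "blocked" if rule.search(path) else "allowed")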

For my site, the change was immediate. I understand that not all robots follow the robots.txt protocol, but most of my robot traffic now avoids the CPU-intensive indexing of unwanted pages.

I would appreciate any confirmation (or refutation) of this theory and fix.

Important note: be sure to modify the robots.txt file that resides in your site's root web directory (www.yoursite.com/robots.txt). I've seen some posters state (incorrectly) that you can install CS-Cart in a subdirectory (www.yoursite.com/yourfolder) and still expect the standard robots.txt installed by CS-Cart to be found and used. If you doubt this, search your server logs for requests to /yourfolder/robots.txt.


cheers,
Don

 
  • The Tool
  • Been Here Way Too Long Member
  • Members
  • Join Date: 30-Mar 07
  • 3783 posts

Posted 17 May 2013 - 12:42 AM #2

Nothing too new, really. I've been explaining for years how to cut down on unnecessary crawling/indexing by adding rules to robots.txt, but apparently most do not read and/or heed the advice.

tdonj said:
I've seen some posters state (incorrectly) that you can install CS-Cart in a subdirectory (www.yoursite.com/yourfolder) and still expect the standard robots.txt installed by CS-Cart to be found and used. If you doubt this, search your server logs for requests to /yourfolder/robots.txt.


This is only true if you just create a plain subdirectory. However, if you set the folder up as a subdomain, each subdomain can serve its own robots.txt file.