I am seriously thinking about blocking MSNBot. Their search engine is pretty much worthless and the results are not accurate. I looked in my control panel and saw that they visited me over 300 times today. My VPS is moving pretty slowly as it is, and I don’t need more slowdowns!
Yes, actually I block “selectively”:
Googlebot
Yahoo-Slurp
Msnbot
I have found that the bandwidth they consistently drain from my websites is not worth the “indexing” that they do…I have not noticed any detriment to my ranking on these sites because of this.
Although some here might disagree, I’d say go ahead and block away.
Another thing you might consider is blocking the “nasty” bots via your .htaccess file…but that’s a whole other subject.
WebGuy
Why block them? Just use an XML based sitemap that shows the update frequency as monthly or yearly. Google’s webmaster account allows you to specify the crawl rate for your site.
Also, why would you not want to be listed in a search engine? Even if MSN is 3rd, it is 3rd out of thousands of search services, and a customer’s order from MSN is as good as a customer’s order from Google.
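For anyone who wants to try the sitemap route, here is a minimal sketch of generating one with a changefreq entry per page. The URLs are made up and the helper name build_sitemap is just for illustration. Keep in mind that changefreq is only a hint to crawlers, so this may reduce crawling but is not guaranteed to.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from a list of (url, changefreq) tuples.

    build_sitemap is a hypothetical helper for this example only.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # Allowed values include "monthly" and "yearly"; crawlers treat this as a hint
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


# Made-up example pages
sitemap_xml = build_sitemap([
    ("http://www.example.com/", "monthly"),
    ("http://www.example.com/about.html", "yearly"),
])
print(sitemap_xml)
```

Save the output as sitemap.xml in your web root and submit it through your webmaster account.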
[QUOTE]Why block them? Just use an XML based sitemap that shows the update frequency as monthly or yearly. Google’s webmaster account allows you to specify the crawl rate for your site.[/QUOTE]
There is more than just one bot that will crawl your site, if you let them.
If you have AWStats check your traffic and bandwidth use from “Robots/Spiders visitors”. You might be quite surprised.
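If you don’t have AWStats, a rough sketch like the one below can give you a similar picture straight from your access log. It assumes the Apache combined log format; the sample lines, the marker list, and the function name bot_usage are all made up for illustration, so adjust them to your own logs.

```python
import re
from collections import defaultdict

# Assumed list of user-agent substrings to look for; extend as needed
BOT_MARKERS = ["googlebot", "slurp", "msnbot"]

# Apache combined log format: captures the response size and the user agent
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-) "[^"]*" "([^"]*)"$'
)


def bot_usage(log_lines, markers=BOT_MARKERS):
    """Return {marker: (hits, bytes_sent)} for requests made by known bots."""
    usage = defaultdict(lambda: [0, 0])
    for line in log_lines:
        match = LOG_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        size, agent = match.groups()
        agent = agent.lower()
        for marker in markers:
            if marker in agent:
                usage[marker][0] += 1
                usage[marker][1] += 0 if size == "-" else int(size)
                break  # count each request once
    return {name: tuple(totals) for name, totals in usage.items()}


# Made-up sample log lines
sample_log = [
    '192.0.2.10 - - [12/Mar/2009:10:15:32 -0500] "GET /index.html HTTP/1.1" 200 2326 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.10 - - [12/Mar/2009:10:15:40 -0500] "GET /about.html HTTP/1.1" 200 1500 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.20 - - [12/Mar/2009:10:16:02 -0500] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.30 - - [12/Mar/2009:10:17:11 -0500] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0"',
]

usage = bot_usage(sample_log)
for name, (hits, sent) in sorted(usage.items()):
    print(f"{name}: {hits} hits, {sent} bytes")
```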
The XML sitemap just provides a simple path for Googlebot to follow and post info for their search.
There are specific areas of your website that you don’t need or want them crawling, and a robots.txt file lets you get more specific about that than the sitemap does.
[QUOTE]Google uses several user-agents. You can block access to any of them by including the bot name on the User-agent line of an entry. Blocking Googlebot blocks all bots that begin with “Googlebot”.
Googlebot: crawl pages from our web index and our news index
Googlebot-Mobile: crawls pages for our mobile index
Googlebot-Image: crawls pages for our image index
Mediapartners-Google: crawls pages to determine AdSense content. We only use this bot to crawl your site if AdSense ads are displayed on your site.
Adsbot-Google: crawls pages to measure AdWords landing page quality. We only use this bot if you use Google AdWords to advertise your site. [/QUOTE]
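To show what “more specific” can look like, here is a small sketch that checks a sample robots.txt with Python’s standard urllib.robotparser module. The rules and paths are made up for illustration. Remember that robots.txt is only a request: well-behaved bots follow it, but nothing forces a bot to obey.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: shut out msnbot entirely, keep Googlebot out of
# /images/, and keep every other bot out of /cgi-bin/
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /

User-agent: Googlebot
Disallow: /images/

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the sample rules let this user agent fetch the path."""
    return parser.can_fetch(agent, path)


print(allowed("msnbot", "/index.html"))               # blocked everywhere
print(allowed("Googlebot", "/index.html"))            # allowed outside /images/
print(allowed("Googlebot-Image", "/images/logo.png")) # caught by the Googlebot entry
print(allowed("SomeOtherBot", "/cgi-bin/form.pl"))    # falls under the * entry
```

In this parser the Googlebot entry also applies to Googlebot-Image, which lines up with the quoted note that blocking Googlebot blocks every bot whose name begins with “Googlebot”.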
[QUOTE]Also, why would you not want to be listed in a search engine? [/QUOTE]
There is a difference between being listed on a search engine and having bots pound your website with their crawls.
There are times when they can potentially slow your site down to a “crawl”.
I have NOT noticed any detriment to my ranking on these sites because of the blocking I have in place.
I use both the robots.txt file and .htaccess to block hundreds of bots.
They are just out there gathering information and offer no significant benefit to you at all.
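For the .htaccess side, one common approach is to have mod_rewrite answer unwanted user agents with a 403. Below is a rough sketch that builds those lines from a list of names. It assumes mod_rewrite is enabled on your server; the bot names and the helper htaccess_block_rules are hypothetical, and this is not necessarily the exact setup I run.

```python
import re


def htaccess_block_rules(bot_names):
    """Return mod_rewrite lines that send 403 Forbidden to matching user agents.

    htaccess_block_rules is a hypothetical helper for this example only.
    """
    # Escape each name so it is matched literally inside the regex
    pattern = "|".join(re.escape(name) for name in bot_names)
    return "\n".join([
        "RewriteEngine On",
        # [NC] makes the user-agent match case-insensitive
        f"RewriteCond %{{HTTP_USER_AGENT}} ({pattern}) [NC]",
        # [F] returns 403 Forbidden, [L] stops processing further rules
        "RewriteRule .* - [F,L]",
    ])


# Made-up bot names; replace with the ones you see in your own logs
rules = htaccess_block_rules(["BadBot", "EvilScraper"])
print(rules)
```

Paste the printed lines into your .htaccess file and test with a spoofed user agent before relying on them.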
[SIZE=“1”]Disclaimer:[/SIZE]
There are a myriad of different opinions concerning this, and I’m sure there are some who would agree with me and some who would disagree.
This is what I have found works for my websites to help control both bots and bandwidth.