Good Vs Bad Bots

 

Posted 21 June 2018 - 01:16 PM #1

Hi everyone!

 

We monitor our servers' performance every day, and every single day we see a huge pile of requests made by crawling bots.

 

Some of them are search engines, but some just come to collect information for themselves. Sometimes they become too greedy and we have to block them. Though they do no harm deliberately, they can eat up resources and bandwidth, which ends up degrading performance.

 

The illustration is below. Not too impressive, but that is on an average day.

 

image.png

 

We've made a list of 5 Bad Bots that waste resources and 5 Good Bots that add value to your website.

 

The #1 Bad bot in our experience turned out to be MJ12Bot by Majestic.

The #1 Good bot is Googlebot by Google (no surprise here).

 

Read more about bots on our blog.

 

Why don't we create an extended list of Bad bots and put them into a single file for the sake of CS-Cart users? I know that there are millions of them, but let's share the ones you have met in your own practice.


AWS Cloud hosting for CS-Cart and Multi-Vendor

by Simtech Development - CS-Cart certified hosting provider

free installation & migration | free 24/7 server monitoring | free daily backups | free SSL | and more...


 

Posted 21 June 2018 - 01:18 PM #2

We'll start with Bad bots:

 

User-agents:

-MJ12Bot

-AhrefsBot

-SEMrushBot

-DotBot

-MauiBot
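
For anyone who wants to try the list out before touching the server config, here is a minimal Python sketch of a case-insensitive substring check against the five user agents above. The function name `is_bad_bot` and the tuple `BAD_BOT_TOKENS` are made up for illustration; this is not CS-Cart code, and substring matching is only one possible approach.

```python
# User-agent substrings taken from the list above.
# BAD_BOT_TOKENS and is_bad_bot are hypothetical names used for illustration.
BAD_BOT_TOKENS = ("MJ12Bot", "AhrefsBot", "SEMrushBot", "DotBot", "MauiBot")


def is_bad_bot(user_agent):
    """Return True if the User-Agent header contains a listed bot token."""
    ua = (user_agent or "").lower()  # treat a missing header as an empty string
    return any(token.lower() in ua for token in BAD_BOT_TOKENS)
```

A check like this only catches bots that announce themselves honestly, so it is best seen as a first filter for the "greedy" crawlers rather than protection against anything that fakes its user agent.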

 

List five of yours.




 
  • P-Pharma
  • Junior Member
  • Members
  • Join Date: 30-Jun 10
  • 1119 posts

Posted 22 June 2018 - 12:39 PM #3

This recent list has 1200 bad bots that you can block through htaccess:

http://tab-studio.co...s-on-your-page/
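
For a much shorter list, such as the five user agents named earlier in this thread, the htaccess rules could be generated rather than typed by hand. The sketch below is only an assumption about what such a block might look like (it is not taken from the linked list): it uses the standard mod_rewrite directives `RewriteCond` and `RewriteRule` with the `[NC]` and `[F,L]` flags, and the helper name `build_htaccess_rules` is hypothetical.

```python
import re


def build_htaccess_rules(tokens):
    """Build a mod_rewrite block that answers 403 to matching user agents.

    Hypothetical helper: tokens are escaped with Python's re.escape, which is
    assumed to be compatible with Apache's regex syntax for simple bot names.
    """
    pattern = "|".join(re.escape(token) for token in tokens)
    return "\n".join([
        "<IfModule mod_rewrite.c>",
        "RewriteEngine On",
        # [NC] makes the user-agent match case-insensitive.
        f"RewriteCond %{{HTTP_USER_AGENT}} ({pattern}) [NC]",
        # [F] returns 403 Forbidden, [L] stops processing further rules.
        "RewriteRule .* - [F,L]",
        "</IfModule>",
    ])


if __name__ == "__main__":
    print(build_htaccess_rules(["MJ12Bot", "AhrefsBot", "SEMrushBot", "DotBot", "MauiBot"]))
```

Test the generated block on a staging copy first, since a typo in .htaccess can take the whole store offline.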

 

We are getting a lot of bot registrations specifically targeting CS-Cart because it has no protection. This seems to be done with the Xrumer spam software, which changes its user agent and therefore cannot be blocked by user agent.



 

Posted 26 June 2018 - 01:46 PM #4

Looks like reCAPTCHA can cope with it. Blocking all of the thousands and thousands of existing bots feels like shooting a fly with a shotgun. We would concentrate on the most "greedy" ones.




 
  • P-Pharma
  • Junior Member
  • Members
  • Join Date: 30-Jun 10
  • 1119 posts

Posted 26 June 2018 - 03:58 PM #5

Recaptcha is completely broken by Xrumer.



 
  • dbazhenov
  • Senior Member
  • Authorized Reseller
  • Join Date: 15-May 12
  • 6446 posts

Posted 27 June 2018 - 01:58 PM #6

Hello,
 
It looks like you are promoting this software. Possibly the next step will be a referral link? xD
 
As you can see, we are not talking here about spam or ways to bypass a captcha. We are talking about malicious load, and about the fact that some robots that scan the internet can make your website unavailable.
 

 

Also, I'd like to note that there is invisible ReCaptcha v3 and also custom ways like honeypots, you know. But this is a different story not for this forum topic.

Special cloud hosting for CS-Cart and Multi-Vendor. Just email me cloud@simtechdev.com

 
  • P-Pharma
  • Junior Member
  • Members
  • Join Date: 30-Jun 10
  • 1119 posts

Posted 27 June 2018 - 03:25 PM #7

I assure you that I in no way condone this software, but I do want webmasters to be aware that there are very advanced malicious tools that can be used against CS-Cart, and CS-Cart unfortunately has almost no protection. For us this is a major problem which makes it difficult to use CS-Cart. I have opened several threads on related topics. We have just turned all commenting and reviews off because we cannot protect our sites against it.

 

It looks like you are promoting this software. Possibly the next step will be a referral link? xD
 
As you can see, we are not talking here about spam or ways to bypass a captcha. We are talking about malicious load, and about the fact that some robots that scan the internet can make your website unavailable.

 

I posted a link above that shows how to block 1200 such bots through htaccess.

We are experiencing a high load from several types of bots:

 

1. unwanted crawler bots

2. content scrapers

3. spam bots

4. vulnerability scanner bots.

 

Spam bots and vulnerability scanners often cloak the user agent. Scanners actively search for cs-cart installations.

I posted a thread about an SQL injection method that I found after such bots used it. I found it in my logs. CS-Cart staff is aware of it.

 

Crawler bots cause the most trouble when they crawl features and filters, which generates millions of cache files. Our hosting costs have skyrocketed because of it. It's also reducing our site speed significantly.

 

 

Also, I'd like to note that there is invisible ReCaptcha v3 and also custom ways like honeypots, you know. But this is a different story not for this forum topic.

 

 

Actually, ReCaptcha and honeypots can be used to stop malicious bots from crawling the site. It's not just for registering an account.

But there is no functionality for this.

Could you be so kind as to let me know of any honeypot functionality for CS-Cart?
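
To make clear what kind of honeypot is meant here, the idea can be sketched generically in Python. This is not CS-Cart functionality and not an existing add-on. The sketch assumes a form that contains an extra field hidden from human visitors with CSS; the field name `website` and the function name `looks_like_bot` are hypothetical.

```python
# Hypothetical name of a form field that is hidden from humans with CSS.
# Real visitors leave it empty; many simple bots fill in every field they find.
HONEYPOT_FIELD = "website"


def looks_like_bot(form_data):
    """Return True if the hidden honeypot field was filled in."""
    value = form_data.get(HONEYPOT_FIELD, "")
    return bool(str(value).strip())
```

A submission where `looks_like_bot` returns True could simply be dropped without showing an error, so the bot gets no hint about why it failed. More advanced tools may skip hidden fields, so this is likely to stop only the simpler bots.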

 

It would be nice if CS-Cart had bad bot protection similar to:

https://bad-behavior.ioerror.us/ (stops bots by analysis & fingerprinting)

https://wordpress.or...ns/stopbadbots/

https://swissuplabs....protection.html

https://www.extendwa...ot-blocker.html

https://wordpress.or...-pot-spam-trap/

 

As you can see, many such applications are not just for spam bots but for all kinds of bots.



 
  • P-Pharma
  • Junior Member
  • Members
  • Join Date: 30-Jun 10
  • 1119 posts

Posted 01 July 2018 - 12:10 PM #8

After analyzing our statistics for the user agents with the most hits and bandwidth, I have added several malicious user agents to the above block list:

megaindex.ru

dotbot

mauibot
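
For those who want to run the same kind of analysis on their own logs, here is a rough Python sketch that counts hits and bytes per user agent. It assumes the access log is in the common Apache/nginx "combined" format; the regex and the function name `tally_by_user_agent` are my own illustration, so adjust them if your log format differs.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def tally_by_user_agent(lines):
    """Return (hits, bytes_sent) Counters keyed by user-agent string."""
    hits, bytes_sent = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        ua = match.group("ua")
        size = match.group("size")
        hits[ua] += 1
        bytes_sent[ua] += 0 if size == "-" else int(size)  # "-" means no body sent
    return hits, bytes_sent
```

Passing an open log file to `tally_by_user_agent` and then calling `most_common(10)` on either counter gives a quick top ten of the heaviest user agents.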

 

@maksim It seems that we are dealing with mostly the same bots as you are.

 

baiduspider

After blocking baidu, baiduspider took over. We get a lot of supply offers through Baidu, so it hurts us to block it. But CS-Cart's cache file generation, in combination with such an active spider, heavily hurts the site's performance.

 

I also found one particularly malicious custom bot targeting CS-Cart from UA and RU, which is also blocked by the above htaccess blocklist.



 

Posted 03 July 2018 - 07:13 AM #9

P-Pharma

 

Thanks for sharing!

