Good Vs Bad Bots

 

Posted 21 June 2018 - 01:16 PM #1

Hi everyone!

 

We monitor our servers' performance every day. And every single day we see a huge pile of requests made by crawling bots.

 

Some of them are search engines, but some just come to collect information for themselves. Sometimes they become too greedy and we have to block them. Though they do no harm deliberately, they can eat up resources and bandwidth, and performance sags as a result.

 

The illustration is below. Not too impressive, but that is on an average day.

 

image.png

 

We've made a list of 5 Bad Bots that waste resources and 5 Good Bots that add value to your website.

 

#1 Bad bot in our practice turned out to be MJ12Bot by Majestic.

#1 Good bot is Googlebot by Google (no surprise here)

 

Read more about bots on our blog.

 

Why don't we create an extended list of Bad bots and put them into a single file for the sake of CS-Cart users? I know that there are millions of them, but let's share those you meet in your practice.


AWS Cloud hosting for CS-Cart and Multi-Vendor

by Simtech Development - CS-Cart certified hosting provider

free installation & migration | free 24/7 server monitoring | free daily backups | free SSL | and more...


 

Posted 21 June 2018 - 01:18 PM #2

We'll start with Bad bots:

 

User-agents:

-MJ12Bot

-AhrefsBot

-SEMrushBot

-DotBot

-MauiBot

 

List five of yours.




 
  • P-Pharma
  • Junior Member
  • Members
  • Join Date: 30-Jun 10
  • 1130 posts

Posted 22 June 2018 - 12:39 PM #3

This recent list has 1200 bad bots that you can block through htaccess:

http://tab-studio.co...s-on-your-page/

 

We are getting a lot of bot registrations specifically targeting CS-Cart because it has no protection. This seems to be done with the Xrumer spammer software, which changes its user agent and therefore cannot be blocked by user agent.



 

Posted 26 June 2018 - 01:46 PM #4

Looks like reCAPTCHA can cope with it. Blocking all of the thousands and thousands of existing bots feels like shooting a fly with a shotgun. We would concentrate on the most "greedy" ones.




 
  • P-Pharma
  • Junior Member
  • Members
  • Join Date: 30-Jun 10
  • 1130 posts

Posted 26 June 2018 - 03:58 PM #5

Recaptcha is completely broken by Xrumer.



 
  • dbazhenov
  • Senior Member
  • Authorized Reseller
  • Join Date: 15-May 12
  • 6446 posts

Posted 27 June 2018 - 01:58 PM #6

Hello,
 
It looks like you are promoting this software. Possibly the next step will be a referral link? xD
 
As you can see, we are not talking here about spam or ways to bypass a captcha. We are talking about malicious load, and about the fact that some robots that scan the internet can make your website unavailable.
 

 

Also, I'd like to note that there is invisible ReCaptcha v3 and also custom ways like honeypots, you know. But this is a different story not for this forum topic.


 
  • P-Pharma
  • Junior Member
  • Members
  • Join Date: 30-Jun 10
  • 1130 posts

Posted 27 June 2018 - 03:25 PM #7

I assure you that I in no way condone this software, but I do want webmasters to be aware that there are very advanced malicious tools that can be used against CS-Cart, and CS-Cart unfortunately has almost no protection. For us this is a major problem which makes it difficult to use CS-Cart. I have opened several threads on related topics. We have just turned all commenting and reviews off because we cannot protect our sites against it.

 

It looks like you are promoting this software. Possibly the next step will be a referral link? xD
 
As you can see, we are not talking here about spam or ways to bypass a captcha. We are talking about malicious load, and about the fact that some robots that scan the internet can make your website unavailable.

 

I posted a link above that shows how to block 1200 such bots through htaccess.

We are experiencing a high load from several types of bots:

 

1. unwanted crawler bots

2. content scrapers

3. spam bots

4. vulnerability scanner bots.

 

Spam bots and vulnerability scanners often cloak the user agent. Scanners actively search for cs-cart installations.

I posted a thread about an SQL injection method that I found after such bots used it. I found it in my logs. CS-Cart staff is aware of it.

 

Crawler bots cause particular trouble by crawling features and filters, which generates millions of cache files. Our hosting costs have skyrocketed because of it. It's also reducing our site speed significantly.

 

 

Also, I'd like to note that there is invisible ReCaptcha v3 and also custom ways like honeypots, you know. But this is a different story not for this forum topic.

 

 

Actually, ReCaptcha and honeypots can be used to stop malicious bots from crawling the site. It's not just for registering an account.

But there is no functionality for this.

Could you be so kind as to let me know of any honeypot functionality for CS-Cart?
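
For readers unfamiliar with the honeypot idea mentioned here, a minimal sketch in Python follows. It is not CS-Cart functionality and not an add-on API: the field name `website`, the two-second threshold, and the function `looks_like_bot` are all hypothetical, chosen only to illustrate the technique. The idea is that a form carries an extra input hidden from humans with CSS; a bot that fills every field gives itself away, as does a submission that arrives faster than a person could type.

```python
# Hypothetical names and thresholds, for illustration only.
HONEYPOT_FIELD = "website"   # extra input hidden from humans via CSS
MIN_FILL_SECONDS = 2.0       # assumed lower bound for a human filling the form


def looks_like_bot(form, rendered_at, submitted_at):
    """Return True when a form submission shows typical bot behaviour.

    form          -- dict of submitted field names to string values
    rendered_at   -- time (seconds) when the form was served
    submitted_at  -- time (seconds) when the submission arrived
    """
    # Humans never see the hidden field, so any value in it is suspicious.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # Submissions that arrive almost instantly are unlikely to be typed by hand.
    return (submitted_at - rendered_at) < MIN_FILL_SECONDS


if __name__ == "__main__":
    human = {"email": "a@example.com", "website": ""}
    bot = {"email": "b@example.com", "website": "http://spam.example"}
    print(looks_like_bot(human, rendered_at=0.0, submitted_at=8.5))
    print(looks_like_bot(bot, rendered_at=0.0, submitted_at=8.5))
```

A check like this only catches unsophisticated bots; tools that render the page and skip hidden fields would pass it, so it complements server-level blocking rather than replacing it.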

 

It would be nice if CS-Cart would have bad bot protection similar to:

https://bad-behavior.ioerror.us/ (stops bots by analysis & fingerprinting)

https://wordpress.or...ns/stopbadbots/

https://swissuplabs....protection.html

https://www.extendwa...ot-blocker.html

https://wordpress.or...-pot-spam-trap/

 

As you see many such applications are not just for spam bots but for all kinds of bots.



 
  • P-Pharma
  • Junior Member
  • Members
  • Join Date: 30-Jun 10
  • 1130 posts

Posted 01 July 2018 - 12:10 PM #8

After analysing our statistics for the user agents with the most hits and bandwidth, I have added several malicious user agents to the above block list:

megaindex.ru

dotbot

mauibot
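
The kind of per-user-agent analysis described above can be sketched in a few lines of Python. This is only an illustration, not the tooling actually used here: it assumes the access log is in the common "combined" format (as written by default Apache and nginx setups), and the sample lines, IP addresses, and the name `traffic_by_user_agent` are made up.

```python
import re
from collections import Counter

# Tail of a combined-format log line:
#   "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"\s*$'
)


def traffic_by_user_agent(lines):
    """Return (hits, bytes_sent) Counters keyed by the raw User-Agent string."""
    hits, bytes_sent = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in combined format
        agent = match.group("agent")
        size = match.group("bytes")
        hits[agent] += 1
        bytes_sent[agent] += 0 if size == "-" else int(size)
    return hits, bytes_sent


# Made-up sample data, for illustration only.
MJ12 = "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
FIREFOX = "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jul/2018:12:00:01 +0000] "GET /?features_hash=1-2 HTTP/1.1" 200 5120 "-" "%s"' % MJ12,
    '203.0.113.5 - - [01/Jul/2018:12:00:02 +0000] "GET /?features_hash=1-3 HTTP/1.1" 200 4096 "-" "%s"' % MJ12,
    '198.51.100.7 - - [01/Jul/2018:12:00:03 +0000] "GET / HTTP/1.1" 200 2048 "-" "%s"' % FIREFOX,
]

if __name__ == "__main__":
    hits, sent = traffic_by_user_agent(SAMPLE_LOG)
    for agent, count in hits.most_common():
        print(count, sent[agent], agent)
```

Sorting by hits or by bytes shows which agents are the "greedy" ones worth blocking first, though it says nothing about bots that cloak their user agent.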

 

@maksim It seems that we are dealing with mostly the same bots as you are.

 

baiduspider

After we blocked baidu, baiduspider took over. We get a lot of supply offers through Baidu, so it hurts us to block it. But CS-Cart's cache file generation, combined with such an active spider, badly hurts site performance.

 

I also found one particularly malicious custom bot targeting CS-Cart from UA and RU, which is also blocked by the above .htaccess blocklist.



 

Posted 03 July 2018 - 07:13 AM #9

P-Pharma

 

Thanks for sharing!




 
  • deadroot
  • Newbie
  • Members
  • Join Date: 13-Jun 18
  • 8 posts

Posted 18 October 2018 - 07:49 PM #10

Anyway, I can recommend using "Crawl-Delay" directive in the robots.txt file.
 
The default robots.txt with the Crawl-Delay improvement will look like this:
User-agent: *
Disallow: /app/
Disallow: /store_closed.html
Crawl-Delay: 1

User-Agent: MJ12bot
Crawl-Delay: 5
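
To show how a well-behaved crawler would read the file above, here is a small sketch using `urllib.robotparser` from the Python standard library. The bot name `ExampleBot` and the helper `polite_delay` are made up for the example. Note that the directive is purely advisory: nothing here forces a crawler to honor it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /app/
Disallow: /store_closed.html
Crawl-Delay: 1

User-Agent: MJ12bot
Crawl-Delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_delay(user_agent):
    """Seconds a cooperative crawler should wait between requests."""
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    for bot in ("MJ12bot", "ExampleBot"):
        print(bot, polite_delay(bot), parser.can_fetch(bot, "/app/"))
```

One side effect worth knowing: as Python's parser reads it, a crawler that matches a named group follows only that group, so in this file MJ12bot gets the 5-second delay but is not covered by the `Disallow` lines listed under `*`. Repeating the `Disallow` lines in the MJ12bot group would close that gap.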


 
  • P-Pharma
  • Junior Member
  • Members
  • Join Date: 30-Jun 10
  • 1130 posts

Posted 21 October 2018 - 01:51 AM #11

Bad bots simply ignore that.



 
  • mokeshop
  • Senior Member
  • Members
  • Join Date: 27-Jul 12
  • 978 posts

Posted 21 October 2018 - 01:31 PM #12

Bad bots simply ignore that.

 

evil bots :)



 
  • deadroot
  • Newbie
  • Members
  • Join Date: 13-Jun 18
  • 8 posts

Posted 21 October 2018 - 07:23 PM #13

Bad bots simply ignore that.

 

By "bad bots" I don't mean crawlers/scanners/etc., which should be banned by different methods. If you want, I can describe several ways.
 
Check any store from the Shopify examples at https://shopify.com/examples. They have a similar "bad bots" list in robots.txt:
...
User-agent: Nutch
Disallow: /

User-agent: MJ12bot
Crawl-Delay: 10

User-agent: Pinterest
Crawl-delay: 1


 
  • P-Pharma
  • Junior Member
  • Members
  • Join Date: 30-Jun 10
  • 1130 posts

Posted 23 October 2018 - 09:20 PM #14

Yes, but this is not shopify. CS-Cart does not have bad bot protection.

MJ12bot ignores robots.txt. You need to block it from your server.



 
  • deadroot
  • Newbie
  • Members
  • Join Date: 13-Jun 18
  • 8 posts

Posted 26 October 2018 - 01:25 PM #15

Yes, but this is not shopify. CS-Cart does not have bad bot protection.

MJ12bot ignores robots.txt. You need to block it from your server.

 

:)

if ($http_user_agent ~* (MJ12bot|...) ) {
    return 403;
}
But blocking by a user agent list is pointless, because the user agent is easily changed.
A different approach is needed here: a web application firewall (WAF) that analyzes the IP (who it is and where it comes from), user behavior, type of requests, etc. For example, a WAF like https://wallarm.com/ or https://aws.amazon.com/waf/, etc xD
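
As a rough illustration of behavior-based filtering (as opposed to matching user agent strings), here is a minimal sliding-window rate limiter keyed by client IP. It is a sketch only, not how Wallarm or AWS WAF work internally; the class name `SlidingWindowLimiter` and the limits are assumptions, and a real WAF combines many more signals.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Per-client timestamps of recently allowed requests.
        # Note: idle clients are never evicted in this sketch.
        self._seen = defaultdict(deque)

    def allow(self, client_ip, now):
        """Return True if the request at time `now` (seconds) may proceed."""
        timestamps = self._seen[client_ip]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # over the limit: the caller would answer 403 or 429
        timestamps.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 3.0, 10.0):
        print(t, limiter.allow("203.0.113.5", now=t))
```

Because the decision depends on request rate rather than on what the client claims to be, changing the user agent does not help a greedy bot here. The trade-off is that shared IPs (offices, mobile carriers) can trip the limit, so thresholds need care to avoid blocking valid customers.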


 
  • P-Pharma
  • Junior Member
  • Members
  • Join Date: 30-Jun 10
  • 1130 posts

Posted 26 October 2018 - 11:30 PM #16

Yes, WAF support would be very nice to have. But it needs CSC integration or you will block valid customers.

Until then, the only thing we have is blocking at the server firewall level.



 
  • soft-solid
  • Junior Member
  • Members
  • Join Date: 19-Apr 10
  • 489 posts

Posted 01 November 2018 - 08:58 PM #17

Hello

Maybe this add-on will be useful for this problem with network robots.

 

https://marketplace....ork-robots.html

 

Best regards

Robert.


Team of SoftSolid
cs-cart.pl

 

Posted 03 November 2018 - 09:34 AM #18

"Good bots" as well as "bad bots" keep driving web traffic up like never before; however, the second type of bot is becoming more and more prominent.

So claims Distil Networks in its 2018 Bad Bot Report, released this week. Among the countless requests made by bad bots are potentially malicious activities controlled by fraudsters, hackers, and competitors. Bots are also used for brute-force attacks, competitive data mining, account hijacking, downtime, digital ad fraud, data theft, and online scams.