

CleanTalk Spam Protection

 
  • remoteone
  • Member
  • Members
  • Join Date: 06-Oct 09
  • 754 posts

Posted 10 August 2019 - 03:44 AM #1

I've just added CleanTalk spam protection to a CSC v2 website. It seems to work well so far, so I thought it would be good to have a discussion topic for sharing user experience.
It's a paid service, but really cheap considering the time it saves.

Apart from the automated spam filtering, one of the fantastic advantages is that other verification methods can be turned off. With many internet users now on VPNs, Google reCAPTCHA more often requires the full "click all the pictures with cars" validation for known VPN IP addresses, which is just another buyer objection to avoid if possible.

 

Has anyone else had experience with it, or found any disadvantages of using this service?



 
  • kogi
  • Senior Member
  • Members
  • Join Date: 16-Aug 07
  • 620 posts

Posted 10 August 2019 - 05:46 AM #2

Looks good. reCAPTCHA is getting pretty annoying.

 

How do you install it for CS-Cart? Is it an add-on?


find / -type f -name '*.base' -exec chown kogi.kogi {} \;

 
  • remoteone
  • Member
  • Members
  • Join Date: 06-Oct 09
  • 754 posts

Posted 10 August 2019 - 06:17 AM #3

Update!
CleanTalk has broken our website, so I don't recommend installing it just yet.

Still trying to figure out the issue.
If you do try it, check website operation thoroughly before purchasing CleanTalk!



 
  • remoteone
  • Member
  • Members
  • Join Date: 06-Oct 09
  • 754 posts

Posted 10 August 2019 - 07:40 AM #4

I've now disabled CleanTalk until a fix is found.

 

To disable the plugin, delete the strings after //Cleantalk at the start and at the end of every index.php file on your site.
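For anyone in the same situation, here is a rough sketch of how you might first locate the affected files before hand-editing them. The `//Cleantalk` marker string comes from the support instructions above; I'm assuming it appears verbatim in the modified files, so verify each hit by eye before deleting anything:

```shell
# List every index.php under the shop root that contains the CleanTalk marker.
# Run from the CS-Cart root directory; this only finds files, it changes nothing.
grep -rl --include='index.php' '//Cleantalk' .
```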

Disappointing that it broke our website, given what it promises to achieve.
It's not an add-on; the files need to be uploaded to the website root folder, and then you run the install script. There are plug-ins for other carts, so it would be a good project for an add-on dev, I think.
I have a support ticket open with CleanTalk, so I'll report back on the outcome.



 
  • poppedweb
  • Authorized Reseller
  • Members
  • Join Date: 02-Aug 16
  • 553 posts

Posted 10 August 2019 - 09:25 AM #5

The best way to mitigate spam is to just accept that it's there and design your application in such a way that the load it causes doesn't matter.

 

What we found to be the biggest burden is price monitoring companies. They crawl your website every 5 minutes or so to see if your prices change. Even though it is illegal, they carry on with the practice unnoticed. The best way to check for this is to look for recurring IP addresses at certain intervals.

 

But take my word for it, spam is by far not your biggest concern.


PoppedWeb | sales@poppedweb.com | https://poppedweb.com
TurnKey Website Design | Add-Ons | Performance Audits | Dedicated Server Management
24/7 Support | Response within an hour (during working hours).

 
  • remoteone
  • Member
  • Members
  • Join Date: 06-Oct 09
  • 754 posts

Posted 12 August 2019 - 02:19 AM #6

@poppedweb
I very much disagree with your first comment, as I feel you've overlooked the benefit of being able to disable all captchas.

 

Regarding the burden of price monitoring crawlers:
I had not considered this, which raises the question of whether there is a CSC add-on to detect and block them.

On a VPS, I wonder if fail2ban or similar could be configured to detect these, but for CSC on a shared server, any code would need to be under the CSC folder. Any suggestions?
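On a VPS, one way this could look is a fail2ban jail watching the web-server access log. The sketch below is illustrative only: the filter name, log path, regex, and thresholds are all placeholder assumptions, not a tested recipe, and a regex this broad would also count legitimate traffic toward the ban threshold:

```ini
# /etc/fail2ban/filter.d/price-crawler.local  (hypothetical filter name)
[Definition]
# Match any successful GET; the jail's maxretry/findtime do the rate detection.
failregex = ^<HOST> .* "GET .*" 200

# /etc/fail2ban/jail.local
[price-crawler]
enabled  = true
filter   = price-crawler
logpath  = /var/log/apache2/access.log
# Ban any IP making more than 300 requests in 10 minutes, for 1 hour.
maxretry = 300
findtime = 600
bantime  = 3600
```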



 
  • kogi
  • Senior Member
  • Members
  • Join Date: 16-Aug 07
  • 620 posts

Posted 12 August 2019 - 02:23 AM #7

But take my word for it, spam is by far not your biggest concern.

 

 

It is my biggest annoyance. If captcha is not turned on for contact-us forms, a lot of bot spam comes through.



 
  • Flow
  • Super Duper and Amazingly Sexy Senior
  • Members
  • Join Date: 13-Oct 10
  • 2307 posts

Posted 12 August 2019 - 05:15 AM #8

The best way to mitigate spam is to just accept that its there and design your application in such a way that the caused load doesnt matter.

 

 

 

It's really quite annoying if you have to delete 30 spam reviews per day (we get about 6 with Google reCAPTCHA turned on - if I turn it off, this jumps to hundreds).


When life hands you lemons, bring on the Tequila baby!


 
  • poppedweb
  • Authorized Reseller
  • Members
  • Join Date: 02-Aug 16
  • 553 posts

Posted 12 August 2019 - 06:25 AM #9

It's really quite annoying if you have to delete 30 spam reviews per day (we have about 6, with google recaptcha turned on - if I turn it off this jumps to 100s).

 

There is a multitude of strategies you can deploy to mitigate such attacks, the simplest being a basic e-mail check. Another is allowing only registered users to write reviews, and only for products they bought. Unregistered users could receive an email with a token that corresponds to their order, letting them write a review too.
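The unregistered-user token idea could be as simple as an HMAC over the order ID. A minimal sketch using openssl - the secret and order ID are placeholders, and a real implementation would live inside the cart's server-side code, not a shell script:

```shell
# Derive a review token that ties an email link to a specific order.
# Only the server, which knows SECRET, can generate or verify the token.
SECRET='replace-with-a-long-random-server-side-secret'
order_id='12345'
token=$(printf '%s' "$order_id" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $NF}')
echo "$token"   # 64 hex characters; embed in the "write a review" link
```

To verify a submitted token, recompute it from the order ID and compare; the same input always yields the same token.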

 

Then there is the classic IP-range detector, which groups reviews so that if spam is detected, you can easily delete it.

 

There are many smart ways to make handling spam much easier. Most of the time these require some sort of investment, but you should never hinder your customers.



 
  • poppedweb
  • Authorized Reseller
  • Members
  • Join Date: 02-Aug 16
  • 553 posts

Posted 12 August 2019 - 06:27 AM #10

@poppedweb
I very much disagree with your first comment as I feel you've overlooked the benefit of being able to disable all captcha.

 

Regarding the burden of price monitoring crawlers:
I had not considered this.. which raises the question as to whether there is an CSC addon to detect and block these?

On a VPS,  I wonder if fail2ban or similar could be configured to detect these,  but for csc on a shared server,  any code would need to be under the CSC folder. Any suggestions?

 

Such add-ons should not be required. The simplest thing I found to work was blocking the common tools used for downloading web pages to parse them. These include python, wget, and curl.
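For what it's worth, on Apache this kind of blocking can go in `.htaccess` via mod_setenvif. The substrings below are the common default user agents of those tools (my assumption, and trivially spoofed, so treat this as a first filter only); the syntax is Apache 2.4:

```apache
# Tag requests from common scraping tools by their default User-Agent,
# then deny tagged requests while allowing everything else.
SetEnvIfNoCase User-Agent "python-requests|python-urllib|wget|curl|scrapy" bad_bot
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```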

 

Besides, such practices are illegal. If you update your terms of service and then find a company crawling your website, you can sue them and get reimbursed for the damages done.



 
  • remoteone
  • Member
  • Members
  • Join Date: 06-Oct 09
  • 754 posts

Posted 12 August 2019 - 01:52 PM #11

@poppedweb

Well... neither python nor wget is enabled on this particular server, and I'm sure cURL is needed.
Extract from the CSC documentation:

Server Configuration Requirements:
"cURL support should be enabled. You need this PHP extension to ensure support of secure connections, some payment systems such as PayPal and Authorize.Net, and real-time shipping rate calculators for FedEx and DHL/Airborne."

 

Whilst alternative suggestions to CleanTalk are welcome, IMHO suggesting that the way to deal with companies crawling your website is a lawsuit is off-topic and not realistic - at best, left for another thread.

 

The aim of CleanTalk is to prevent spam in the first place, so again, suggesting we should just delete spam as it comes in is also unhelpful. This thread is about "CleanTalk Spam Protection", not "Spam Reviewing and Deletion"...
 



 
  • poppedweb
  • Authorized Reseller
  • Members
  • Join Date: 02-Aug 16
  • 553 posts

Posted 12 August 2019 - 01:57 PM #12

@poppedweb

Well... neither python nor wget is enabled on this particular server, and I'm sure cURL is needed.
Extract from the CSC documentation:

 

Also, whilst alternative suggestions to CleanTalk are welcome, IMHO suggesting that the way to deal with companies crawling your website is a lawsuit is off-topic and not realistic - at best, left for another thread.

 

 

You completely misunderstood me. You can disable these tools by banning their user agents (they all have a specific user agent). That way other people cannot use such tools to extract info from your website.



 
  • remoteone
  • Member
  • Members
  • Join Date: 06-Oct 09
  • 754 posts

Posted 12 August 2019 - 03:55 PM #13

Thanks for the clarification; I'll need to research banning user agents. The purpose of using something like CleanTalk is to have things done automatically, blocking spam based on known IP addresses - I don't think it deals with user agents. It looks like the .htaccess file needs editing to ban user agents, but I'm guessing that illegal site crawlers change theirs frequently?
Who/what determines the user agent name a crawler will use? I mean, for example, what prevents an illegal site crawler from pretending to be a Mozilla browser?

I guess that's why you suggest detection based on behaviour, i.e. looking for recurring IP addresses at certain intervals.

So some sort of automated process is needed.
 



 
  • remoteone
  • Member
  • Members
  • Join Date: 06-Oct 09
  • 754 posts

Posted 12 August 2019 - 05:28 PM #14

Perhaps a good place to start: https://github.com/m...od_setenvif.txt



 
  • poppedweb
  • Authorized Reseller
  • Members
  • Join Date: 02-Aug 16
  • 553 posts

Posted 12 August 2019 - 05:30 PM #15

Thanks for the clarification; I'll need to research banning user agents. The purpose of using something like CleanTalk is to have things done automatically, blocking spam based on known IP addresses - I don't think it deals with user agents. It looks like the .htaccess file needs editing to ban user agents, but I'm guessing that illegal site crawlers change theirs frequently?
Who/what determines the user agent name a crawler will use? I mean, for example, what prevents an illegal site crawler from pretending to be a Mozilla browser?

I guess that's why you suggest detection based on behaviour, i.e. looking for recurring IP addresses at certain intervals.

So some sort of automated process is needed.
 

 

"I guess that's why you suggest detection based on behavior", we mainly use a machine model from SageMaker for this. It compares average traffic with bot traffic and flags it accordingly. It will then apply a hard rate limit but will still allow scraping in the rare event that we flag a customer. We simply take the hit as we automatically scale our application anyways (using kubernetes).

Regarding the user agents, the key behind the fact why people can get banned using the default user agent, is that any request made through Python, gets a python user agent by default. Same applies for curl and most of the other open source alternatives. This should already warn companies that they should not index your domain as they will get an error (which will certainly puzzle their developers).

 

The next mitigation is to check your logs every now and then for recurring IP addresses at set intervals. If these do have a separate user agent, you can try sending them a message (using reverse IP lookup, you get their domain most of the time).
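Checking for recurring IPs can start as a one-liner over the access log. The sketch below assumes a standard Apache/Nginx combined log format where the client IP is the first field; the log path is an example, so adjust it to your server:

```shell
# Top 20 client IPs by request count in the access log.
# A bot polling every 5 minutes stands out with a large, very even count.
awk '{ print $1 }' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20
```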

 

If all else fails, you can go nuclear: ban their IP ranges and even send them legal warnings that they should not index your domain as it is against your terms of service. But do make sure that you have some kind of 'Fair Use' policy, as you are otherwise left in the dust: https://resources.di...-the-word-is-is

