Google Index Fallen Off The Cliff

I have a question about why my site's Google indexed page count has fallen off a cliff since the site was upgraded from V3.06 to V4.7.2.

Prior to the upgrade my site was doing well in SEO terms. However, in February of this year the site was upgraded by a third party, and since the upgrade the traffic has fallen to virtually non-existent, with the Google index status going from well over 850 pages to 121. At around the same time as the upgrade I added an HTTPS certificate to the URL.

Trying to sort out the massive drop in traffic, I regenerated the sitemap via Add-ons - Google Sitemap - using the regenerate button, then submitted the latest sitemap to Google in my Google Search Console. The error message "URL restricted by robots.txt" is showing for the latest sitemap resubmission.

I checked the public_html/robots.txt file, which has the following. Is this correct?

User-agent: *
Disallow: /images/thumbnails/
Disallow: /skins/
Disallow: /payments/
Disallow: /store_closed.html
Disallow: /core/
Disallow: /lib/
Disallow: /install/
Disallow: /js/
Disallow: /schemas/
Disallow: /*?

Am I looking in the correct robots.txt file for the "URL restricted by robots.txt" error message?

I just noticed this after posting the above message. In public_html/var/.htaccess:

Order deny,allow
Deny from all

order allow,deny
allow from all

The sitemap is an .xml file, which is listed in the var/.htaccess file, or does the var folder have nothing to do with the sitemap.xml?

>>> does the var folder have nothing to do with the sitemap.xml?

It has nothing to do with your regular 'sitemap.xml' file.

The robots file should be like this:

User-agent: *
Disallow: /app/
Disallow: /store_closed.html

Nowadays there isn't a need to "tweak" the robots file like in years past. Google will make its own decision about what will be crawled or indexed and what not.

https://support.google.com/webmasters/answer/6062608?hl=en

Robots.txt instructions are directives only
The instructions in robots.txt files cannot enforce crawler behavior to your site; instead, these instructions act as directives to the crawlers accessing your site. While Googlebot and other respectable web crawlers obey the instructions in a robots.txt file, other crawlers might not. Therefore, if you want to keep information secure from web crawlers, it’s better to use other blocking methods, such as password-protecting private files on your server.

Also:

The /robots.txt file is publicly available: just add /robots.txt to the end of any root domain to see that website's directives (if that site has a robots.txt file!). This means that anyone can see what pages you do or don't want to be crawled, so don't use them to hide private user information.
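As a side note (just a sketch, assuming the sitemap is served from the root as /sitemap.xml; the domain below is a placeholder), robots.txt can also point crawlers straight at the sitemap with a Sitemap line once the location is stable:

User-agent: *
Disallow: /app/
Disallow: /store_closed.html

# Placeholder URL; replace with your real sitemap location
Sitemap: https://www.yoursite.xyz/sitemap.xml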

And the last question is - are the URLs of the new CS-Cart version the same as the old one?

The robots file should be like this:

User-agent: *
Disallow: /app/
Disallow: /store_closed.html

So are you saying I should change the public_html/robots.txt file to the example you have given?

And the last question is - are the URLs of the new CS-Cart version the same as the old one?

The third party were instructed to copy the site to the latest version, V4.7.2, so I assume they kept all the URLs the same. To be honest, if they did not keep the URLs the same as the old version then surely they should have informed me, as it's critical for any SEO that's been built up over the past years.

Is there a way I can check?

So are you saying I should change the public_html/robots.txt file to the example you have given?

I'm just saying that this is the default robots.txt file and there isn't a reason to change it.

The third party were instructed to copy the site to the latest version, V4.7.2, so I assume they kept all the URLs the same. To be honest, if they did not keep the URLs the same as the old version then surely they should have informed me, as it's critical for any SEO that's been built up over the past years.

Yes, that could be your problem.

Is there a way I can check?

Sure, you can check/compare it with Google Webmaster Tools: https://www.google.com/webmasters/tools/

But if you didn't use it before (with the old version), then you won't see the 404 errors, as you will be using your current sitemap.

Try typing this into the Google search:

site:yourdomain.xy

You will see the number of indexed pages. Then check a few of the links to see if any of them are 404 pages.
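If you want to narrow it down to one part of the store, the same operator also accepts a path (the category below is just a placeholder, not a real URL):

site:yourdomain.xy/some-category/

That makes it easier to spot whole sections that have dropped out of the index.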

I tried site:yourdomain.xyz and it shows 661 pages, with no 404s when I clicked various of the displayed links. So it looks like Google has indexed 661 pages, which does not tally with the information in my Google Search Console - Google Index - Index Status: Total indexed 121?

I tried site:yourdomain.xyz and it shows 661 pages, with no 404s when I clicked various of the displayed links. So it looks like Google has indexed 661 pages, which does not tally with the information in my Google Search Console - Google Index - Index Status: Total indexed 121?

Did you update your current sitemap and re-submit it in your Google Webmaster Tools?

This is a CS-Cart issue from its birth :)

We checked our CS-Cart test installations and it seems the CS-Cart Google sitemap addon doesn't follow the categories and thus the products won't get crawled.

In our webmaster tools we can see:

13 URLs submitted
9 URLs indexed

But Google shows: 334 results

site:cs-cart-license.com

You must add a new property/site in Webmaster Tools, with the https prefix.

The index for the old, non-https property will fall, and the new https property will build up.

Search:

site:https://www.yoursite.xyz

and then:

site:http://www.yoursite.xyz

Also, the "robots.txt is blocking the sitemap" message can sometimes be a false alarm.

Delete the old sitemap and add a new one.
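A related sketch (assuming Apache with mod_rewrite; CS-Cart's stock .htaccess already contains rewrite rules, so this would sit near the top of that rewrite section): a site-wide 301 redirect from http to https helps Google treat the https version as the main one instead of splitting the index between the two.

# Hypothetical example: force https with a permanent redirect
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]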

I changed the robots.txt file to the following, then resubmitted the sitemap in Google Search Console, and it came back successful this time, so I will wait and see what Google finds now.

User-agent: *
Disallow: /app/
Disallow: /store_closed.html

You must add a new property/site in Webmaster Tools, with the https prefix.

The index for the old, non-https property will fall, and the new https property will build up.

Search:

site:https://www.yoursite.xyz

and then:

site:http://www.yoursite.xyz

Also, the "robots.txt is blocking the sitemap" message can sometimes be a false alarm.

Delete the old sitemap and add a new one.

I was led to believe that the Webmaster Tools account did not have to be changed from the old http to the new https, or is this incorrect?

site:https returned 597 results in Google

site:http returned 597 results in Google

I updated the new sitemap in the CS-Cart admin, then resubmitted it to Google Webmaster Tools.

This is a CS-Cart issue from its birth

Please elaborate on what you mean by this. Is it a fault with CS-Cart in general?

You need to add the new https:// property.

The old one can remain, but it will gradually fall off the radar as the new one kicks in. It took mine no more than a week for 5,000 pages.

https://prnt.sc/ix5dbv

https://prnt.sc/ix5dml

Then add the sitemap and robots.txt to the https property.

https://prnt.sc/ix5ekw

https://support.google.com/webmasters/answer/6073543?hl=en

And don't worry, the ranking will come back; just tell it where to look.

Johnbol1, thanks for the detailed explanation. I've added the https property and now need to verify the site, even though the http property has already been verified, which I would have thought verified the https as well, since I've added both of them to my Google Webmaster account.

I have FTP access, but I'm not sure where to place the tracking code. Google says to put it into the head section; which file is this when accessing the files via FTP?

Does Google generate a different code every time I add a site as http or https, even though they are the same URL, or is there a way to keep all the same URLs under one Google Analytics and Search Console account with the same tracking code?

I have FTP access, but I'm not sure where to place the tracking code. Google says to put it into the head section; which file is this when accessing the files via FTP?



You have to add the code into:

/design/themes/responsive/templates/index.tpl

If you use a third-party template, the file location might be different. Also, there is a hook placeholder:

{hook name="index:head_scripts"}{/hook}

You can use a hook for the Google code to avoid it being overwritten by the next CS-Cart update.
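As a sketch only (assuming the bundled My Changes add-on is enabled and your theme keeps the standard hook directory layout; the exact path may differ for third-party themes), the snippet could live in a hook template rather than in index.tpl itself:

design/themes/responsive/templates/addons/my_changes/hooks/index/head_scripts.post.tpl

{* Hypothetical hook template: everything here is printed where
   {hook name="index:head_scripts"} renders in index.tpl,
   so it survives CS-Cart and theme updates. *}
<!-- paste the Google tracking / verification snippet here -->

After adding the file you may need to clear the store cache so the new hook template is picked up.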


Does Google generate a different code every time I add a site as http or https, even though they are the same URL, or is there a way to keep all the same URLs under one Google Analytics and Search Console account with the same tracking code?



If the code remains in the 'head' section, it will be applied to all pages of your store, old or new.

/design/themes/responsive/templates/index.tpl

I added the verification code, "google-site-verification: google788ae000633f0000.html" (code edited), to the index.tpl file as the first line of code, but I am still unable to verify.

Am I missing something?

I added the verification code, "google-site-verification: google788ae000633f0000.html" (code edited), to the index.tpl file as the first line of code, but I am still unable to verify.

Am I missing something?

This is not the verification code. You have to upload the file 'google788ae000633f0000.html' to the root directory of your cart installation and then verify again.

https://support.google.com/webmasters/answer/35179?hl=en
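For reference (a sketch; the real file name is whatever Search Console gives you, and yoursite.xyz below is a placeholder), the downloaded verification file normally contains nothing but a single line, and verification only succeeds if the file can be opened at the site root:

google-site-verification: google788ae000633f0000.html

That is, it should be reachable at https://www.yoursite.xyz/google788ae000633f0000.html before you press Verify.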

Regarding your posts, you are talking about "GOOGLE WEBMASTER TOOLS" (search console) and "GOOGLE ANALYTICS" - these are two different things.

The Analytics code must be added to the head section of the .tpl file.

This is not the verification code. You have to upload the file 'google788ae000633f0000.html' to the root directory of your cart installation and then verify again.

Apologies, I did not explain it correctly.

Do I have to create a new file for the Google HTML code in the root directory and then upload the file contents?