Google No Longer Indexing after upgrade to v4

Issue

Google previously indexed approximately 1,000 links for our site. We recently upgraded to v4 at the same time as moving host, and since then Google has slowly been de-indexing pages. (Please note: when I refer to “indexed” I mean the number shown in WMT under the “Sitemaps” tab, where it states the number of pages submitted vs. the number indexed.)



Google is showing a large number of soft 404 errors for pages that do exist; at present this appears to affect mainly dynamic pages, e.g. product pages.

We have confirmed that the URLs that errored do exist, and Google reports a 200 OK/Success when we use the Fetch as Google tool.

We have also accessed the pages using Googlebot as the User Agent and can access all links as expected.



Google is stating that approximately 1,300 pages are blocked by robots.txt, but this file has not been changed since we were on v3. We are currently working on the assumption that these errors are not the cause of the issue, and have temporarily amended robots.txt to allow Googlebot access to all files/folders to confirm or rule out this theory.
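For anyone wanting to run the same test, the temporary change amounts to a record along these lines (a minimal sketch only; our live robots.txt contains other records that are omitted here):

    # Temporary test record: give Googlebot unrestricted access
    User-agent: Googlebot
    Disallow:

An empty Disallow value means nothing is blocked for that user agent.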



Google also states (under Google Index → Index Status) that 1,362 URLs are indexed, but this number is decreasing rapidly day by day as Google fails to crawl the site successfully.



Diagnostics/attempts to resolve so far

Searching for the site on Google displays several pages of links, but only a few resolve; quite a few are old links that Google has not de-indexed despite them returning 404s for a considerable time. Please see here for the true state of currently indexed/displayed links.



Folder/file permissions have been checked and amended as per several forum topics describing the correct setup.



We have listed the site as both www. and non-www in WMT.



DNS is working as intended and resolves URLs for both www. and non-www.
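For completeness, the check was nothing more involved than resolving both hostnames from a shell (example.com below is a placeholder for our domain):

    # Confirm both hostnames resolve to the web server's address
    dig +short www.example.com
    dig +short example.com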



The SEO add-on now resolves pages that have .html appended to them. On a base install, it did not correctly resolve pages ending in .html.



Apache logs have been checked and Googlebot appears to be receiving the correct responses for all pages: a combination of 404s (for pages confirmed not to exist), 200s, and 301/302 responses, all of which appear to be valid.
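In case it helps anyone checking the same thing, this is roughly how we summarised the response codes being served to Googlebot (it assumes the standard combined log format and a typical log path; adjust both for your server):

    # Count the HTTP status codes returned to Googlebot
    # (in the combined log format the status code is the 9th field)
    grep -i "Googlebot" /var/log/apache2/access.log | awk '{print $9}' | sort | uniq -c | sort -rn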



The PHP error log has been checked and there do appear to be some errors, as per the attached log. Unfortunately my PHP knowledge is very limited and searching for similar errors has not proved successful. [attachment=7057:error_log1.xml]



I have also attached a copy of my .htaccess file in the event that this is causing some of the issues seen. [attachment=7058:htaccess.txt]



Question

We have recently implemented a secondary storefront, and all products are shared between the stores, in multiple categories in each store. Could this have an adverse effect on anything, e.g. file locations differing or additional files that need amending?

As per the CS-Cart instructions, this is on a different domain pointing to the same root directory. Google has not yet indexed any of its pages (it has been three weeks now), which seems odd. On most sites we have implemented, we usually start to see at least some pages indexed within a week or so at the very latest.



Hopefully it's something glaringly obvious to the right eyes, but I've hit a brick wall in terms of what to check and where to go next. As you can understand, going from having many highly placed links to very few is incredibly frustrating and is causing considerable monetary loss.



Any help/advice would be much appreciated.



Thanks.


Did you have any SEO URL changes after the upgrade?

Hi cscartrocks,

The SEO URL scheme has been kept as per the previous setup (category_name/[subcategory_name]/product_name.html). The previous setup worked with both .html and non-.html links, but this was not working when we implemented v4. A couple of amendments to func.php resolved it (as per the second-to-last post in this thread: SEO: remove .html - General Questions - CS-Cart Community Forums).

Another (possibly) interesting point: when you search for the site in Google and attempt to load a cached version, it now returns a 404, where previously a cached version was available. Also, the site snapshot thumbnail on the WMT home page shows a page stating “Oops! An error occurred.” The same can be seen in some of the very few indexed links appearing on Google: where the meta description used to be, it is now replaced with the “Oops!” error message text.



May this be indicative of a script error or otherwise?

Usually the “Oops” errors are the result of a PHP error/notice that the JavaScript in the browser can't process (i.e. in the response to an AJAX request). Generally you can debug this with Firebug by looking at the Net/XHR requests and responses. Most AJAX requests should not be links that Google follows, though some of the paging links (i.e. on larger product lists) will be.



The confusing part of your description is that you state Google is indicating it is receiving 404s, but when you cut/paste the link into your browser, it resolves fine. I'm guessing you're going to need the helpdesk to resolve this for you. Please post what you learn from it. Do you notice whether the false 404s are coming from any specific type of page (product, category, page, etc.)?
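One quick sanity check is to request one of the flagged URLs from the command line with a Googlebot user-agent string and compare the status code with what a browser-style user agent gets back (the URL below is just a placeholder):

    # Status code as seen with a Googlebot user agent
    curl -s -o /dev/null -w "%{http_code}\n" -A "Googlebot/2.1 (+http://www.google.com/bot.html)" "http://www.example.com/some-product.html"

    # Same request with a browser-style user agent for comparison
    curl -s -o /dev/null -w "%{http_code}\n" -A "Mozilla/5.0" "http://www.example.com/some-product.html"

If the two differ, something server-side (a rewrite rule in .htaccess, for example) is treating the bot differently.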

Your document head is missing various declarations. If I were you I would restore the index.tpl and meta.tpl to the original versions and see how you go from there.



I am seeing 2,070 URLs indexed now.



I would like to say you're going crazy, but there have been numerous times I have clicked the 'Preview' link on a category or product within the admin and received a 404, yet reloading the page loads it just fine. It does worry me that the same happens for Google/customers.

We've now managed to solve the problem. Google now returns around three times as many results for a “site:” search, and it is also indexing new pages created since indexing stopped working. Soft 404s have also stopped occurring.



It appears it was an issue with the standard robots.txt created on install of v4. v4 seems to cache far more pages and information than v3, and allowing access to the /var folder appears to have solved the issue. Our theory is that when Google attempted to crawl the URLs, CS-Cart tried to serve cached versions of the pages/files held under /var; because robots.txt did not allow access to that folder, the pages failed to load correctly, producing a soft 404 (i.e. neither a 200 nor a hard 404 response from the server).
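The change itself was tiny. A sketch of the relevant part of our robots.txt (the rest of the file is omitted, and the exact paths in yours may differ):

    User-agent: *
    # The following line was removed so the cached pages/assets under /var can be fetched:
    # Disallow: /var/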



I may be wrong with my theory above, but it would be something to watch out for if you do upgrade to v4 and a similar issue occurs.



Thanks for all the help/suggestions.

It would be helpful if someone from CS-Cart could confirm whether the /var folder should be allowed or disallowed in the robots.txt file.

I just want to confirm something: is your server running NGINX?

This question is actually for everyone with this problem.

I'm having similar issues… I'm getting an error from Google saying that they are getting a soft 404 for my homepage. I think this is blocking indexing.

I've done most of the changes mentioned above…

  • I removed /var from robots.txt
  • I added both http://www.voltnow.com and http://voltnow.com and set http://voltnow.com as the primary in WMT
  • submitted the sitemap, which seems okay and shows 151 pages



I haven't touched .htaccess.

When I go to Google and try: site:voltnow.com

I get nothing. I've waited a few months and still nothing.

Any ideas? @seonid Not sure what NGINX is. Our server is Apache based.

This is a new site so I'm not dealing with cached pages or redirects. We just aren't able to get indexed at all.

I've been waiting weeks for indexing to occur.

Not sure if there is any relation, but I tried looking at Bing WMT. When I fetched my page, it came back with the proper 200 but showed an “Oops” message. I wondered if this may have caused problems with Google.

I found this post which fixed a conflict with the statistics add-on.

Ooops! An error occurred for V4 - SEO - CS-Cart Community Forums



The page now fetches correctly. We'll see if the soft 404 goes away as Google tries to re-index.

Either way, this is worth fixing.

I am also getting a Google search error… I have installed CS-Cart Multi-Vendor 4.0.1 for the first time.

Please have a look…

http://i1102.photobu…searcherror.jpg



This is a serious error; because of it our website is not getting Google or Bing organic traffic.



If somebody can help me out regarding this matter, it would be greatly appreciated.

[quote name='mediainfo' timestamp='1383919321' post='171236']

I am also getting a Google search error… I have installed CS-Cart Multi-Vendor 4.0.1 for the first time.

Please have a look…

http://i1102.photobu…searcherror.jpg

This is a serious error; because of it our website is not getting Google or Bing organic traffic.

If somebody can help me out regarding this matter, it would be greatly appreciated.

[/quote]



Hi mediainfo



Your site is returning many HTTP 302 response codes.



Test it with A1 Sitemap Generator: http://www.microsyst…emap-generator/
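You can also see the redirect chain for any URL from the command line (the URL below is a placeholder):

    # Print the status line and Location header of each hop while following redirects
    curl -sIL "http://www.example.com/" | grep -iE "^(HTTP|Location)"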


sod.jpg

Thank you so much lulu51, I fixed this issue by upgrading to CS-Cart 4.0.3. The site is now indexed, but it is not showing our categories.

Please have a look…



http://i1102.photobucket.com/albums/g443/mediafkk/indexGoogle.jpg



Thanking You,

mediainfo, your categories are being indexed.



Google “site:www.domain.com” to find the indexing status. Also, if you haven't already, sign up for Google Webmaster Tools and use the “Fetch as Google” testing tool to check for problems. Note that Webmaster Tools and the “site:” search will show varying results for the number of pages indexed; this is normal. Also, check the list of errors for soft 404s, as these are the type Google has encountered due to the bug(s) in 4.0.1/2.

I have updated one of my shopping carts to 4.0.3, and when attempting site:www.mydomain.com it states “A description for this result is not available because of this site's robots.txt.” Prior to this we were being indexed. This site's Google Analytics ecommerce and visit tracking are still reporting statistics, but for SEO this really needs a fix ASAP. Please advise…



Thanks.

[quote name='HOPELights' timestamp='1384861144' post='171929']

I have updated one of my shopping carts to 4.0.3, and when attempting site:www.mydomain.com it states “A description for this result is not available because of this site's robots.txt.” Prior to this we were being indexed. This site's Google Analytics ecommerce and visit tracking are still reporting statistics, but for SEO this really needs a fix ASAP. Please advise…



Thanks.

[/quote]



It is difficult to answer without knowing the live site.



Lucien

My apologies Lucien. The site that has been upgraded is: http://wendyjcook.com/

HOPELights, the issue you are describing has been covered here with a few possible solutions, and another possible solution has also been posted on the SEONID website.

[quote name='HOPELights' timestamp='1384886964' post='171943']

My apologies Lucien. The site that has been upgraded is: http://wendyjcook.com/

[/quote]



Hi HOPELights



Test your robots.txt here: http://www.frobee.com/robots-txt-check









Line 9:

Missing User-agent directive! Found Crawl-delay

Each rule record has to start with at least one User-agent statement. Blank lines delimit rule records and may not be used between User-agent and Crawl-delay statements.
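In other words, the Crawl-delay line has to sit inside a record that begins with a User-agent line, with no blank line between them. A corrected record would look something like this (the delay value is illustrative):

    User-agent: *
    Crawl-delay: 10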







Google does not support the Crawl-delay parameter in the robots.txt file



To limit Google's maximum crawl speed, simply sign in to Google Webmaster Tools and adjust it under Site Configuration / Settings.



Lucien