Bug fix - Google Sitemap addon in 2.0.9, 2.0.10

UPDATED TO REFLECT NEW INFO



There is a bug in the func.php file that will hamper your sitemaps. The paging structure is off by one page.



Find this code in /addons/google_sitemap/func.php:



if ($sitemap_settings['include_products'] == "Y") {
    $page = 1;
    $total = ITEMS_PER_PAGE;

    $params = $_REQUEST;
    $params['type'] = 'extended';
    $params['subcats'] = 'N';
    $params['page'] = $page;

    while (ITEMS_PER_PAGE * [COLOR="Red"]($params['page']-1)[/COLOR] <= $total) {
        list($products, $params, $total) = fn_get_products($params, ITEMS_PER_PAGE);
        $params['page']++;




The code in red above highlights the modification. Without it, the loop exits one page early and you lose the last partial page of product links (up to 49 of them).



The fix I originally suggested was to change ITEMS_PER_PAGE to the number of products you have listed. With the change above, that value should be returned to 50.

I don’t have any problems, but it is strange that that is in there. Wonder if removing the line altogether would work?

[quote name=‘jagorny’]There is a bug in the func.php file that will hamper your sitemaps. I have discovered it (albeit too late perhaps) and have fixed it but it is important for folks who use this functionality to make this change if you sell more than 50 products.



Find this code in /addons/google_sitemap/func.php:



function fn_google_sitemap_get_content($filename)
{
    define('ITEMS_PER_PAGE', 50);

    $file = fopen($filename, "wb");





This info was pulled from the ajax paging routines, but the sitemap has no paging. Therefore, if left as-is, only the first 50 products on the first query ‘page’ will be included in the sitemap. I personally modified this to 2000 to make room for expansion.



This will also resolve issues of news and other pages not showing up for sites that make extensive use of the CMS features.[/QUOTE]



Thank you!



We couldn’t work out why we had fewer than 30 products on the map but all the pages etc… now the sitemap is working with all products and pages.

[quote name=‘Tool Outfitters’]I don’t have any problems, but it is strange that that is in there. Wonder if removing the line altogether would work?[/QUOTE]



It defines a constant that is used - i.e. it is not trivial. From what I could gather, the code for fetching products looks very close to the controllers used on the admin side to list products - so my guess is that they simply copied, pasted and modified the code.



If you delete the line, you also need to delete the reference in the db query call. Or change it to a constant number - OR see whether, with no constant, the LIMIT portion of the SQL clause is eliminated altogether, or whether it creates an error. I haven’t looked closely into the parameter triggers for the db query functions, so I figured the safest mod until I knew more was to raise the ceiling high enough that all of my live products would get included.

[quote name=‘jagorny’]There is a bug in the func.php file that will hamper your sitemaps. I have discovered it (albeit too late perhaps) and have fixed it but it is important for folks who use this functionality to make this change if you sell more than 50 products.



Find this code in /addons/google_sitemap/func.php:



function fn_google_sitemap_get_content($filename)
{
    define('ITEMS_PER_PAGE', 50);

    $file = fopen($filename, "wb");





This info was pulled from the ajax paging routines, but the sitemap has no paging. Therefore, if left as-is, only the first 50 products on the first query ‘page’ will be included in the sitemap. I personally modified this to 2000 to make room for expansion.



This will also resolve issues of news and other pages not showing up for sites that make extensive use of the CMS features.[/QUOTE]



I can explain what this constant was defined for.

Please look at this part of the code:


if ($sitemap_settings['include_products'] == "Y") {
    $page = 1;
    $total = ITEMS_PER_PAGE;

    $params = $_REQUEST;
    $params['type'] = 'extended';
    $params['subcats'] = 'N';
    $params['page'] = $page;

    while (ITEMS_PER_PAGE * $params['page'] <= $total) {
        list($products, $params, $total) = fn_get_products($params, ITEMS_PER_PAGE);
        $params['page']++;




As you can see, this constant was not defined for Ajax pagination; it is there to limit how many products are fetched at once. If you have, e.g., 10,000+ products, you would otherwise get an array with 10,000+ elements. No server will allow the script to use that much memory, and the script would be terminated. To avoid this, the products are divided into small batches (50 by default). Each batch needs far less memory, so step by step you receive your sitemap and the script is not terminated by the server.



Also, there is no problem with generating 100, 200, etc. products, because of this condition:


while (ITEMS_PER_PAGE * $params['page'] <= $total) {




if you have 200 products in your store, the condition will look like

50 * 1 <= 200 - true

$params['page']++;

50 * 2 <= 200 - true

and so on.

Um, ok Alexions, I see



what needs to be fixed is $total = ITEMS_PER_PAGE…



that makes your while clause fail after the first run because



50 * 1 <= 50 - true

50 * 2 <= 50 - false



that $total should be the total number of products. What’s the quickest way to get that value?

So even if I have 5,000+ products, changing this number from 50 to 2000 should work? Google is only showing 4,253 URLs in the Webmaster Tools, which is lower than my total product count.



Is there a way for me to automatically clear the sitemap cache daily so I don’t have to remember to do this each time I add or delete a product?



Thanks!

[quote name=‘Offline’]So even if I have 5,000+ products, changing this number from 50 to 2000 should work? Google is only showing 4,253 URLs in the Webmaster Tools, which is lower than my total product count.



Is there a way for me to automatically clear the sitemap cache daily so I don’t have to remember to do this each time I add or delete a product?



Thanks![/quote]



I think it can be done with a cron job, ask a php programmer.

If I set up a cron do I need to authenticate the cron job first?



I tried something like this and it did not work…



wget --http-user=xxxxx --http-password='xxxxxxxx' [url]http://www.boatersplus.com/admin.php?dispatch=addons.manage&cc[/url]

[quote name=‘jagorny’]Um, ok Alexions, I see



what needs to be fixed is $total = ITEMS_PER_PAGE…



that makes your while clause fail after the first run because



50 * 1 <= 50 - true

50 * 2 <= 50 - false



that $total should be the total number of products. What’s the quickest way to get that value?[/QUOTE]



Well, you are almost right, but the script works in a slightly different way.

Look at this part of the code:


list($products, $params, $total) = fn_get_products($params, ITEMS_PER_PAGE);


When the script calls the fn_get_products function, we receive 3 variables:

  1. products - the products that were found
  2. params - the search params
  3. total - the total number of products



Let’s imagine you have 30 products.


$total = ITEMS_PER_PAGE (total = 50)




while (ITEMS_PER_PAGE * $params['page'] <= $total)


50 * 1 <= 50 (true)

then


list($products, $params, $total) = fn_get_products($params, ITEMS_PER_PAGE);
$params['page']++;




And now $total is equal to 30 and page is equal to 2

50 * 2 <= 30 (false)

That’s all. All of your products are in the sitemap.



But now you have 130 products:


$total = ITEMS_PER_PAGE (total = 50)




while (ITEMS_PER_PAGE * $params['page'] <= $total)


50 * 1 <= 50 (true)



list($products, $params, $total) = fn_get_products($params, ITEMS_PER_PAGE);
$params['page']++;




And now $total is equal to 130 and page is equal to 2

50 * 2 <= 130 (true)


$params['page']++;


50 * 3 <= 130 (false)



Also as before, all your products are in the sitemap.



I hope this explanation helps. Ask me if you have other questions.

Hi Alexions,



Actually… at the very least one change needs to be made or the listing runs a page short:



if ($sitemap_settings['include_products'] == "Y") {
    $page = 1;
    $total = ITEMS_PER_PAGE;

    $params = $_REQUEST;
    $params['type'] = 'extended';
    $params['subcats'] = 'N';
    $params['page'] = $page;

    while (ITEMS_PER_PAGE * [COLOR="Red"]($params['page']-1)[/COLOR] <= $total) {
        list($products, $params, $total) = fn_get_products($params, ITEMS_PER_PAGE);
        $params['page']++;




Otherwise, you lose the last page of links.



Running a debug on this, the rest does work as you mentioned.



I’ll adjust my first post to reflect this change.
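
Since this tripped a few of us up, here is a quick shell sketch that just mirrors the loop arithmetic (the 130-product store, the variable names, and the stand-in for fn_get_products are all illustrative assumptions - the real fetch is simulated by simply filling in the true total each pass):

```shell
#!/bin/sh
# Mirror of the sitemap paging loop for a hypothetical store of 130 products.
# PER_PAGE stands in for ITEMS_PER_PAGE; "fetching a page" is simulated by
# adding that page's product count and updating $total to the real figure,
# which is what fn_get_products does on each call.
PRODUCTS=130
PER_PAGE=50

run_loop () {  # $1 is the offset in the while condition: 0 = original, 1 = fixed
  total=$PER_PAGE      # seeded with ITEMS_PER_PAGE before the first query
  page=1
  fetched=0
  while [ $((PER_PAGE * (page - $1))) -le "$total" ]; do
    total=$PRODUCTS                            # real total, as the query returns it
    got=$((PRODUCTS - (page - 1) * PER_PAGE))  # products remaining on this page
    if [ "$got" -gt "$PER_PAGE" ]; then got=$PER_PAGE; fi
    fetched=$((fetched + got))
    page=$((page + 1))
  done
  echo "$fetched"
}

echo "original condition: $(run_loop 0) products"   # stops one page early: 100
echo "fixed condition:    $(run_loop 1) products"   # all 130 products
```

With the ($params['page'] - 1) offset, the loop makes one extra pass, and that extra pass is exactly what picks up the final partial page.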

[quote name=‘Offline’]If I set up a cron do I need to authenticate the cron job first?



I tried something like this and it did not work…



wget --http-user=xxxxx --http-password='xxxxxxxx' [url]http://www.boatersplus.com/admin.php?dispatch=addons.manage&cc[/url][/QUOTE]



I wouldn’t do that - I’d set up a cron to rm the sitemap.xml file from the cache directory once a day, and then wget the index.php?dispatch=xmlsitemap.view page right after that.



The function looks for the file in the var/cache directory and runs the generator if it can’t be found.



The cc command would clear the whole cache (if I’m not mistaken) - you just want to refresh the sitemap.
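
Something along these lines should do it (the install path, domain, and schedule are placeholders to adjust for your own server; the sitemap.xml name and the xmlsitemap.view dispatch are the ones mentioned above):

```shell
# Hypothetical crontab entries - adjust /path/to/cscart, the domain, and the
# times to your own setup. At 3:00 the cached sitemap is removed; at 3:05 the
# sitemap page is requested, which makes the generator rebuild the file.
0 3 * * * rm -f /path/to/cscart/var/cache/sitemap.xml
5 3 * * * wget -q -O /dev/null "http://www.example.com/index.php?dispatch=xmlsitemap.view"
```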

I think the whole map is a bug, because I can’t see the different links for the languages in the sitemap.

Well with the new fix all of my products are showing up now. And thanks for the tip on deleting the file in the cache dir.

After applying the fix, the number of products in the sitemap increased; however, it is still incomplete. :confused:

grabbags - I made the fix on Dec 2… it is now Dec 7th and finally I have sitelinks restored and my site crawl errors statistics are half of what they were.



There is no way, unfortunately, to force Google to recrawl. It follows the sitemaps every few days, and seems to correct errors as it crawls them through other sites.



When I asked on the webmaster forum, a few suggestions were passed on to me - I submit them here because they are sound advice when trying to get your SEO records straight.


  1. Include a canonical link tag on product detail pages so that Google will consolidate parameter variants into single canonical links. Google, MSN and Yahoo all respect this tag now.


  2. Add currency and sl to the parameters that Googlebot ignores.

Thank you for your suggestion, jagorny.

Has anyone reported this as a bug to cs-cart?

[quote name=‘jagorny’]I wouldn’t do that - I’d set up a cron to rm the sitemap.xml file from the cache directory once a day, and then wget the index.php?dispatch=xmlsitemap.view page right after that once a day.



The function looks for the file in the var/cache directory and runs the generator if it can’t be found.



The cc command would clear the whole cache (if I’m not mistaken) - you just want to refresh the sitemap.[/QUOTE]



Can you please share the exact code for how to remove & get? We are using Plesk/cron.



TIA