Problems w/ HTML Catalog?

I’ve been reading through the forums, today specifically the wish list, and I just ran across something that really concerns me. Rather than resurrect an old thread that was mostly focused on suggesting future features, I’m just going to quote from it and ask for some feedback. You can find the original post here.


[quote name=‘MarkWhoo’]As far as the HTML catalog goes, I have a lot of mixed feelings about this.



Initially, I thought this was the best thing since sliced bread. What I found was that it was a nightmare. Not only is it impossible to get great SEO with it, but it will cause you great damage in the SERPs as far as your store listing goes.



Now if you are Walmart or Sears, then you have no issues. Who cares about SEO? If you are little old me, I DO, and I depend on a high listing in the SERPs.



So, I deleted all use of the catalog and made great use of the great rewrite mod. It makes all links appear to be html. Now the SERPs eat ALL of the links; you simply disallow php pages from being crawled and voilà, you shoot to the top. This is especially clear if you compare the same content on two stores, one using the catalog and the other the rewrite mod. The rewrite mod kicks the catalog in the pants every time.



So, I would rather use GOOD SEO tactics with the rewrite mod and just leave the catalog for those that like it.[/quote]



This concerns me a great deal. My choice to buy CS-Cart was highly influenced by its ability to generate a static catalog. I have a number of reasons for wanting this which I won’t go into now, but I’d really like some feedback as to why this may be the case. Does anyone have any strong conclusions as to why the html catalog under-performs?



I see several ways to improve the html catalog, but I don’t see most of those affecting ranking significantly; the issues I’m concerned with are also evident in the dynamic catalog. I really hope to avoid a mod_rewrite approach because of server load, potential duplicate pages (because navigation hrefs won’t change), and other concerns.



I really would like some feedback here before I commit a lot of time to development.

It looks like I’m going to need to answer my own question here … apparently not many of the active vocal forum members use or intend to use the html catalog. I’m sure there are some out there though.



I’ve taken a fairly good look at it now, and I believe I can point out several reasons that the html catalog in its current form will work against you in the search engines, and is almost guaranteed to under-perform. I had hoped to see changes in the 1.3.4-RC1 version, and it is slightly improved, but not by much. I think this pre-release testing period is a great opportunity to address these issues if the CS-Cart staff will consider making a few changes.


  1. To begin with, the html catalog function ONLY produces the catalog pages in html, plus an index.html page which it places in /catalog. There is nothing inherently wrong with having a site that uses both static and dynamic pages; however, I can say that for those of us who prefer a static catalog, the expectation is to have an entirely static site (except perhaps the cart and checkout). The same preferences apply to the entire site, not just the catalog pages, and currently all pages except the catalog are dynamic.


  2. The bigger issue, though, is that the catalog uses a different index file than the rest of the site. This is a real problem for search engine placement because the same content exists in potentially 3 separate locations as far as the search engines are concerned: www.domain.com/, domain.com and www.domain.com/catalog/index.html. For good SEO, it is critical that only one index location exists, and that should really be “www.domain.com/”. All links to home within the site should return to that one location, and no others. It’s preferable not to use the index file in the address, as many will naturally link to a site without it.


  3. Also, there are real problems with the interaction between the html pages and the rest of the site. Once a customer navigates to a dynamic page, they are from this point forth on the php version of the site, and will never return to the static version of the catalog. Huge problem.



    Even if there are both dynamic and static pages on the site, the customer (or spider) should never be directed to the php version of any page that has a static version available, otherwise the webmaster is at risk of having duplicate information indexed by the spiders. Duplicate content takes the control out of the webmaster’s hands, and the Search Engines will determine which page to list. Spiders find pages by following links. If someone links to a php page instead of the html page, then link popularity & PR will be credited to the php version, not the html version, in effect splitting the benefits of inbound links between the two versions of the page.



    There is no reason inherently that a php page should rank better than an html page. But this splitting of link popularity & PR will almost guarantee an html catalog site with the current setup to have a much smaller chance in the search engines than the purely php site, simply due to internal link structure. This really is a HUGE deal if the html catalog is to be a useable feature.


  4. When you do a product search, the results return the customer to the php version of a page, not the html version. It is ok that the search page itself is php, as it contains no info that would normally be indexed. But as above, the results give the spiders another opportunity to index the wrong version, and anyone who links to a product after searching will link to the php page, again splitting link popularity between the two.



    I see there is a mod-rewrite SEO file name option which might be used to pretend to be a static site, but that is a server-intensive operation. Those who truly prefer a static site will not find that satisfactory as reducing server loads is one reason among several others for using a static site.



    I honestly see no value at all in using the html catalog in its current state, but I’m certain it can be made quite useful to those of us who want it. The page generation works great; it’s principally a matter of fixing the navigation to ensure no duplicate content issues exist.



    Extending the html generation to include all pages of the site would be by far the best solution. Short of that, the navigation served up on each page should be conditional: if the catalog exists, it serves links to the html pages; if not, it uses the php versions.



    These issues are a BIG deal if the catalog is to be viable. I hope the CS-Cart team will consider some revisions before the final 1.3.4 is released, and I’ll contact them directly on this.
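The conditional navigation suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: pick_nav_link, the $static_map lookup table, and the /catalog URL prefix are hypothetical names, not anything taken from the CS-Cart code.

```php
<?php
/**
 * Return the link a page should use for a given dynamic URL.
 * Hypothetical helper: $static_map (dynamic URL => static file name),
 * $catalog_dir and $catalog_url are assumptions for illustration only.
 */
function pick_nav_link(
    string $dynamic_url,
    array $static_map,
    string $catalog_dir,
    string $catalog_url = '/catalog'
): string {
    if (isset($static_map[$dynamic_url])) {
        $file = $static_map[$dynamic_url];
        // Only link to the static page if it has actually been generated.
        if (is_file($catalog_dir . '/' . $file)) {
            return $catalog_url . '/' . $file;
        }
    }
    // No static version available: fall back to the dynamic page.
    return '/' . $dynamic_url;
}
```

The point of the design is that every page, static or dynamic, asks the same helper for its links, so a spider is never handed the php address of a page that has an html version.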

I just uploaded my Google Base feed and noticed the links point to static html pages in the category folder. I hadn’t planned on using static pages, but my Google Base feed has just forced me into it. This dual static and dynamic scheme isn’t going to be workable for us. I can’t be generating HTML pages every time I make a price or description change. Now I need to write mod_rewrite rules to get them pointing to my dynamic pages. As Arlen pointed out, this needs to change ASAP.

I upload to Google without using the html catalog. I don’t even have an html catalog; all links on Froogle go directly to the dynamic page, but it took forever to update the change.

[quote name=‘taydu’]I upload to Google without using the html catalog. I don’t even have an html catalog; all links on Froogle go directly to the dynamic page, but it took forever to update the change.[/QUOTE]Interesting. I didn’t have an html catalog either. When I uploaded to Google Base, the links all pointed to the html catalog. I did see this in tool.php: “if ($catalog_exp == ‘Y’)” to generate the static pages. I will regenerate the Google feed using dynamic links by setting $catalog_exp=‘N’.



As a side note, it would be hard to write a mod_rewrite rule for the static URLs the way they are now because of the varying number of underscores. From what I read, a hyphen is the best way to go because Google sees hyphens as word breaks. I just wrote a routine in php to parse the current static URLs and redirect to the dynamic pages while I wait for my upload to be published.

[quote name=‘sculptingstudio’]This dual static and dynamic scheme isn’t going to be workable for us. I can’t be generating HTML pages every time I make a price or description change.[/quote] We have different reasons, but the issue is the same … you can’t have 2 versions of the same page available or you’ll have problems. At least one other cart I looked at had an auto-generate function that keeps the html catalog updated when a change is made. That is my intent in the long run, but I’m taking it one step at a time.



I’ve dug deeper into this, and I think doing what I described should be relatively straightforward. If I understand what it’s doing correctly, the generate catalog function simply captures the output of the php file, changes the navigation links and saves the result as an html file. The functions used to do this reside in the html_catalog.php file, and seem to be universal enough that they could be used to generate static pages for topics and pages as well. I’m not certain what would need to be done for cart and search pages, but their navigation should point to the appropriate version of pages.
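For anyone who wants to picture the capture-and-save step described above, here is a minimal sketch using PHP’s output buffering. It is not the html_catalog.php code; save_static_page, the $render callback and the $link_map array are hypothetical names made up for the example.

```php
<?php
/**
 * Render a page through a callback, rewrite its navigation links, and
 * save the result as a static file. All names here are hypothetical.
 */
function save_static_page(callable $render, array $link_map, string $out_file): string
{
    ob_start();              // start capturing everything the page echoes
    $render();               // run the dynamic page
    $html = ob_get_clean();  // stop capturing and fetch the output

    // Swap each dynamic link for its static counterpart.
    if ($link_map) {
        $html = strtr($html, $link_map);
    }

    file_put_contents($out_file, $html);
    return $html;
}
```

Because the link rewriting is just a lookup table applied to the captured output, the same routine could in principle be pointed at topic or content pages as well as catalog pages.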


[quote name=‘sculptingstudio’]As a side note, it would be hard to write a mod_rewrite rule for the static URLs the way they are now because of the varying number of underscores. From what I read, a hyphen is the best way to go because Google sees hyphens as word breaks.[/quote]This is very true. I was quoted $50 to make the change from underscores to dashes prior to buying. Once I got the un-encrypted files, I found that changing this was simply a matter of changing 4 instances of “_” to “-” in the html_catalog.php file … 30 seconds max. This kind of simple mod should be explained in a FAQ and not charged for. I haven’t posted how to do that yet, as my php knowledge is limited and I wanted to be certain I wasn’t breaking something else. So far, though, my tests show no problems with generating file names with a dash.
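To show what the separator change amounts to, here is a rough sketch of building a static file name with a configurable separator. static_file_name is a hypothetical helper written for this post, not the routine in html_catalog.php, and the name-plus-id layout is an assumption.

```php
<?php
/**
 * Build a static file name from a page title and id, using a configurable
 * word separator. Hypothetical helper, not the html_catalog.php code.
 */
function static_file_name(string $title, int $id, string $sep = '-'): string
{
    // Lower-case, then collapse every run of non-alphanumerics into one separator.
    $slug = preg_replace('/[^a-z0-9]+/', $sep, strtolower($title));
    $slug = trim($slug, $sep);
    return $slug . $sep . $id . '.html';
}

echo static_file_name('High Speed Carvers', 20); // high-speed-carvers-20.html
```

Keeping the separator in one place means a switch between underscores and dashes is a single-character change rather than four separate edits.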

[quote name=‘sculptingstudio’]Interesting. I didn’t have an html catalog either. When I uploaded to Google Base, the links all pointed to the html catalog. I did see this in tool.php: “if ($catalog_exp == ‘Y’)” to generate the static pages. I will regenerate the Google feed using dynamic links by setting $catalog_exp=‘N’.



As a side note, it would be hard to write a mod_rewrite rule for the static URLs the way they are now because of the varying number of underscores. From what I read, a hyphen is the best way to go because Google sees hyphens as word breaks. I just wrote a routine in php to parse the current static URLs and redirect to the dynamic pages while I wait for my upload to be published.[/QUOTE]



Hmm, that is really interesting.

[quote name=‘taydu’]Hmm, that is really interesting.[/QUOTE]Here is one url CS-Cart produced for my Google/Froogle feed:



[noparse]High Speed Carvers, Burs, Sculpting Clays, Fa-Brick, Eggs



The first problem is that it didn’t remove the tag, and as you can see, the varying number of underscores makes it difficult to write a rule that parses out the number 20 for the product id. So what I did was delete the ‘catalog’ folder and send the 404 error to a parser that pulls out the product id. Another way to do this would be to use a ForceType in .htaccess to turn ‘catalog’ into the parser: when Apache sees the url, it could run catalog.php as the parser routine to extract the product id. If you want to see it work, here is the live link:



[url]High Speed Carvers, Burs, Sculpting Clays, Fa-Brick, Eggs
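A parser along the lines described above might look roughly like this. It is a sketch, not the routine actually in use: it assumes the product id is the last run of digits before “.html”, and the dynamic URL layout (target=products&product_id=…) is likewise an assumption. Both function names are made up for the example.

```php
<?php
/**
 * Extract a trailing numeric id from a static catalog path.
 * Assumes the id is the last run of digits before ".html";
 * returns null when the path does not fit that shape.
 */
function product_id_from_static_path(string $path): ?int
{
    if (preg_match('/(\d+)\.html(?:\?.*)?$/', $path, $m)) {
        return (int) $m[1];
    }
    return null;
}

/** Build the dynamic URL to redirect to (the URL layout is an assumption). */
function dynamic_url_for(string $path): ?string
{
    $id = product_id_from_static_path($path);
    return $id === null ? null : '/index.php?target=products&product_id=' . $id;
}
```

Anchoring on the end of the file name sidesteps the varying number of underscores entirely. In a 404 handler, the result could then be sent with header('Location: ' . $url, true, 301) whenever it is not null.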

In config.php you can change the location of your html catalog to be in the root dir. It works great. However, I do not know if they fixed the bug that happens when you delete the catalog: if you have your html catalog in the root dir and click on delete catalog, it deletes everything in the root dir when it should only delete CS-Cart-related html pages. Your points are valid, though it’s not going to make or break your store. What will make or break your store is a lack of advertising and a poorly designed site.

CS-Cart uses a function called fn_convert_links to change the links in a page to static urls, which are more SEO friendly. However, there is a problem if you use currency or language switching (which uses a drop-down selection box) inside your static pages. Within a catalog view, choosing a different currency or language tries to go to the same page with an additional GET variable (&currency=usd). The link conversion detects a link to a catalog page and converts the entire url to a static page, not realizing we need to hit a dynamic page to change the currency and/or language. This can be a big problem if a user is browsing your site through the static pages and chooses his native country (Canada, for example): the link goes to something like “static_page.htmlcan” (notice it is trying to append the currency type to a variable association in the url) when it should be “index.php?category=12&currency=can”. So I played around with the regex string in fn_convert_links (thanks, RegexBuddy) and simply added ONE character. The check will now fail if there is an additional parameter (i.e. “&”) in the url after the category selection. The new regex for line 333 of html_catalog.php should look like this:



```php
$page_source = preg_replace_callback("/" . $_host_dir . "\/" . $customer_index . "\?" . $target_name . "=categories(&amp;|&)(category_id=[^\"'>&]+)([\"'])/iUS", 'fn_category_callback', $page_source);
```



Notice the ‘&’ in ‘(category_id=[^"'>&]+)’.



And that’s all you need!
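If you want to see the effect of that one character in isolation, here is a small stand-alone test. The two patterns are simplified stand-ins with the host and index file hard-coded, not the full line from html_catalog.php, and the sample links and the matches helper are made up for the demonstration.

```php
<?php
// Simplified stand-ins for the pattern before and after the change.
// The "index.php?target=" prefix is hard-coded purely for illustration.
$before = '/index\.php\?target=categories(&amp;|&)(category_id=[^"\'>]+)(["\'])/iU';
$after  = '/index\.php\?target=categories(&amp;|&)(category_id=[^"\'>&]+)(["\'])/iU';

// Hypothetical sample links: one plain category link, one with a currency switch.
$plain    = '<a href="index.php?target=categories&category_id=12">';
$currency = '<a href="index.php?target=categories&category_id=12&currency=usd">';

/** True when the pattern would pick this link up for conversion. */
function matches(string $pattern, string $html): bool
{
    return preg_match($pattern, $html) === 1;
}

var_dump(matches($before, $currency)); // bool(true): link would be made static
var_dump(matches($after, $currency));  // bool(false): link is left dynamic
var_dump(matches($after, $plain));     // bool(true): plain links still convert
```

With ‘&’ excluded from the character class, the id can no longer run on into the next parameter, so the closing quote is never reached and the link stays dynamic.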



Any questions, send me a message : floydroute@gmail.com

Adding my name to signify an interest.

Can someone explain why anyone would ever need an HTML catalog? I don’t get it.



Arlen has shown more than a few good reasons not to use it. Still, why is this feature even available in CS-Cart?



:confused: :confused:

The HTML catalogue should, in effect, load faster, as all the pages are pre-generated, so there’s no need to run through a bunch of queries to get the products showing on the page.



That said, I’ve not used it and don’t intend to, so that’s about all I can say.