Duplicate Content w/ Cart

Running 3.0.X



So… I noticed today that Google Webmaster Tools has flagged me for duplicate content, duplicate titles, and duplicate keywords. Most of this is because of computer-generated links.



For instance, I created a location in blocks for showcasing 6 pages of testimonials.



The location itself asked me for a title, keywords, etc.



BUT now, when I go to:



index.php?dispatch=discussion.view&thread_id=368#ty%3Bpagination_contents_comments_0%3B%2Findex.php%3Fdispatch%3Ddiscussion.view%26thread_id%3D368%26selected_section%3Ddiscussion%26page%3D2



… the same meta tags are generated here:



index.php?dispatch=discussion.view&thread_id=368#ty%3Bpagination_contents_comments_0%3B%2Findex.php%3Fdispatch%3Ddiscussion.view%26thread_id%3D368%26selected_section%3Ddiscussion%26page%3D4



The first URL is page 2 and the second is page 4 of testimonials.
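
For what it's worth, here is a quick sketch (plain Python, standard library only) that pulls the two URLs above apart. It only shows where the page number lives: the part before the # is identical in both, and the page number sits inside the encoded fragment after the #. The helper name split_parts is just something made up for this illustration.

```python
from urllib.parse import urlsplit, unquote

PAGE_2 = ("index.php?dispatch=discussion.view&thread_id=368"
          "#ty%3Bpagination_contents_comments_0%3B%2Findex.php%3Fdispatch"
          "%3Ddiscussion.view%26thread_id%3D368%26selected_section"
          "%3Ddiscussion%26page%3D2")
PAGE_4 = PAGE_2[:-1] + "4"  # same URL, ending in page%3D4


def split_parts(url):
    """Return (path, query, decoded fragment) for a URL."""
    parts = urlsplit(url)
    return parts.path, parts.query, unquote(parts.fragment)


for url in (PAGE_2, PAGE_4):
    path, query, fragment = split_parts(url)
    print("path:    ", path)
    print("query:   ", query)
    print("fragment:", fragment)
    print()
```

Both URLs share the same path and query (index.php with dispatch=discussion.view&thread_id=368), so a shared title and meta description for the two is what you would expect from the server side.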



The only difference in the url is 3D2 vs. 3D4, yet each is spidered as a separate url and shows the same title. With GWT, this is a concern. I also noticed it has flagged these two:



/category/sub-category/

/category/sub-category/?sort_by=position&sort_order=asc



Basically, the second one is the same page but has had “?sort_by=position&sort_order=asc” added to it, and now Google sees it as a second page. It shows the same title, keywords and description, so I am getting flagged for duplicate content.



Please help!

There are several addons for canonical urls which are reported to fix this issue.

Thanks @requincreative. I installed a canonical addon but it does not do that feature.



When I visit: http://mysite.com/category/?sort_by=position&sort_order=asc



it is an active page, all its own.



I could do a redirect from /category/?sort_by=position&sort_order=asc to /category/ but it seems like this issue could go really deep, since there are a ton of possible link extensions from sorting by price and alphabetical order, multiplied by x-number of products, etc.
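
To illustrate what I would expect a canonical addon to do here, a minimal sketch in Python (not CS-Cart code, and not how any particular addon actually works). It assumes sort_by and sort_order are the only parameters that merely reorder a page; that list and the function names are my own for the example. The idea is that one rule covers every sort combination, instead of a redirect per url.

```python
import html
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: these parameters only change ordering, not page content.
PRESENTATION_PARAMS = {"sort_by", "sort_order"}


def canonical_url(url):
    """Drop presentation-only parameters and any fragment from a URL."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in PRESENTATION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag for the page's <head>."""
    return '<link rel="canonical" href="%s" />' % html.escape(
        canonical_url(url), quote=True)


print(canonical_link_tag(
    "http://mysite.com/category/?sort_by=position&sort_order=asc"))
```

Every sorted variant of /category/ would then point back at the plain /category/ url, while parameters that do select content (such as dispatch=news.list) are left alone.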

robots.txt


Disallow: /*?

[quote name='The Tool' timestamp='1387519544' post='173804']

robots.txt


Disallow: /*?

[/quote]



Thank you. Currently this is how I have my robots.txt file:



User-agent: *

User-agent: NinjaBot
User-agent: Googlebot
Allow: /
Disallow: /images/thumbnails/
Disallow: /skins/
Disallow: /payments/
Disallow: /store_closed.html
Disallow: /core/
Disallow: /lib/
Disallow: /install/
Disallow: /js/
Disallow: /schemas/
Disallow: /*?
Sitemap: http://mysite.com/sitemap.xml




I'm assuming Google will skip over these types of pages? I assume next time Google crawls these pages, it will not access them and therefore not show them as necessary HTML improvements in regard to duplicate content because of matching urls.

Just thought of an issue.



News content is here. http://mysite.com/index.php?dispatch=news.list



Won't /*? block Google from viewing that? Not to mention…



http://mysite.com/index.php?dispatch=discussion.view&thread_id=368
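
To check this, here is a small hand-rolled matcher in Python. It assumes Google-style wildcard rules (* matches any run of characters, a trailing $ anchors the end of the url, and rules match from the start of the path). The function names are made up for the example, and it only tests a single rule, not a whole robots.txt file.

```python
import re


def rule_to_regex(rule):
    """Translate one robots.txt path rule into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything literally, then let each * match any characters.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(path, rule="/*?"):
    """True if the path (including its query string) matches the rule."""
    return rule_to_regex(rule).match(path) is not None


for path in (
    "/category/sub-category/",
    "/category/sub-category/?sort_by=position&sort_order=asc",
    "/index.php?dispatch=news.list",
    "/index.php?dispatch=discussion.view&thread_id=368",
):
    print(is_blocked(path), path)
```

Under those assumptions, only the plain /category/sub-category/ path gets through; both index.php?dispatch=... urls match /*? and would be blocked along with the sorted category pages.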

Create an SEO name for your news page…

[quote name='CarStickersDecals' timestamp='1388076589' post='174084']

Create an SEO name for your news page…

[/quote]



I did. Actually, I just checked GWT today and every single one of my duplicate titles is the result of page extensions such as ?sort_by=position&sort_order=asc. The problem no longer happens on newsletter pages because I created a unique location. Once I can get the canonical urls to work, it should take care of the remaining errors. I have added Disallow: /*? to the robots.txt file but that does not seem to be working (as of yet).



Very odd that canonical urls are simply not doing the trick. The developer claims it is due to CloudFlare, which blocks the service somehow. CloudFlare.com claims that this is impossible. I'm sort of stuck between a rock and a hard place.