Mod_rewrite duplicate urls issue

I have mod_rewrite turned on. It is working nicely.



But I ran a sitemap generator ([url]http://www.auditmypc.com/xml-sitemap.asp[/url]) and it is showing dozens and dozens of URLs that look like this:



Shop the best pepper spray selection available – Redhotpepperspray.com



In addition to these:



Shop the best pepper spray selection available – Redhotpepperspray.com



Anyone who knows anything about SEO knows that search engines do not like to see dozens of different URLs from one website serving the same content, with the same dynamically created titles, etc.



REMEMBER: cs-cart generates default meta tags for any page that you do not specifically create custom meta tags for, so all these pages are showing up with the default tags.



I’ve seen a few posts dancing around this subject, but I’m not sure you guys realize how serious this actually is.



BTW, I tried adding a few entries to robots.txt:

User-agent: *
Disallow: /index.php?target=
Disallow: /catalog/
Disallow: /index.php?
Disallow: /*?



But I can’t see that any of these are actually working.
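One way to see what a rule set actually blocks, without waiting on an outside crawler, is to run it through a local parser. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt contents and the example.com URLs are hypothetical stand-ins. That parser only does plain prefix matching (Google additionally supports `*` and `$` wildcards), so wildcard rules are left out, and since it is not Google's parser the result is a sanity check rather than proof of how Googlebot behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group, no blank lines between directives.
# Python's parser does plain prefix matching only (no "*" wildcards inside
# paths), so wildcard rules are deliberately left out of this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php?target=
Disallow: /catalog/
"""


def blocked(robots_txt, urls, agent="Googlebot"):
    """Return the subset of urls that robots_txt forbids for the given agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Hypothetical URLs on a placeholder domain.
    urls = [
        "http://www.example.com/index.php?target=profiles&mode=add",
        "http://www.example.com/catalog/pepper-spray/",
        "http://www.example.com/pepper-spray.html",
    ]
    for url in blocked(ROBOTS_TXT, urls):
        print("blocked:", url)

    # Same rules with a blank line after "User-agent": Python's parser ends
    # the group at the blank line and ignores the Disallow lines.
    spaced = ROBOTS_TXT.replace("User-agent: *\n", "User-agent: *\n\n")
    print("blocked with blank line:", blocked(spaced, urls))
```

The last two lines show a pitfall worth ruling out: with a blank line between `User-agent: *` and the first `Disallow`, this parser drops the whole group and reports every URL as allowed. Keeping each group's lines contiguous avoids depending on how lenient a particular crawler is.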



Somebody please give me a hint on how to fix this problem.



Thanks.



BR

If I were you, I’d use this sitemap generator: [url]http://forum.cs-cart.com/showthread.php?t=9171[/url]

Thanks for that. I will need to update my versions on both sites and give the new sitemap a try.



Bryan R.

I’m having duplicate content from other causes, but for pages like login, cart, etc., you should really nofollow the links to them on your site to keep PR from leaking to those pages, and add a Disallow for them in your robots.txt. The Disallow will stop Google from crawling those pages, so they won’t show up as duplicate title tags.

Moka,



I agree with the nofollow. I am trying to locate all the instances in the .tpl files. I think I got all of them except the one in that custom footer.
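Hunting for the remaining links by hand is tedious, so here is a rough sketch that scans template files for `<a>` tags pointing at login/cart-style pages that lack `rel="nofollow"`. The fragments in `PRIVATE_FRAGMENTS` are hypothetical placeholders, not confirmed cs-cart dispatch values; check your own templates for the real ones. It is a regex scan, not a Smarty parser, so an `href` built with nested quotes or a tag containing a Smarty `>` comparison may be missed.

```python
import re
from pathlib import Path

# Hypothetical URL fragments for pages that should not receive PageRank.
# Adjust these to whatever your own templates actually link to.
PRIVATE_FRAGMENTS = ("target=auth", "target=cart", "target=profiles")

ANCHOR = re.compile(r"<a\b[^>]*>", re.IGNORECASE)
HREF = re.compile(r"""href\s*=\s*["']([^"']*)["']""", re.IGNORECASE)
NOFOLLOW = re.compile(r"""rel\s*=\s*["'][^"']*\bnofollow\b""", re.IGNORECASE)


def links_missing_nofollow(template_text, fragments=PRIVATE_FRAGMENTS):
    """Return (line_number, tag) for each <a> tag whose href contains one of
    the fragments but whose rel attribute does not include nofollow."""
    hits = []
    for match in ANCHOR.finditer(template_text):
        tag = match.group(0)
        href = HREF.search(tag)
        if not href or not any(f in href.group(1) for f in fragments):
            continue
        if NOFOLLOW.search(tag):
            continue
        line_no = template_text.count("\n", 0, match.start()) + 1
        hits.append((line_no, tag))
    return hits


def scan_templates(root):
    """Print every suspicious link found in *.tpl files under root."""
    for path in Path(root).rglob("*.tpl"):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, tag in links_missing_nofollow(text):
            print(f"{path}:{line_no}: {tag}")
```

Calling `scan_templates("skins")` (the directory name is only an example) prints each hit as `file:line: tag`, which makes leftover links such as the one in a custom footer easy to spot.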



The Disallow I have already done. I assume it is working correctly, yet the sitemap generator I tried (which has an option to respect the robots.txt file) is still finding all those pages. Maybe the sitemap generator simply isn’t working right.



I also tried a crawl test at seomoz.org, but it is showing those pages as well.



How can I truly check somewhere to find a test that honors the robots.txt file and make sure my robots file is working properly?



Thanks.



Bryan R.

[quote name=‘animatedmarketing’]



How can I truly check somewhere to find a test that honors the robots.txt file and make sure my robots file is working properly?

[/QUOTE]



The best one that I have found is Google Webmaster Tools. Once Google picks up your robots.txt (about every 24 hours), you can type a URL into their tools section and see whether it is being blocked.



Webmaster Tools will also tell you which duplicate titles, duplicate meta descriptions, and broken links it is picking up.
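For a rough local preview of the same duplicate-title report, pages can be grouped by their `<title>` text. This is a minimal sketch assuming you already have each crawled page's HTML in a dict keyed by URL (fetching is left out); the function names are hypothetical, and it only compares titles, not meta descriptions.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first-level <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_title(html):
    """Return the page title with runs of whitespace collapsed."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title.split())


def duplicate_titles(pages):
    """Map each non-empty title shared by 2+ URLs to its sorted URL list."""
    by_title = defaultdict(list)
    for url, html in pages.items():
        by_title[page_title(html)].append(url)
    return {t: sorted(urls) for t, urls in by_title.items() if t and len(urls) > 1}
```

Titles are whitespace-normalized before comparison, so pages that differ only in line breaks inside `<title>` still count as duplicates, which is how default cs-cart meta tags would tend to show up.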

K, thanks!



BR