Robots.txt and

Hi,

I can read robots.txt publicly (http://domain/robots.txt). Its permissions are 644. If anyone can read robots.txt, then what is the point of putting in Disallow: /admin.php? If you change admin.php to something else and update the entry to Disallow: /admin-newname.php, then the new name can still be read in robots.txt.

Reference bug report: [URL="http://forum.cs-cart.com/vbugs.php?do=view&vbug_id=1767"]http://forum.cs-cart.com/vbugs.php?do=view&vbug_id=1767[/URL]

Should I have a different permission for robots.txt?



Also, I assume that if I remove Disallow: /images/, then the images would be indexed by search engines.

Am I misunderstanding something?

Thanks,

Bob

If you have Disallow: /images/, then product images and anything else under that directory will NOT be crawled by compliant robots, so they should not be indexed.



Adding /admin.php is a stupid idea, as somebody could “find” the administration login page just by reading robots.txt.

[QUOTE]Hi,

I can read robots.txt publicly (http://domain/robots.txt). Its permissions are 644. If anyone can read robots.txt, then what is the point of putting in Disallow: /admin.php? If you change admin.php to something else and update the entry to Disallow: /admin-newname.php, then the new name can still be read in robots.txt.[/QUOTE]



Hello Bob,

You’re correct that a file with 644 permissions is readable by anyone (it has to be, so that search engine robots can read it and, hopefully, follow the instructions given). So, yes, if you change the name of your admin.php and update robots.txt to reflect the new name, it can be read and discovered by whoever looks through your robots.txt file. So, is it best not to list and disallow it in robots.txt? I don’t know…
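To make the point above concrete, here is a minimal Python sketch (not anything shipped with the cart) of what a curious visitor could do with a downloaded robots.txt. The sample file contents and both helper function names are assumptions made up for illustration. It lists every Disallow path, and then uses the standard library's urllib.robotparser to show that the same rules only stop crawlers that choose to honor them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, modeled on the entries discussed in this thread.
SAMPLE = """\
User-agent: *
Disallow: /images/
Disallow: /admin-newname.php
"""


def disallowed_paths(robots_txt):
    """Return every path named on a Disallow line, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow line means "nothing is disallowed"
                paths.append(value)
    return paths


def compliant_crawler_may_fetch(robots_txt, url):
    """Ask the standard-library parser whether a rule-abiding robot may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


# Anyone who downloads the file sees the "secret" name:
print(disallowed_paths(SAMPLE))  # ['/images/', '/admin-newname.php']

# A well-behaved crawler stays away, but nothing forces a person or a bad bot to.
print(compliant_crawler_may_fetch(SAMPLE, "http://domain/admin-newname.php"))  # False
```

In other words, the file is a polite request to robots, not an access control, which is why its permissions are not the thing to change.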


[QUOTE]Also, I assume that if I remove Disallow: /images/, then the images would be indexed by search engines.

Am I misunderstanding something?[/QUOTE]



No, you’re right on; all images within the /images/ folder would then be open to indexing.

[QUOTE]Adding /admin.php is a stupid idea, as somebody could “find” the administration login page just by reading robots.txt.[/QUOTE]



Jesse,



I am with you here, really defeats the entire purpose of changing the name of your admin.php in the first place…



However, if you don’t disallow your secretly named admin.php, then what are the odds that it will get picked up and displayed by Google, etc.? :confused:

This is why you secure admin.php with either the Apache configuration or .htaccess. It absolutely should be in robots.txt so that it doesn’t get spidered, with the access restriction as extra security. Securing this page should be a standard security practice.

[quote name=‘Struck’]Jesse,



I am with you here, really defeats the entire purpose of changing the name of your admin.php in the first place…



However, if you don’t disallow your secretly named admin.php, then what are the odds that it will get picked up and displayed by Google, etc.? :confused:[/quote]


[quote name=‘Ion_Cannon’]This is why you secure admin.php with either the Apache configuration or .htaccess. It absolutely should be in robots.txt so that it doesn’t get spidered, with the access restriction as extra security. Securing this page should be a standard security practice.[/quote]



Someone finding the admin login isn’t too much of an issue until the actual URL has been posted for the internet to see.

From 2.0.12 you are supposed to change the admin login URL to something different anyway. As for Google, there is only a very small chance that it will pick the page up.

In my opinion, using robots.txt to disallow it is a bit silly and makes the administrator look incompetent, since you can do the same via Google / Bing / Yahoo webmaster tools without publishing the URL for the internet to see.

.htaccess is a valid idea, but I’ve encountered a few stability issues at times when Apache locks you out :wink:

[quote name=‘JesseLeeStringer’]Someone finding the admin login isn’t too much of an issue until the actual URL has been posted for the internet to see.

From 2.0.12 you are supposed to change the admin login URL to something different anyway. As for Google, there is only a very small chance that it will pick the page up.

In my opinion, using robots.txt to disallow it is a bit silly and makes the administrator look incompetent, since you can do the same via Google / Bing / Yahoo webmaster tools without publishing the URL for the internet to see.

.htaccess is a valid idea, but I’ve encountered a few stability issues at times when Apache locks you out ;)[/QUOTE]



True, but I’d rather take my chances and not have the admin panel indexed. If someone is directly targeting your site by looking at your robots.txt file, then you are probably screwed anyway. Having your admin panel indexed opens doors for people who do not know about your site at all. So I’d say the opposite of “making the administrator look incompetent” is true. I still say it’s best practice not to have your admin panel indexed. I’ll take my chances on someone finding my site and then finding my robots.txt file to see what my admin panel page has been renamed to, but that’s just me.

The point is that admin.php should be renamed to somethingelse.php, so why add it to robots.txt?!

Hi,

Is there such a thing as Disallow: /*.php (i.e., use of a wildcard)?

Thanks,

Bob
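On the wildcard question: the original robots.txt convention only did prefix matching, but the major search engines (Google and Bing among them) also honor `*` for "any run of characters" and a trailing `$` for "end of URL", and RFC 9309 describes the same two characters. Crawlers that don't implement the extension may ignore or misread such a rule. The Python sketch below is an illustrative re-implementation of that matching, not any crawler's actual code; the function names and sample paths are hypothetical.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule using '*' and a trailing '$' into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with ".*" where '*' stood.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(rule, path):
    """True if the rule applies to the URL path; matching starts at the path's first character."""
    return rule_to_regex(rule).match(path) is not None


print(rule_matches("/*.php", "/admin.php"))        # True
print(rule_matches("/*.php", "/index.php?x=1"))    # True: no '$', so a prefix match is enough
print(rule_matches("/*.php$", "/index.php?x=1"))   # False: '$' requires the URL to end in .php
print(rule_matches("/*.php", "/images/logo.png"))  # False
```

Note what the first two results imply: under this matching, Disallow: /*.php would cover every PHP URL on the site, including the storefront's own pages, so it is far broader than hiding one admin script.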

[quote name=‘BarryH’]The point is that admin.php should be renamed to somethingelse.php, so why add it to robots.txt?![/QUOTE]



Whatever your admin page is renamed to, if it is not in robots.txt, Google or other bots can index it once they come across it, making your admin panel page public, or at least its metadata.

Alright, so I have made my critical split second decision regarding this! :cool:


  • I will disallow our renamed admin login page within robots.txt

    (PS: our secret name = ajava.php Have fun breaking our password! :wink: )



    My reasoning is that I would rather not take the chance of it being indexed and displayed to every simple-minded person on the planet, even knowing that any entry-level hacker would be smart enough to search through our robots.txt file and use it as a roadmap.


  • Use an extremely strong username/password combination, ours is not something that you will want to be typing in every couple of hours for sure!


  • “Consider” further protection of the admin login page via .htaccess



    I have been on Windows-based servers for ASP and ASP.NET carts for the last 9-10 years, so I am still polishing up on Unix.

[quote name=‘Struck’]Alright, so I have made my critical split second decision regarding this! :cool:


  • I will disallow our renamed admin login page within robots.txt[/quote]



    dude, what’s the point of making it “secret” then publishing the url?



    Like batman with his batcave providing directions via Google Maps…

[quote name=‘JesseLeeStringer’]dude, what’s the point of making it “secret” then publishing the url?



Like batman with his batcave providing directions via Google Maps…[/QUOTE]



My reasoning, as I attempted to explain, is that it will be noticed by far fewer people if it is disallowed in robots.txt than if it ends up indexed.



Let’s face it, re-naming your admin login has a pretty minimal effect on your store security regardless, right?



And PS: I am not a dude, I am a very girly girl! (You should see what I am wearing right now!) :smiley:

Dudette, (How awesome :P)



Even fewer people will find your admin login URL if it’s never published. It is extremely unlikely that someone will try every possible combination to find your admin login.



ie www.southeastauto.com.au/.php

could be or



So having it published in robots.txt makes it VERY easy to find for someone trying to be malicious, as opposed to an unfortunate Google searcher.



PS: southeastauto.com.au still has admin.php. I haven’t changed it, as I don’t maintain it anymore.

[quote name=‘JesseLeeStringer’]Dudette, (How awesome :P)



Even fewer people will find your admin login URL if it’s never published. It is extremely unlikely that someone will try every possible combination to find your admin login.



ie www.southeastauto.com.au/.php

could be or



So having it published in robots.txt makes it VERY easy to find for someone trying to be malicious, as opposed to an unfortunate Google searcher.



PS: southeastauto.com.au still has admin.php. I haven’t changed it, as I don’t maintain it anymore.[/QUOTE]



Your logic is flawed… if your admin page is indexed, it is far easier to find, bar none… there is a reason why every shopping cart company suggests you put the admin URL in the robots.txt file. It’s not because they are incompetent, brother…

[quote name=‘JesseLeeStringer’]

ie www.southeastauto.com.au/.php

could be or



So having it published in robots.txt makes it VERY easy to find for someone trying to be malicious, as opposed to an unfortunate Google searcher.



PS: southeastauto.com.au still has admin.php. I haven’t changed it, as I don’t maintain it anymore.[/QUOTE]



If your admin page is not disallowed in robots.txt, you can find it fairly easily by doing a “site:www.southeastauto.com.au” search in Google; if it is indexed, it will be there. Someone wanting to find it will, so really it doesn’t matter either way. It’s more important to secure the admin page with other methods, like .htaccess or the Apache configuration file, restricting access to the IPs/subnets that will be using it if those are known. That’s really the best way. The robots.txt is kind of moot.
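The IP/subnet restriction suggested above is enforced by the web server itself, not by application code. Purely to illustrate the decision the server would be making, here is a small Python sketch using the standard ipaddress module. The networks listed are reserved documentation ranges standing in for a real office or home address, and the function name is hypothetical.

```python
import ipaddress

# Placeholder allow-list: documentation-only ranges, standing in for the
# known IPs/subnets that should reach the admin page.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # e.g. an office subnet
    ipaddress.ip_network("198.51.100.7/32"),  # e.g. one fixed home address
]


def may_access_admin(client_ip):
    """Return True only if the client address falls inside an allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


print(may_access_admin("203.0.113.42"))  # True: inside the /24
print(may_access_admin("198.51.100.8"))  # False: one address past the allowed /32
```

With a rule like this in place, it matters much less whether the admin URL is known, because a request from any other address is refused before the login form is ever shown.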

[QUOTE]Your logic is flawed… if your admin page is indexed, it is far easier to find, bar none… there is a reason why every shopping cart company suggests you put the admin URL in the robots.txt file. It’s not because they are incompetent, brother…[/QUOTE]



Ion, you don’t have to be quite so “abrasive” do you? :wink:



This is similar to many other topics in that there are a multitude of different viewpoints. I am just gathering all of this information and using bits and pieces of it while attempting to determine what I believe is the best overall procedure to follow.



Be Cool! :smiley:

[quote name=‘Struck’]Ion, you don’t have to be quite so “abrasive” do you? :wink:

[/QUOTE]



What do you mean, abrasive? LOL

[quote name=‘Ion_Cannon’]Your logic is flawed… if your admin page is indexed, it is far easier to find, bar none… there is a reason why every shopping cart company suggests you put the admin URL in the robots.txt file. It’s not because they are incompetent, brother…[/quote]



Tell me how their admin page WOULD be found if it’s not published, then get back to me.

If your admin page has no links to it, then it is almost 100% certain that it will not be indexed. However, let’s put that debate aside and continue with the topic. Do not put your admin.php (whatever the name is) in your robots.txt file. There is a better way to handle that page, I believe:



Add a meta tag to that page only. The meta tag is:

<meta name="robots" content="noindex">


This will prevent robots from indexing that page only, without the need to list it in robots.txt. Information on this can be read at [URL="http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=93710"]http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=93710[/URL]
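For reference, the usual form of that tag is <meta name="robots" content="noindex">, placed in the page's head. One caveat from Google's documentation is worth keeping in mind: a crawler has to be able to fetch the page to see the tag, so the page must not also be disallowed in robots.txt. As a rough way to confirm the tag actually made it into the rendered admin page, here is a small Python sketch using the standard html.parser module. The class name, function name and sample HTML are assumptions for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html):
    """True if the page carries a robots meta tag that includes 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical login page source with the tag in place.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body>Login</body></html>'
print(has_noindex(page))  # True
```

Feeding it the saved HTML source of the login page gives a quick yes/no on whether the tag is present.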