While this should work, it has a couple of potential drawbacks:

- Every crawler has to make two HTTP requests: one to discover the redirect, and another one to actually fetch the file.

- Some crawlers might not handle the 301 response for robots.txt correctly; there's nothing in the original robots.txt specification that says anything about redirects, so presumably they should be treated the same way as for ordinary web pages (i.e. followed), but there's no guarantee that all the countless robots that might want to crawl your site will get that right. (The 1997 Internet Draft does explicitly say that "[o]n server response indicating Redirection (HTTP Status Code 3XX) a robot should follow the redirects until a resource can be found", but since that was never turned into an official standard, there's no real requirement for any crawlers to actually follow it.)
Generally, it would be better to simply configure your web server to return different content for robots.txt depending on the domain it's requested for. For example, using Apache mod_rewrite, you could internally rewrite robots.txt to a domain-specific file like this:
RewriteEngine On
RewriteBase /
# Match hosts like www.domain.com.ar, domain.co.uk or domain.ar;
# the two-letter ccTLD is captured as %3 ("domain" is a placeholder).
RewriteCond %{HTTP_HOST} ^(www\.)?domain(\.com?)?\.([a-z][a-z])$ [NC]
# Only rewrite if a matching country-specific file actually exists.
RewriteCond %{DOCUMENT_ROOT}/robots_%3.txt -f
RewriteRule ^robots\.txt$ robots_%3.txt [NS]
This code, placed in an .htaccess file in the shared document root of the sites, should rewrite any requests for e.g. www.domain.com.ar/robots.txt to the file robots_ar.txt, provided that it exists (that's what the second RewriteCond checks). If the file does not exist, or if the host name doesn't match the regexp, the standard robots.txt file is served by default.
(The host name regexp should be flexible enough to also match URLs without the www. prefix, and to also accept the 2LD co. instead of com. (as in domain.co.uk) or even just a plain ccTLD after domain; if necessary, you can tweak it to accept even more cases. Note that I have not tested this code, so it could have bugs / typos.)
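(For illustration, assuming the document root contains robots_ar.txt and robots_uk.txt but no robots_de.txt: a request for www.domain.com.ar/robots.txt would be rewritten to robots_ar.txt, and domain.co.uk/robots.txt to robots_uk.txt, while both domain.de/robots.txt and domain.com/robots.txt would fall through to the standard robots.txt, the former because robots_de.txt doesn't exist and the latter because the host name doesn't match the regexp.)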
Another possibility would be to internally rewrite requests for robots.txt to (e.g.) a PHP script, which can then generate the content of the file dynamically based on the host name and anything else you want. With mod_rewrite, this could be accomplished simply with:
RewriteEngine On
RewriteBase /
RewriteRule ^robots\.txt$ robots.php [NS]
(Writing the actual robots.php script is left as an exercise.)
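That said, here's a minimal sketch of what such a robots.php script might look like, just to illustrate the idea; the domain name, regexp and rules below are placeholders, not something I'd claim to have tested:

<?php
// robots.php: generate robots.txt content based on the requested host name.
// Illustrative sketch only; domain names and rules are placeholders.
header('Content-Type: text/plain');

$host = isset($_SERVER['HTTP_HOST']) ? strtolower($_SERVER['HTTP_HOST']) : '';

// Extract the two-letter ccTLD, e.g. "ar" from "www.domain.com.ar".
if (preg_match('/^(www\.)?domain(\.com?)?\.([a-z][a-z])$/', $host, $m)) {
    $cc = $m[3];
} else {
    $cc = '';  // unknown host: fall back to the default rules
}

echo "User-agent: *\n";
if ($cc === 'ar') {
    echo "Disallow: /solo-argentina/\n";  // example path, purely illustrative
} else {
    echo "Disallow:\n";                   // default: allow everything
}

Note that the script sends an explicit text/plain content type; otherwise PHP would serve the output as text/html by default.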