
Robots.txt optimisation

 

What is a robots.txt?

We aren't talking about an automaton here: a "robots.txt" file tells search engines which parts of your site they may access and therefore crawl. Getting your robots.txt right is therefore a crucial part of your on-site optimisation (optimization).


This file, which must be named "robots.txt", is placed in the root directory of your site.

Example of an optimised robots.txt


User-agent: *
Disallow: /admin/
Disallow: /cgi/
Disallow: /cgi-bin
Disallow: /search/
Disallow: /cache/
Disallow: /lib/
Disallow: /scripts/
Disallow: /templates/
Disallow: /userfiles/
Disallow: /terms-conditions.htm

Sitemap: http://www.quantumweb.co.uk



The wildcard * on the User-agent line applies the rules to every search engine bot. Compliant bots will not access or crawl any path that begins with a value listed on a Disallow line.
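As a rough way to check how a compliant bot would read rules like these, the sketch below feeds a shortened copy of the example to Python's standard urllib.robotparser module. The helper name is_allowed and the user agent "ExampleBot" are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the example rules above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cgi-bin
Disallow: /search/
Disallow: /terms-conditions.htm
"""

SITE = "http://www.quantumweb.co.uk"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules let user_agent crawl the given path (hypothetical helper)."""
    return parser.can_fetch(user_agent, SITE + path)


parser = build_parser(ROBOTS_TXT)
for path in ("/", "/about.htm", "/admin/login", "/search/?q=seo", "/terms-conditions.htm"):
    print(path, "allowed" if is_allowed(parser, path) else "blocked")
```

Matching is by path prefix, so "Disallow: /admin/" blocks everything beneath /admin/ while leaving the rest of the site crawlable.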



Robots.txt Optimisation - continued


You should tell the search engines where your sitemap is by adding a Sitemap line, as in the example above. The value should be the full URL of the sitemap file.
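To confirm that a Sitemap line is being picked up, a small sketch along these lines can help. It assumes Python 3.8 or later (for the site_maps() method), and the sitemap URL shown is a placeholder, not the real location of any sitemap.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules; the sitemap URL is an assumption for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""


def list_sitemaps(robots_txt: str) -> list:
    """Return the sitemap URLs declared in robots.txt text (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(list_sitemaps(SAMPLE_ROBOTS_TXT))
```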

You may not want certain pages of your site crawled because they would not be useful to users who found them in a search engine’s results. Similarly, if your site contains a lot of large images or other media whose file names are not keyword related, it is wise to keep them from being crawled.

Note that a robots.txt file only applies to the host it is served from. If your site uses subdomains and you do not want certain pages on a particular subdomain crawled, you’ll have to create a separate robots.txt file for that subdomain.
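The per-subdomain rule can be illustrated by working out which robots.txt file governs a given page. The sketch below is only an illustration: the function name robots_url_for and the example.com hosts are assumptions, not real addresses from this site.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url (hypothetical helper).

    The file always sits at the root of the same scheme and host as the page,
    so each subdomain ends up with its own robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("http://www.example.com/products/widgets.htm"))
print(robots_url_for("http://shop.example.com/basket/view"))
```

The two pages resolve to different robots.txt files because their hosts differ, even though they belong to the same site.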

There are a handful of other ways to prevent content appearing in search results, such as adding
“NOINDEX” to your robots meta tag, using .htaccess to password protect directories, and using
Google Webmaster Tools (now Google Search Console) to remove content that has already been crawled.
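To check whether a page actually carries a NOINDEX directive in its robots meta tag, something like the following sketch could be used. It relies only on Python's standard html.parser module; the class and function names are made up for illustration, and it does not look at HTTP headers or any other mechanism.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma separated, e.g. "noindex, follow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag includes NOINDEX."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="NOINDEX, FOLLOW"></head><body></body></html>'
print(has_noindex(page))
```

Bear in mind that a bot can only see this tag if it is allowed to crawl the page, so a page blocked in robots.txt will not have its NOINDEX read.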


