How to control search engines and web crawlers using the robots.txt file

You can specify which sections of your site search engines and web crawlers may crawl, and which sections they should ignore. To do this, add directives to a robots.txt file and place that file in your document root directory.
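Crawlers request the file from the top level of the host, which is why it belongs in the document root. As a minimal sketch in Python, the function below derives the robots.txt address a crawler would request for any page on a site. The function name robots_url and the www.example.com address are illustrative assumptions, not part of any standard tool.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page address used only for illustration.
print(robots_url("https://www.example.com/documents/index.html"))
```

A robots.txt file placed in a subdirectory (for example, /documents/robots.txt) would not be found this way.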

USING ROBOTS.TXT DIRECTIVES

The directives used in a robots.txt file are straightforward. The most commonly used directives are User-agent, Disallow, and Crawl-delay. Here are some examples:

Example 1: Allow all crawlers to access all files

User-agent: *
Disallow:

In this example, any crawler (specified by the User-agent directive and the asterisk wildcard) can access any file on the site, because an empty Disallow directive blocks nothing.

Example 2: Instruct all crawlers to ignore all files

User-agent: *
Disallow: /

In this example, all crawlers are instructed to ignore all files on the site.
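The difference between Examples 1 and 2 is a single slash, so it can be worth checking a rule set before publishing it. The following is a minimal sketch that uses Python's standard urllib.robotparser module. The helper name is_allowed and the test path are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules_text, path, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch path."""
    parser = RobotFileParser()
    # parse() accepts the file content as a list of lines.
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, path)


ALLOW_ALL = "User-agent: *\nDisallow:\n"    # Example 1
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # Example 2

print(is_allowed(ALLOW_ALL, "/documents/index.html"))  # True
print(is_allowed(BLOCK_ALL, "/documents/index.html"))  # False
```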

Example 3: Instruct all crawlers to ignore a particular directory

User-agent: *
Disallow: /scripts/

In this example, all crawlers are instructed to ignore the scripts directory.

Example 4: Instruct all crawlers to ignore a particular file

User-agent: *
Disallow: /documents/index.html

In this example, all crawlers are instructed to ignore the documents/index.html file.
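Several Disallow directives can appear under one User-agent line. As a hedged sketch, the snippet below combines the rules from Examples 3 and 4 and lists which paths a compliant crawler would skip, again using Python's standard urllib.robotparser module. The helper name blocked_paths and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Rules from Examples 3 and 4, combined under a single User-agent line.
RULES = """\
User-agent: *
Disallow: /scripts/
Disallow: /documents/index.html
"""

# Hypothetical paths used only for illustration.
SAMPLE_PATHS = [
    "/index.html",
    "/scripts/form.cgi",
    "/documents/index.html",
    "/documents/report.html",
]


def blocked_paths(rules_text, paths, user_agent="*"):
    """Return the subset of paths that the rules tell user_agent to skip."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


print(blocked_paths(RULES, SAMPLE_PATHS))
# ['/scripts/form.cgi', '/documents/index.html']
```

Only the scripts directory and the single named file are blocked. Other files in the documents directory remain accessible.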

Example 5: Control the crawl interval

User-agent: *
Crawl-delay: 30

In this example, all crawlers are instructed to wait at least 30 seconds between successive requests to the web server. Crawl-delay is a nonstandard directive, and some crawlers ignore it.
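To illustrate how a well-behaved crawler might read the delay from Example 5, here is a minimal sketch using the crawl_delay method of Python's standard urllib.robotparser module (available in Python 3.6 and later). The helper name polite_delay and its default value of 0 are assumptions, not part of any crawler's documented behavior.

```python
from urllib.robotparser import RobotFileParser

RULES = "User-agent: *\nCrawl-delay: 30\n"  # Example 5


def polite_delay(rules_text, user_agent="*", default=0):
    """Return the Crawl-delay in seconds for user_agent, or default if unset."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    # crawl_delay() returns None when no Crawl-delay applies to this agent.
    delay = parser.crawl_delay(user_agent)
    return default if delay is None else delay


print(polite_delay(RULES))  # 30
```

A crawler written along these lines could pass the returned value to time.sleep() between requests.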

MORE INFORMATION

For more information about the robots.txt file, please visit http://www.robotstxt.org