The robots.txt file
A spider is an automated program that is used by search engines to find and index the contents of a website.
Spiders look in a site's root domain for a special file named "robots.txt". This file tells the robot (spider) which files and directories it may or may not crawl or index.
The robots.txt file consists of one or more records. Each record contains a User-agent line and one or more Disallow lines. The format is:
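    User-agent: [robot name, or * for all robots]
    Disallow: [file or directory that may not be indexed]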
The User-agent line specifies the robot name. For example:
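    User-agent: googlebot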
You may also use the wildcard character "*" to specify all robots. For example:
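    User-agent: *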
You can find user agent names in your site's logs by checking for requests to the robots.txt file.
The second part of a record consists of Disallow: lines. These lines specify the files and/or directories that may not be indexed. For example, the following line instructs spiders not to index the email.htm file:
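    Disallow: email.htm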
You may also specify directories:
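    Disallow: /cgi-bin/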
Leaving the Disallow line blank indicates that the robot may index all files without restriction. At least one Disallow line must be present for each User-agent line for the record to be valid. An empty robots.txt file is treated as if it did not exist.
The following example allows all robots to index all files:
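    User-agent: *
    Disallow: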
The following example denies access to all robots:
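    User-agent: *
    Disallow: /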
The following example denies all robots access to the cgi-bin and images directories:
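    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /images/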
The following example denies googlebot access to all files:
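    User-agent: googlebot
    Disallow: /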
For more complex examples, view the robots.txt files from other websites.
The robots.txt file should be created in a plain text editor. Saving it as .htm, .html, .rtf, or any format other than plain .txt is not acceptable.