Standard Rules
A robots.txt file is placed in the webroot. In most cases this will be /wordpress/current/.
The content of the file determines the rules for user agents (web crawlers).
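To confirm the file is actually being served, you can simply request it over HTTP. The short Python sketch below does that; example.com is a placeholder for your own domain, and whatever you place in the webroot must end up reachable at /robots.txt on the domain root for crawlers to find it.

from urllib.request import urlopen

# Placeholder domain: replace example.com with your own site.
# The file in the webroot must be reachable at /robots.txt.
with urlopen("https://example.com/robots.txt") as response:
    print(response.read().decode("utf-8"))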
Different crawlers support different rules. Standard regex does not apply, but a few special characters are widely supported:
* = Wildcard
$ = End of URL
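To see how these two characters behave, the Python sketch below translates a Disallow pattern into a regular expression and tests a few made-up URLs against it. This is a simplified illustration of the matching logic used by crawlers that support * and $, not an official parser; the pattern and URLs are invented for the example.

import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # '*' matches any sequence of characters, '$' anchors the end of the URL;
    # everything else is treated literally. Simplified illustration only.
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.compile(regex)

rule = pattern_to_regex("/uploads/*.pdf$")
print(bool(rule.match("/uploads/report.pdf")))       # True  -> blocked
print(bool(rule.match("/uploads/report.pdf?v=2")))   # False -> $ anchors the end of the URL
print(bool(rule.match("/uploads/docs/manual.pdf")))  # True  -> * also spans subfolders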
The standard notation of a robots.txt file:

User-agent: [user-agent name]
Crawl-Delay: [delay in seconds between successive crawl requests]
Disallow: [URL string to exempt from crawling]
Examples of robots.txt rules:
Block Seekport from all pages:

User-agent: Seekport
Disallow: /
Set a crawl delay of 120 seconds for Yahoo (Slurp) and block the /contact page:

User-agent: Slurp
Crawl-Delay: 120
Disallow: /contact$
Block msnbot from crawling any PDF file in /uploads/:

User-agent: msnbot
Disallow: /uploads/*.pdf$
Multiple URLs for a single user agent:

User-agent: Slurp
Disallow: /example/$
Disallow: /contact/$
Disallow: /hidden/$
Multiple user agents, separated by an empty line:

User-agent: Ahrefsbot
Crawl-Delay: 120
Disallow: /contact$

User-agent: Googlebot
Crawl-Delay: 120
Disallow: /contact$

User-agent: Slurp
Crawl-Delay: 120
Disallow: /contact$
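To double-check how a finished file is read per user agent, Python's built-in urllib.robotparser can parse it and answer allow/deny and crawl-delay questions. Note that this parser follows the original robots.txt specification and treats * and $ as literal characters, so the sketch below uses simplified rules without the $ anchor; the domain example.com and the rule set are illustrative only.

from urllib import robotparser

# Hypothetical rules, simplified from the examples above (no $ anchor,
# because urllib.robotparser does not interpret the wildcard extensions).
rules = """\
User-agent: Googlebot
Crawl-Delay: 120
Disallow: /contact

User-agent: Seekport
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/contact"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/blog"))     # True
print(rp.can_fetch("Seekport", "https://example.com/"))          # False
print(rp.crawl_delay("Googlebot"))                                # 120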