Standard Rules
A robots.txt file is placed in the webroot. In most cases this will be /wordpress/current/.
The content of the file determines the rules for user agents (web crawlers).
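To confirm the file is actually being served, you can simply request it over HTTP. The short Python sketch below does that; example.com is a placeholder for your own domain, and whatever you place in the webroot must end up reachable at /robots.txt on the domain root for crawlers to find it.

from urllib.request import urlopen

# Placeholder domain: replace example.com with your own site.
# The file in the webroot must be reachable at /robots.txt.
with urlopen("https://example.com/robots.txt") as response:
    print(response.read().decode("utf-8"))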
Different crawlers support different rules. Standard regex does not apply, but a few special characters are widely supported:
* = Wildcard
$ = End of URL
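To see how these two characters behave, the Python sketch below translates a Disallow pattern into a regular expression and tests a few made-up URLs against it. This is a simplified illustration of the matching logic used by crawlers that support * and $, not an official parser; the pattern and URLs are invented for the example.

import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # '*' matches any sequence of characters, '$' anchors the end of the URL;
    # everything else is treated literally. Simplified illustration only.
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.compile(regex)

rule = pattern_to_regex("/uploads/*.pdf$")
print(bool(rule.match("/uploads/report.pdf")))       # True  -> blocked
print(bool(rule.match("/uploads/report.pdf?v=2")))   # False -> $ anchors the end of the URL
print(bool(rule.match("/uploads/docs/manual.pdf")))  # True  -> * also spans subfolders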
The standard notation of a robots.txt file:

User-agent: [user-agent name]
Crawl-Delay: [delay in seconds between successive crawl requests]
Disallow: [URL string to exempt from crawling]
Examples of robots.txt rules:
Block Seekport from all pages:

User-agent: Seekport
Disallow: /
Set a crawl delay of 120 seconds for Yahoo (Slurp) and block the /contact page:

User-agent: Slurp
Crawl-Delay: 120
Disallow: /contact$
Block msnbot from crawling any PDF file in /uploads/:

User-agent: msnbot
Disallow: /uploads/*.pdf$
Multiple URLs for a single user agent:

User-agent: Slurp
Disallow: /example/$
Disallow: /contact/$
Disallow: /hidden/$
Multiple user agents, separated by an empty line:

User-agent: Ahrefsbot
Crawl-Delay: 120
Disallow: /contact$

User-agent: Googlebot
Crawl-Delay: 120
Disallow: /contact$

User-agent: Slurp
Crawl-Delay: 120
Disallow: /contact$
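To double-check how a finished file is read per user agent, Python's built-in urllib.robotparser can parse it and answer allow/deny and crawl-delay questions. Note that this parser follows the original robots.txt specification and treats * and $ as literal characters, so the sketch below uses simplified rules without the $ anchor; the domain example.com and the rule set are illustrative only.

from urllib import robotparser

# Hypothetical rules, simplified from the examples above (no $ anchor,
# because urllib.robotparser does not interpret the wildcard extensions).
rules = """\
User-agent: Googlebot
Crawl-Delay: 120
Disallow: /contact

User-agent: Seekport
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/contact"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/blog"))     # True
print(rp.can_fetch("Seekport", "https://example.com/"))          # False
print(rp.crawl_delay("Googlebot"))                                # 120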