The robots.txt file is uploaded to your website's root folder. It guides search engine spiders by allowing or disallowing crawling of specific files and folders. It's a URL-blocking method and should be handled with care.
Example:
User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html
Sitemap: http://www.yoursite.com/sitemap.xml
The user-agent can be the wildcard * so that all spiders/bots are affected:
User-agent: *
In the example above we disallow crawling of 'folder1' for Googlebot, except for one file in that particular folder: 'myfile.html'. The more specific Allow rule overrides the broader Disallow for that file.
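As a minimal sketch, the same rules can be applied to every crawler by using the wildcard user-agent (the URLs and folder names here are placeholders):
User-agent: *
Disallow: /folder1/
Allow: /folder1/myfile.html
Sitemap: http://www.yoursite.com/sitemap.xml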
A good robots.txt for a site running on WordPress would be this:
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /category/*/*
Disallow: */trackback
With these rules the WordPress core folders are protected from crawling, and category and trackback pages won't be listed. A lot more can be added to a robots.txt file, but this covers the most important points. Note: do not block your /feed/ URL, as the feed can be used as a sitemap.
You can also try a robots.txt file generator.
ROBOTS.TXT CHECKLIST
- Add your sitemap URL to the robots.txt file: Sitemap: http://www.yoursite.com/sitemap.xml
- If you're using WordPress, disallow the core folders
- Is it named properly (robots.txt, all lowercase; the name is case-sensitive!) and placed in your root folder?
- Disallow 301/302 redirections and cloaked URLs (e.g. yoursite.com/outgoing/affiliate-offer): Disallow: /outgoing/*
- If you are using subdomains, each subdomain needs its own robots.txt file
- One rule per line (a combined example follows this checklist)
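Putting the checklist together, a combined robots.txt for a WordPress site could look something like this (the sitemap URL and the /outgoing/ folder are placeholders for your own setup):
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /category/*/*
Disallow: */trackback
Disallow: /outgoing/*
# /feed/ is left crawlable on purpose so it can act as a sitemap
Sitemap: http://www.yoursite.com/sitemap.xml
If you run subdomains, each one (e.g. blog.yoursite.com) gets its own copy of such a file in its own root folder.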
THE FILE IS READY - WHAT'S NEXT?
- Once you've uploaded the file to your website's root folder, you can test it with Google's robots.txt testing tool
ROBOTS.TXT VS. META ROBOTS
It's recommended to exclude specific pages via <meta name="robots" content="noindex"> instead of blocking them with robots.txt. If the URL in question gets backlinks from other pages, the link juice is lost when robots.txt blocks the spiders, because they never crawl the page at all. With the meta tag, spiders can still crawl the page and follow its links, so your pages are still rewarded.
If you want to exclude complete folders, e.g. /tmp/, /private/ or similar, it makes sense to add them to robots.txt.
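As a small example (the folder names are just placeholders), blocking such folders only takes one Disallow line per folder:
User-agent: *
Disallow: /tmp/
Disallow: /private/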