WordPress robots.txt Example

February 5, 2015

By Joost de Valk

The robots.txt file is a very powerful file if you’re working on a site’s SEO, but one that also has to be used with care. It allows you to deny search engines access to certain files and folders, but that’s very often not what you want to do. Over the years, Google in particular has changed a lot in how it crawls the web, so old best practices are no longer valid. This post explains what the new best practices are and why.

Google fully renders your site

No longer is Google the dumb little kid that just fetches your site’s HTML and ignores your styling and JavaScript. It fetches everything and renders your pages completely. This means that when you deny Google access to your CSS or JavaScript files, it doesn’t like that at all. My recent post about Google Panda 4 shows an example of this.

To see whether your site can be fully rendered, you can do a Fetch & Render in the Crawl section of Google Webmaster Tools (as it happens, we’ll do a post on that tomorrow in our series on Google Webmaster Tools, so check back then).

This means that the old, often-heard best practice of having a robots.txt that blocks access to your wp-includes directory, and our own old best practice of blocking your plugins directory, are no longer valid. This is why, for WordPress 4.0, I opened the issue and wrote the patch to remove wp-includes/* from the default WordPress robots.txt.

Robots.txt denies links their value

Something else is very important to keep in mind. If you block a URL with your site’s robots.txt, search engines will not crawl that page. This also means that they cannot distribute the link value pointing at that URL. So if you have a section of your site that you’d rather not have showing in the search results, but that does get a lot of links, don’t use the robots.txt file. Instead, use a robots meta tag with the value noindex, follow. This allows search engines to properly distribute the link value for those pages across your site.
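
To make the robots meta tag concrete, here is a minimal Python sketch that holds the tag in a string and reads its directives back with the standard library's HTML parser. The page fragment and the names RobotsMetaParser and robots_directives are made up for illustration; on a real WordPress site the tag would normally be output by your theme or an SEO plugin.

```python
from html.parser import HTMLParser

# Hypothetical <head> fragment for a page you want kept out of the
# search results while its links are still followed.
PAGE_HEAD = '<head><meta name="robots" content="noindex, follow"></head>'


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated, e.g. "noindex, follow".
            self.directives += [
                part.strip().lower() for part in content.split(",") if part.strip()
            ]


def robots_directives(html):
    """Return the list of robots directives found in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


print(robots_directives(PAGE_HEAD))  # ['noindex', 'follow']
```

Note that search engines can only see this tag if they are allowed to crawl the page, so the URL must not also be blocked in robots.txt.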

Our WordPress robots.txt example

So, what should be in your WordPress robots.txt? Ours is very clean now. The only thing we still block is our /out/ directory for our affiliate links, as discussed in this post. We no longer block our /wp-content/plugins/ directory, as plugins might output JavaScript or CSS that Google needs to render the page, nor do we block our /wp-includes/ directory, as the default JavaScript files that come with WordPress, which many a theme uses, come from that directory.
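
As a rough sketch of what such a file could look like, the Python snippet below holds a two-line robots.txt in a string and checks a few paths against it with the standard library's urllib.robotparser. The rules are reconstructed from the description above and are not a copy of our actual file; the example paths and the is_allowed helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the prose above (an assumption): only /out/ is blocked.
CLEAN_RULES = """\
User-agent: *
Disallow: /out/
"""


def is_allowed(rules, path, agent="Googlebot"):
    """Return True if `agent` may crawl `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


# Placeholder paths: an affiliate redirect, a plugin script, a core script.
for path in (
    "/out/some-partner/",
    "/wp-content/plugins/some-plugin/script.js",
    "/wp-includes/js/jquery/jquery.js",
):
    verdict = "allowed" if is_allowed(CLEAN_RULES, path) else "blocked"
    print(path, "->", verdict)
```

Only the /out/ path comes back as blocked; the plugin and wp-includes scripts stay crawlable, which is what Google needs to render the page.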

What you should do with your robots.txt

You should log into Google Webmaster Tools and under Crawl → Fetch as Google, use the Fetch and Render option:

Fetch and Render in Google Webmaster Tools to test your WordPress robots.txt.

If the rendering doesn’t look like what you see when you browse your site, or it throws errors or notices, fix that by removing the lines in your robots.txt file that block access to those URLs.
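
If you have a list of resources that were reported as blocked, a small script can point at the lines to remove. The sketch below is deliberately simplified: it uses plain prefix matching and ignores User-agent groups, Allow rules and wildcards, so treat its output as a hint rather than a verdict. The robots.txt content, the URLs and the blocking_lines helper are all made up for illustration.

```python
from urllib.parse import urlparse

# Illustrative robots.txt that still carries old-style rules (an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /out/
"""

# Placeholder URLs standing in for resources reported as blocked.
BLOCKED_RESOURCES = [
    "https://example.com/wp-includes/js/jquery/jquery.js",
    "https://example.com/wp-content/plugins/some-plugin/style.css",
]


def blocking_lines(robots_txt, urls):
    """Map each Disallow line to the URLs whose path starts with its prefix."""
    hits = {}
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, _, value = line.partition(":")
        prefix = value.strip()
        if field.strip().lower() != "disallow" or not prefix:
            continue  # skip User-agent lines, blanks and empty Disallow rules
        matched = [url for url in urls if urlparse(url).path.startswith(prefix)]
        if matched:
            hits[line] = matched
    return hits


for line, urls in blocking_lines(ROBOTS_TXT, BLOCKED_RESOURCES).items():
    print(f"remove '{line}' (blocks {len(urls)} resource(s))")
```

With the sample data this flags the wp-includes and plugins rules and leaves the /out/ rule alone, since none of the listed resources live there.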

This post first appeared on Yoast.
