The ultimate guide to robots.txt

Joost de Valk

Joost de Valk is the founder and Chief Product Officer of Yoast and the Lead Marketing & Communication for WordPress.org. He’s a digital marketer, developer and an Open Source fanatic.

The robots.txt file is one of the main ways of telling a search engine where it can and can’t go on your website. All major search engines support the basic functionality it offers, and some of them also respond to extra rules, which can be useful too. This guide covers all the ways to use robots.txt on your website. While the file looks simple, any mistake you make in it can seriously harm your site, so make sure you read and understand the whole of this article before you dive in.


| Search engine | Field | User-agent |
| --- | --- | --- |
| Baidu | General | `baiduspider` |
| Baidu | Images | `baiduspider-image` |
| Baidu | Mobile | `baiduspider-mobile` |
| Baidu | News | `baiduspider-news` |
| Baidu | Video | `baiduspider-video` |
| Bing | General | `bingbot` |
| Bing | General | `msnbot` |
| Bing | Images & Video | `msnbot-media` |
| Bing | Ads | `adidxbot` |
| Google | General | `Googlebot` |
| Google | Images | `Googlebot-Image` |
| Google | Mobile | `Googlebot-Mobile` |
| Google | News | `Googlebot-News` |
| Google | Video | `Googlebot-Video` |
| Google | AdSense | `Mediapartners-Google` |
| Google | AdWords | `AdsBot-Google` |
| Yahoo! | General | `slurp` |
| Yandex | General | `yandex` |
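To see how user agents like the ones listed above interact with rules, here is a minimal sketch that uses Python's standard `urllib.robotparser` module. The rules in `ROBOTS_TXT`, the `example.com` URLs and the `can_crawl` helper are assumptions made up for illustration; they do not come from any real site. Python's parser is only one interpretation of the format, so individual search engines may resolve edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /private-images/

User-agent: bingbot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def can_crawl(user_agent: str, url: str) -> bool:
    """Return True if the rules above let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("Googlebot-Image", "https://example.com/private-images/cat.png"),
        ("Googlebot-Image", "https://example.com/admin/"),
        ("bingbot", "https://example.com/"),
        ("yandex", "https://example.com/admin/login"),
        ("yandex", "https://example.com/blog/"),
    ]
    for agent, url in checks:
        verdict = "allowed" if can_crawl(agent, url) else "blocked"
        print(f"{agent:16} {url:45} {verdict}")
```

In this sketch a crawler that has its own group follows only that group, which is why `Googlebot-Image` is not affected by the `/admin/` rule in the `*` group. The `bingbot` group also shows how a single `Disallow: /` line shuts a crawler out of the whole site.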


What is a `robots.txt` file?

What does the `robots.txt` file do?

Where should I put my `robots.txt` file?

Pros and cons of using `robots.txt`

Pro: managing crawl budget

A note on blocking query parameters

Con: not removing a page from search results

Con: not spreading link value

`robots.txt` syntax

The `User-agent` directive

The most common user agents for search engine spiders

The `Disallow` directive

How to use wildcards/regular expressions

Non-standard `robots.txt` crawl directives

The `Allow` directive

The `host` directive

The `crawl-delay` directive

The `sitemap` directive for XML Sitemaps

Validate your `robots.txt`

