robots.txt

robots.txt is a small, plain-text file that sits at the root of your website. Its job is to tell search engine spiders (the bits of software that crawl the web and index your site) which parts of your website they are, and aren’t, allowed to look at.

If you’re using a Content Management System (CMS) like WordPress, Typo3, Concrete5, or any of the others on the market, there will be parts of your website’s structure that the CMS does not want the spiders to visit.
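
WordPress, for instance, typically serves a robots.txt along these lines (the exact rules vary from site to site):

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php

The first line addresses all spiders, the second keeps them out of the admin area, and the third re-opens the one admin file that public pages legitimately need.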

So far, so good… but what happens if your web developer (or you) makes a mistake?

One of the most destructive errors I’ve seen is when a developer sets the robots.txt on a test site to block everything, usually with a blanket “Disallow” rule (often alongside a “noindex” tag in the pages themselves). This effectively tells the search engines not to crawl the site or add it to the index.
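
The rule that does the damage is only two lines long. A locked-down test site’s robots.txt usually looks like this:

    User-agent: *
    Disallow: /

That single trailing slash means “the whole site”, for every spider.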

Then, the site goes live… and nobody remembers to change robots.txt.

To the “naked eye” looking at your website through a normal browser, nothing will seem wrong. But to a search engine, your site just became effectively invisible. Worse, if your pages were already indexed, they will gradually drop out of the results once the search engine finds it can no longer crawl them.

You can check your robots.txt yourself, either in Google Search Console or by viewing the file directly, or ask your web developer to verify that everything is as it should be.
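
The file is public, so a quick manual check is just a matter of fetching it; for example (substitute your own domain for the placeholder):

    curl https://www.example.com/robots.txt

If a blanket “Disallow: /” appears in the output, something is wrong.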

Remember that a robots.txt issue need not be site-wide; you may find that some pages or sections of your website are blocked whilst others are fine.
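
For example, a robots.txt like the one below (the path is purely illustrative) hides one section of the site while leaving everything else crawlable:

    User-agent: *
    Disallow: /old-catalogue/

Only URLs beginning with /old-catalogue/ are blocked; the rest of the site is still crawled and indexed as normal.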
