With Google recently ending support for noindex directives in robots.txt, we thought it would be a great time to revisit this very useful HTML tag and how it can be properly used to help keep control of your website’s indexed pages in Google.
What is a noindex tag?
A noindex tag is a piece of code that tells Google, and most other search engines, to not include the specified page in search results.
Why would I need them?
There are many reasons why a website might not want certain pages showing up in search. For example: pages created purely for existing users rather than searchers, spammy user-generated forum content that may be dragging down the main site, blank auto-generated pages from the content management system that serve no purpose, and so on. Google analyses sites at both a page level and a domain level, and while a handful of individual pages that shouldn’t be indexed is rarely a problem, things can snowball quickly when thousands of low-quality, thin-content or spammy pages get indexed by Google; this is very likely to decrease the rankings of the site as a whole.
One of the most famous examples of this is the Yoast indexing bug, in which thousands of WordPress sites had image attachment pages (HTML pages consisting of little more than a single image) indexed by accident. This greatly increased the number of ‘dead weight’ pages relative to each site’s main content, and caused substantial problems at the time.
How do I implement them?
There are two main ways of implementing noindex tags:
The <meta> tag
Placing the following code snippet in the <head> section of a page’s HTML code (before the main body section) instructs search engines to not index the page:
<meta name="robots" content="noindex">
This can easily be done on an individual page level either manually or through most Content Management Systems or plugins (Yoast on WordPress for example).
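As a quick sanity check that the tag is actually present in a page’s source, a small script along these lines can help. This is a sketch using only Python’s standard library, not a production crawler:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags in a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if attrs.get("name", "").lower() == "robots":
                # Directives are comma-separated, e.g. "noindex, nofollow"
                content = attrs.get("content") or ""
                self.directives += [d.strip().lower() for d in content.split(",")]

def has_noindex(html: str) -> bool:
    """Return True if the page's HTML declares a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(has_noindex(page))  # True
```

Running a check like this across a list of URLs you expect to be noindexed can catch pages where a CMS or plugin setting didn’t take effect.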
The HTTP response header
For a more technical solution, the server headers of a page or given set of pages can be configured to serve a noindex directive (via the X-Robots-Tag header) when the page is accessed, depending on the site/CMS setup in question. This can be easier to implement at scale than meta tags, works for non-HTML files such as PDFs and images, and has the added advantage of not requiring the page’s HTML code to be edited.
An example of a server response correctly serving an X-Robots-Tag header is below (credit to Google):

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noindex
(…)
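The rules that decide which URLs receive the header live in the server or application configuration. As an illustration of the logic (the path patterns here are hypothetical examples, not recommendations), a simple routing function might look like this:

```python
def robots_headers(path: str) -> dict:
    """Return extra response headers to attach for a given URL path.

    The patterns below are hypothetical examples; real rules would
    depend entirely on the site's own structure.
    """
    headers = {}
    # Keep file downloads and internal search results out of the index
    if path.endswith(".pdf") or path.startswith("/search"):
        headers["X-Robots-Tag"] = "noindex"
    return headers

print(robots_headers("/whitepaper.pdf"))  # {'X-Robots-Tag': 'noindex'}
print(robots_headers("/blog/article"))    # {}
```

In practice the same pattern-matching is usually expressed in Apache (.htaccess) or nginx configuration rather than application code, but the principle is identical: match a set of URLs, attach the header.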
Applying noindex tags to your pages should ensure that those pages are removed from Google’s index (noindex is classified as a directive which must be obeyed, not a suggestion like a canonical tag), but this may not be instantaneous. Google has to revisit (crawl) a webpage before it can update its internal systems, so it may take a little while for your changes to take effect across all the pages the noindex tags have been applied to. This can be sped up with the (ironically named, in this case!) request indexing tool in Google Search Console. Alternatively, the affected pages can be removed from the main site’s XML sitemap and added to a temporary one, which should help to notify Google of larger numbers of amended URLs; doing this can also function as a handy diagnostic tool.
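Generating that temporary sitemap can be as simple as the following sketch, which builds a minimal sitemap file from a list of the amended URLs (the example.com URLs are placeholders):

```python
from xml.sax.saxutils import escape

def build_sitemap(urls) -> str:
    """Build a minimal XML sitemap listing the given URLs."""
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )

# Placeholder URLs for the pages that have just been noindexed
noindexed = ["https://example.com/old-page-1", "https://example.com/old-page-2"]
print(build_sitemap(noindexed))
```

The resulting file can be uploaded to the site and submitted in Search Console; once Google reports the pages as excluded, the temporary sitemap can be removed.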
It’s also worth bearing in mind that telling Google not to index a page won’t stop it from crawling the page and following its links (the nofollow directive can be added alongside noindex if this is not desired), and the content in question may still be visible on other websites if they copy or syndicate it.