How to avoid website indexing errors

In search engine optimization (SEO), we have numerous tools at our disposal to ensure that the right URL ends up in the index of the major search engines. Canonical tags, for example, help us to regulate duplicate content. With meta robots, we can approve pages for indexing or exclude them from the index. Other options such as hreflang annotations show Google and other search engines which URL to display in which country for multilingual websites. With the robots.txt file, on the other hand, we can also block search engine crawlers from our website entirely if necessary.

Avoid indexing errors

The correct use of canonicals, meta robots, hreflang & Co.

The basics for using such instruments are actually easy to understand. However, if you combine different options with each other, it can quickly become quite complex. If you are also working on a large e-commerce site with lots of paginated pages and filters, things can get really confusing. In this article, we will therefore first look at the correct use of canonical tags and meta robots and how they interact.

Canonical Tags: Basics & correct use

The canonical tag is a specification in the source code of a web page. It refers to a canonical URL, i.e. the version of a page that you want to be found in the search engine index. It is mainly used to regulate duplicate content. Duplicate content can arise in many ways. One common cause is the use of filters, which make the same content accessible under several URLs. But content can also be generated twice or created repeatedly due to poor editorial planning.

And what does this mean for my site? If you have two or more URLs with very similar or completely identical content, they compete for the same rankings on Google. The result is usually two weaker pages instead of one strong one. In addition, too much unregulated duplicate content on a website can signal to Google that the overall quality of the site is rather poor.

The basic idea behind using a canonical is that all copies or duplicates of a page refer to its “original” and thus pass on the positive ranking signals (e.g. PageRank) to it. Only the original URL should therefore be listed by the search engines.

And this is what a canonical looks like:
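In HTML, a canonical is a `link` element with `rel="canonical"` in the `<head>` of the duplicate page, and its `href` points to the original. The following minimal sketch embeds such a tag in a Python string and reads it back with the standard library's `html.parser`. The URL `https://www.example.com/shoes/`, the sample page, and the names `CanonicalFinder` and `find_canonical` are illustrative assumptions, not taken from a real site.

```python
from html.parser import HTMLParser

# Hypothetical filtered shop page; its canonical points to the unfiltered original.
PAGE = """
<html>
  <head>
    <title>Shoes, filtered by colour</title>
    <link rel="canonical" href="https://www.example.com/shoes/">
  </head>
  <body>...</body>
</html>
"""


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


print(find_canonical(PAGE))
```

A check like this can be run over a list of URLs to see which pages declare a canonical at all and where it points.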

Meta robots indexing information

The meta robots specifications “index” and “noindex” as well as “follow” and “nofollow” are used in connection with the indexing of pages.

The most common combination found on a standard website is the “index, follow” variant: search engines should list the page and follow the links on it so that they come across other relevant pages when crawling. The specification “noindex, follow” is also often used: a certain page should not be indexed, but the linked pages are relevant and should therefore be taken into account by the search engine. However, you should know that Google considers crawling “noindex” pages (and their links), which are not supposed to appear in search results, to be time-consuming and rather pointless. As things currently stand, pages that are set to “noindex, follow” for a longer period of time may eventually be treated as “noindex, nofollow”. In the long term, they would therefore be worthless for internal linking (an important ranking signal).

The specification “all”, which is synonymous with “index, follow”, is also frequently used. The specification “none” is in turn equivalent to “noindex, nofollow”. If nothing is specified, search engines treat the page as if “index, follow” had been set.

index: The meta robots specification “index” allows search engines to index a URL.

noindex: The meta robots specification “noindex” prohibits search engines from indexing a URL.

follow: The meta robots specification “follow” indicates that search engines should follow the links on the page.

nofollow: The meta robots specification “nofollow” indicates that the links on the page should not be followed.
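The values described above can be reduced to two yes/no decisions: may the page be indexed, and may its links be followed. The sketch below interprets the `content` value of a tag such as `<meta name="robots" content="noindex, follow">`. The function name `parse_meta_robots` is hypothetical, and the rule that the more restrictive value wins when contradictory values appear together is an assumption of this sketch.

```python
def parse_meta_robots(content=""):
    """Return (index, follow) for the content value of a meta robots tag.

    An empty value is treated like "index, follow", as is "all".
    "none" is treated like "noindex, nofollow".
    Assumption: if contradictory values appear, the restrictive one wins.
    """
    tokens = {token.strip().lower() for token in content.split(",") if token.strip()}
    index = not (tokens & {"noindex", "none"})
    follow = not (tokens & {"nofollow", "none"})
    return index, follow


for value in ["index, follow", "noindex, follow", "all", "none", ""]:
    print(repr(value), parse_meta_robots(value))
```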

Areas of application of canonical tags and “noindex”

Both a canonical tag and the “noindex” specification are suitable for keeping pages out of the Google index or removing them subsequently.

The “noindex” tag tells the search engine that you generally do not want a page to be listed in Google’s search results. The canonical tag, on the other hand, tells the search engine that the page in question is a duplicate and that the original page should appear in the index instead.

A canonical tag is therefore mainly used to deal with duplicate content. “Noindex” is used if you do not want a page to appear in Google search results for other reasons. Theoretically, you could also use “noindex” to avoid duplicate content. To do this, all URLs under which a certain piece of content is accessible must be set to “noindex”, except for the URL to be indexed. However, the copies then pass no ranking signals, or only significantly reduced ones, to the original. A canonical is therefore usually preferable for regulating duplicates.

However, one disadvantage of canonicals is that search engines see them as a hint and not as a mandatory instruction. Canonicals are therefore often ignored if the copy is considered more important than the original due to other ranking signals. “Noindex”, on the other hand, is very reliably observed by search engines.

Do not combine canonical tags and “noindex”

Google currently recommends not combining “noindex” and canonical references to other pages, as Google is then forced to interpret contradictory signals. But what exactly does this mean?

If a page has the “noindex” attribute and at the same time refers to another page using a canonical, Google is instructed via the canonical to treat the two pages as identical. “Noindex” in turn tells the search engine not to include the page with this attribute in the Google index. In theory, Google could then also remove the identical (canonical) page from the index. The “noindex” is transferred from one page to another via the canonical, so to speak.

In general, it is rarely a good idea to leave the interpretation of contradictions to Google, as you cannot predict the exact result. In this specific case, Google will (usually!) assume an error and ignore the canonical. As a rule, the “noindex” is therefore not transferred to the canonical URL. In principle, this nonsensical combination of canonical and “noindex” usually still achieves the desired result: only one of the two identical pages is indexed. However, because the canonical tag is ignored, no positive ranking signals can be transferred from the copy to the original page.
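The contradictory combination described above can be detected mechanically once a page's canonical URL and meta robots value are known. The following is a minimal sketch. The function name `has_conflicting_signals` and the example URLs are hypothetical, and URLs are compared as exact strings, so any normalization (for example of trailing slashes) would have to happen beforehand.

```python
def has_conflicting_signals(page_url, canonical_url, robots_content=""):
    """Return True if a page is set to noindex AND canonicalizes to another URL.

    A canonical that points to the page itself is not treated as a conflict,
    because the recommendation concerns canonical references to other pages.
    """
    tokens = {token.strip().lower() for token in robots_content.split(",")}
    is_noindex = bool(tokens & {"noindex", "none"})
    points_elsewhere = canonical_url is not None and canonical_url != page_url
    return is_noindex and points_elsewhere


# Hypothetical duplicate that sends both signals at once.
print(has_conflicting_signals(
    "https://www.example.com/shoes/?color=red",
    "https://www.example.com/shoes/",
    "noindex, follow",
))
```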

The rule of thumb:

If a URL with duplicate content should not be included in the index because there is an original, set a canonical on the duplicate that points to the original URL and omit “noindex”.
If the URL does not represent duplicate content but should not be indexed, set “noindex” and omit the canonical.
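The rule of thumb can be written down as a small decision helper. This is only a sketch of the two rules above: the function name `recommended_directives`, its parameters, and the returned dictionary layout are hypothetical. A page that is neither a duplicate nor excluded gets no directive here, since search engines treat it as “index, follow” by default.

```python
def recommended_directives(page_url, original_url=None, wanted_in_index=True):
    """Suggest canonical and meta robots settings for one page.

    original_url: URL of the original if this page is a duplicate, else None.
    wanted_in_index: False if the page should stay out of the index.
    """
    if original_url is not None and original_url != page_url:
        # Duplicate: point the canonical at the original, do not add noindex.
        return {"canonical": original_url, "robots": None}
    if not wanted_in_index:
        # Not a duplicate, but unwanted in the index: noindex, no canonical.
        return {"canonical": None, "robots": "noindex"}
    # Neither case applies: the default "index, follow" behaviour is enough.
    return {"canonical": None, "robots": None}


print(recommended_directives(
    "https://www.example.com/shoes/?color=red",
    original_url="https://www.example.com/shoes/",
))
print(recommended_directives("https://www.example.com/cart/", wanted_in_index=False))
```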

If you need help with implementation or have any questions about canonical tags and meta robots, please contact us today! Our Loyamo team has the necessary expertise to help you with SEO and other areas of online marketing. If you enjoyed this post on how to avoid website indexing errors, check out our blog for more exciting insights into the world of online marketing.
