
Why is duplicate content bad?

Like the www or non-www issue, any kind of duplicate content on your site can be a hazard to your search engine rankings. Duplicate content in Joomla means any content that is repeated across multiple URLs. Whether this is done intentionally (by copying and pasting content) or results from technical duplication (like the www / non-www issue) does not matter to Google. In both cases it can harm your rankings, because Google simply does not know which of the duplicated URLs is the right one and divides the SEO value between them, resulting in lower rankings. Of course, you have to make sure that your content is unique and not copied from somewhere else or reused in other parts of your site, but you also have to make sure that the same page cannot be accessed through multiple URLs.

Duplicate content in Joomla

A lot of open-source CMSs have potential duplicate content issues, and Joomla is simply one of them. Even when you have SEF links turned on in your Joomla global configuration, the non-SEF URL still exists. This means 2 URLs with the same content, and often there are more. Duplicate URLs can exist for the following reasons:

  • In Joomla specifically: the same article is reached from multiple menu items (which is a setup mistake by the administrator)
  • www or non-www issues, as discussed in the previous article.
  • Non-SEF URLs are still reachable, despite SEF-URLs being activated, like this:
    /index.php?option=com_content&view=article&id=2 (and many more)
  • Pages ending with index.html, index.php, etc, show the same information as the page without the index part.
  • Parameters in the URL, like ..../page1?font-size=large (more on this in this blogpost: www.searchenginejournal.com/url-parameter-handling-seo/)
  • Trailing slashes
  • Uppercase and lowercase issues (always create lowercase URLs)
  • Pages with internal navigation like tabs: sometimes each tab has a separate anchor link, like /page#tab1, /page#tab2, etcetera. All these tabs are of course on one page, but Google may see each variant as a different URL. Especially for pages like this, setting a canonical URL (read on) is the advised solution.
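
To make the list above concrete, here is a minimal Python sketch that collapses several of these technical variants into one preferred form and groups URLs that end up identical. It is an illustration under stated assumptions, not a Joomla feature: the function names, the example.com domain, the PRESENTATION_PARAMS list, and the preferred form itself (https, non-www, lowercase, no trailing slash) are all hypothetical choices. It also cannot map a non-SEF URL to its SEF equivalent, because that mapping only exists inside Joomla's router.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that change presentation only, not content.
PRESENTATION_PARAMS = {"font-size"}
INDEX_FILES = ("index.php", "index.html", "index.htm")


def normalize_url(url: str) -> str:
    """Collapse common technical variants of a URL into one assumed preferred form."""
    parts = urlsplit(url)

    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # assumed preference: non-www

    path = parts.path.lower()  # assumed preference: lowercase paths
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # /page1/index.html -> /page1/
            break
    if len(path) > 1:
        path = path.rstrip("/")  # assumed preference: no trailing slash
    if not path:
        path = "/"

    # Keep real query parameters, drop presentation-only ones.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    ]

    # The fragment (#tab1, #tab2, ...) is dropped by passing an empty string.
    return urlunsplit(("https", host, path, urlencode(kept), ""))


def group_duplicates(urls):
    """Return {preferred_url: [variants]} for every group with more than one variant."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Feeding a crawl export or a list of URLs from your server logs into group_duplicates gives a quick overview of which addresses compete with each other, which helps in deciding where redirects or canonicals are needed.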

Having pages reachable from multiple URLs could harm your rankings, so it's best to prevent this. This can be done in many different ways. Some can be used on their own, but you can also combine techniques to totally get rid of your duplicates:

Correct menu set-up

One very common reason for duplicates is linking one article from multiple menu items. This is easily done: sometimes an article that is reached from the main menu must also be reachable from a footer menu item. In this case, Joomla builds a URL for both menu items. Let's compare 2 examples:

  1. If you have a menu called Products, with a submenu-item for each product, the URL for the article Chair would be /index.php/product/chair
  2. If you make the same article reachable from the footer, but directly (no submenu), the URL is /index.php/chair

Apart from small differences, like the breadcrumb path or module assignments, these pages are identical and are real duplicate content issues. Partly, this is because of the way Joomla works, but you can work around this in many cases:

  • Sometimes the Main Menu is repeated in a footer position. As long as it is exactly the same, simply publish the Main menu in the footer position again, and don't create a new menu with identical links.
  • Often you really need a menu with different links. In this case, consider not creating a new link, but using a menu item of the type Menu Item Alias (under System Links). This simply takes you to the destination article of the original menu item, so no new URLs are created!

With some creativity, you can sometimes even think of more solutions like this.

Use 301 redirects

Anyone serious about SEO will sooner or later have to work with redirects: they are often needed to solve small issues, but sometimes you will have to apply them massively, say, after a site redesign or transfer to another domain.

Using 301 redirects means that you tell anyone who accesses such a URL: this link has moved permanently (that is what the 301 indicates), please go here. You can use them to redirect traffic from duplicated URLs to the correct ones. As an example, if somebody goes to:

index.php?option=com_content&Itemid=125&catid=15&id=18&view=article

they are forwarded to:

checklist/avoid-duplicate-url-s

You can achieve 301 redirects either in your .htaccess file or using an extension, like 4SEO, which is a very nice and simple extension for this. More on 301-redirects and how to set up .htaccess for this can be found in the article about re-routing old URLs.

There are other types of redirects, but these are only used for specific cases. An example is a 302 redirect, which is a temporary redirect.
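
Once a redirect is in place, it is worth verifying that it really answers with a 301 and points to the intended URL. The following Python sketch shows one way to check this. The function names are hypothetical, it uses only the standard library, and it assumes the server answers HEAD requests the same way as GET requests, which is not guaranteed on every setup.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def fetch_redirect(url: str, timeout: float = 10.0):
    """Request `url` without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        connection.request("HEAD", target)
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


def is_permanent_redirect_to(status, location, source_url, expected_url) -> bool:
    """True only for a 301 whose Location resolves to the expected URL."""
    if status != 301 or not location:
        return False
    # Location may be relative, so resolve it against the requested URL.
    return urljoin(source_url, location) == expected_url
```

A typical use is to call fetch_redirect on the old non-SEF URL and pass the result to is_permanent_redirect_to together with the SEF URL you expect. A 302 or a missing Location header makes the check fail, which is exactly what you want to catch before Google does.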

Canonical URLs

Setting a canonical URL tells Google that, even though there are multiple URLs for the same content, only one variant should be indexed. You can read "canonical" as "preferred", which might make more sense to you. If you set the canonical URL correctly, all possible duplicates of a Joomla page carry a link element in the head section of their HTML that points to the preferred version. As an example, let's look at the page you are currently reading. It can be reached in 2 ways:

/index.php?option=com_content&Itemid=125&catid=15&id=18&lang=en&view=article
/checklist/avoid-duplicate-url-s

The first URL has a canonical URL that tells Google it is the same page as the SEF URL. Just check the HTML of the <head> section:

<link href="https://joomlaseo.com/checklist/avoid-duplicate-url-s" rel="canonical"/>

Using this technique, you can prevent having duplicate URLs indexed by Google, even when they are still accessible. 
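
Checking the head section by hand gets tedious when you have many pages. As a rough sketch, the Python snippet below extracts every rel="canonical" link from a piece of HTML using only the standard library. The class and function names are hypothetical, and fetching the HTML itself is left out on purpose.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html: str):
    """Return all canonical URLs declared in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals
```

A page should normally yield exactly one result. An empty list means no canonical is set, and more than one entry means conflicting canonicals, which is worth fixing.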

Currently, canonical tags are hardly supported in Joomla, which is one of its biggest SEO issues IMHO. There is not much you can configure about canonicals: the only option is in the settings for the System - SEF plugin, which allows you to set a Site Domain. However, it is only useful if you make the same website available through multiple domains (parked domains), and it is not something you should usually touch.

If you need to set canonicals (and you know what you're doing), you should use an extension. My personal favorite for this in Joomla is the 4SEO extension, which creates a correct canonical setup out of the box. By default, self-referencing canonicals are excluded, but you can easily switch them on in the Pages settings.

Advanced rules in .htaccess

Using your Joomla .htaccess file you can solve quite a few of your duplicate URL issues (provided URL-rewriting is on). We already discussed how to reroute www and non-www URLs and how to create 301 redirects, but you can also use it to get rid of many other types of issues. Just an example: say your URLs are accessible both with and without a trailing slash, meaning /page1/ and /page1 have the same content. You can mass-redirect the version with the trailing slash to the version without using just a short piece of code:

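RewriteEngine On
# Skip requests for existing files and directories
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Redirect /page1/ to /page1 (use https:// here if your site runs on HTTPS)
RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]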
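
If you want to reason about which paths the RewriteRule above matches before touching a live site, you can try its pattern in isolation. The Python sketch below applies the same regular expression to a request path. It is only an approximation: the function name and the example.com host are hypothetical, and the two RewriteCond checks for existing files and directories are not simulated.

```python
import re

# Same pattern as in the RewriteRule.
TRAILING_SLASH = re.compile(r"^(.+)/$")


def redirect_target(request_path: str, host: str = "example.com"):
    """Return the URL the rule would redirect to, or None if the pattern does not match."""
    # In .htaccess context, Apache matches the path without its leading slash.
    relative = request_path[1:] if request_path.startswith("/") else request_path
    match = TRAILING_SLASH.match(relative)
    if match is None:
        return None
    return f"http://{host}/{match.group(1)}"
```

For example, redirect_target("/page1/") returns the address without the trailing slash, while redirect_target("/page1") and redirect_target("/") return None, so the home page and already clean URLs are left alone.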

Again, test if the trailing slash is indeed removed AND whether your site actually still works! Always be careful with .htaccess changes! Similar issues can arise because of parameters, such as one that sets a font size (for example /page1?font-size=large), leading Google to think that 2 different pages exist.

If I need solutions for this, I often go to the Stack Exchange forums, where there is a lot of useful information.

Set-up robots.txt (not advised)

Note: this technique was used in the past, but it is no longer advised. You can set up your robots.txt file so that it disallows crawling of any URL with a query string, i.e. any URL containing a '?'. See the article about robots.txt for the code. This prevents issues with duplicate URLs caused by non-SEF URLs as well as by real query strings.

The reason it is not an advisable technique anymore is that you simply block access for the Google bots, but Google does see that a URL is sitting there. It just cannot judge anymore whether it is a valid URL or not and may simply index the page, which is not what you want. 

Use an extension

For smaller sites, preventing issues can easily be done by configuring .htaccess, robots.txt, and possibly a small extension for 301 redirects, but for larger sites, using a SEF extension is probably more efficient. It takes some time to learn how these extensions work, so start by trying one out on a site that is not that important. If used correctly, it will ban all duplicate URL issues from your site. However, if used incorrectly, it could have the opposite effect.
Some well-known SEF extensions:

  • 4SEO by Weeblr
  • PWT SEO
  • Route66 by Firecoders
  • RS-SEO

Check the extensions section of this site for information about these and others.

Google Search Console

Always make sure to register your site with Google Search Console. It will not solve your problems for you, but you will receive feedback about the status of your site, including issues with duplicate content and the setup of your canonical URLs.
