SimonCox.com


How I add canonicals into Perch CMS sites

by Simon Cox

Topic
Web

Perch Canonicals
The canonical link in a page's header lets search engines know where the original page resides. Google tends to choose the oldest version of a page that it can find, and any other pages with the same or very similar content are considered duplicates.

Canonicals can trip up your site's SEO

Canonicals were originally conceived for situations where articles were duplicated, so that the copies would reference the original. Google tends to choose the oldest version of a page that it can find (though that is not the only method it uses), and any other pages with the same or very similar content are considered duplicates and will not do well on the Search Engine Results Pages (SERPs). We want our pages to do well there for the traffic.

In most content management systems, developers tend to take the quick option and reference the URL the page is on. To an extent, this works very well, but duplicate pages can occur by accident rather than by design. For example, if you are using Perch and you decide to prettify your URLs by removing the .php, you will have set up .htaccess rules to remove them. But did you decide whether your URLs should end in a / or not? Search engines index URLs with and without the / as different pages, so you can suffer from duplication.

  1. http://www.example.com/index.php
  2. http://www.example.com/index
  3. http://www.example.com/
  4. http://www.example.com
  5. http://example.com/index.php
  6. http://example.com/index
  7. http://example.com/
  8. http://example.com
  9. https://www.example.com/index.php
  10. https://www.example.com/index
  11. https://www.example.com/
  12. https://www.example.com
  13. https://example.com/index.php
  14. https://example.com/index
  15. https://example.com/
  16. https://example.com

All the above are essentially the same page of content, a home page, and the search engines have to work out which one is the original. They are getting much better at this, but that's not a reason not to help them understand your website.
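
To make the duplication concrete, here is a minimal Python sketch that collapses all 16 variants into one preferred URL. The preferred scheme and host, the no-trailing-slash rule, and the function name `canonical_url` are assumptions chosen for illustration. They are not part of Perch, and on a real site your .htaccess rules would decide the preferred form.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_SCHEME = "https"      # assumption: the site is served over HTTPS
PREFERRED_HOST = "example.com"  # assumption: the non-www host is preferred


def canonical_url(url: str) -> str:
    """Collapse common variants of a URL into one preferred form."""
    path = urlsplit(url).path
    # Drop the .php extension, as the prettified URLs do.
    if path.endswith(".php"):
        path = path[: -len(".php")]
    # Treat /index as the directory itself.
    if path.endswith("/index"):
        path = path[: -len("index")]
    # Assumption: no trailing slash, except on the home page.
    if len(path) > 1:
        path = path.rstrip("/")
    if not path:
        path = "/"
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, "", ""))


# Rebuild the 16 home page variants from the list above.
variants = [
    f"{scheme}://{host}{path}"
    for scheme in ("http", "https")
    for host in ("www.example.com", "example.com")
    for path in ("/index.php", "/index", "/", "")
]

print({canonical_url(v) for v in variants})  # {'https://example.com/'}
```

Sixteen URLs go in and a single canonical comes out, which is the same decision the search engines otherwise have to make for you.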

For subpages, canonicals are more critical, as the search engines are less likely to be tolerant, and often they will find your site through links to a subpage rather than down through the home page. Having the canonical automatically generated means that any URLs that resolve, but that you do not actually want on the site, will include an incorrect canonical. If you remove the .php from the URLs, as I tend to do, then you may have situations where Perch is outputting links with the .php; the canonical would then include the .php and cause duplicate content issues. Footer menus are an example of where this may happen.

I like to add the canonical manually so that I know I am in control, but this can lead to issues if an editor mistypes the URL, so the technique I use grabs the list of pages from within Perch as a dropdown list for the editor to choose from.

Perch field type: Pagelist

You will need to add the Perch field type into /perch/addons/fieldtypes/. Drop the folder and its PHP file in there and you are good to go.

The Perch 2 field type Page list is available from the Perch CMS site. At the time of writing, there is no Perch 3 version, but the archived Perch 2 version seems to work OK.

Perch template code

The following code goes into perch/templates/pages/attributes/seo.html

<link rel="canonical" href="<perch:pages id="domain" /><perch:pages id="canonical" type="pagelist" output="pageurl" replace=".php|,/index|" label="Canonical page" help="Please select the page you wish to have as the canonical URL for this page (normally just choose this page)" required="true" />">

replace=".php|" removes the .php from the URL; the second pair, "/index|", removes /index in the same way.
type="pagelist" provides the list of pages on your site.
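
As a rough illustration of what the replace attribute is doing, the Python sketch below applies the same specification to a page path. It assumes the value is a comma-separated list of search|replacement pairs applied in order as plain text substitutions. The function name `apply_replace` is hypothetical, and this is a model of the behaviour, not Perch's own code.

```python
def apply_replace(value: str, spec: str) -> str:
    """Apply 'search|replacement' pairs, separated by commas, in order."""
    for pair in spec.split(","):
        # Everything before the first pipe is the search text,
        # everything after it is the replacement (empty here).
        search, _, replacement = pair.partition("|")
        if search:
            value = value.replace(search, replacement)
    return value


SPEC = ".php|,/index|"  # the value used in the template above

print(apply_replace("/my-new-page.php", SPEC))  # /my-new-page
print(apply_replace("/index.php", SPEC))        # prints an empty line
```

With an empty path for the home page, the href is just the domain, so the canonical no longer points at an /index URL.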

On each page in the CMS, a drop-down box appears with the pages you have on your site. The editor can select from this list, thus avoiding manual errors, though they could choose the wrong page, so that's worth checking!

example of dropdown list used in the Perch content management system
The output code in the head:

<link rel="canonical" href="https://example.com/my-new-page">
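
Because an editor can still pick the wrong page, it is worth checking what actually ends up in the head. The sketch below is one possible way to do that with Python's standard library: it pulls every canonical href out of a page's HTML so it can be compared with the URL you expect. The class and function names are made up for this example, and the HTML string stands in for a page you would fetch yourself.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel can hold several space-separated values.
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


page = (
    "<html><head><title>My new page</title>"
    '<link rel="canonical" href="https://example.com/my-new-page">'
    "</head><body></body></html>"
)

print(find_canonicals(page))  # ['https://example.com/my-new-page']
```

A page should report exactly one canonical; an empty list or more than one entry is a sign that something in the template needs a look.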

And there is more...

Pagination
Clive Walker asked me how I deal with pagination. Generally, I don't, as pagination is the work of the devil and advertisers. There are so many sites that make you click through a series of pages to read an article. This is just to sell advertising, not to make it easy for you to read, as usually the whole article could easily go on one page and you would scroll down to read it.

There is, however, a situation where pagination is very useful: lists of article entries, categories, topics and tags. In these situations, it is recommended that there is a view-all page and that the paginated pages are canonicalised to that. With huge lists, though, a view-all page is impractical (it will take days to load, etc.), and then the paginated pages can be self-canonicalised. If you want to know more, then head over to Deep Crawl's information on canonicalisation and pagination.
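
The rule for paginated lists can be sketched as a small Python function. Everything specific here is an assumption made for illustration: the /all path for the view-all page, the ?page= query string, the cut-off of 200 items, and the function name itself. None of it comes from Perch or from any search engine guidance.

```python
def list_page_canonical(list_url: str, page: int, total_items: int,
                        view_all_limit: int = 200) -> str:
    """Pick the canonical URL for one page of a paginated list."""
    if total_items <= view_all_limit:
        # Small list: every paginated page points at the view-all page.
        return f"{list_url}/all"
    # Huge list: a view-all page is impractical, so each page
    # is its own canonical.
    if page == 1:
        return list_url
    return f"{list_url}?page={page}"


print(list_page_canonical("https://example.com/blog", 3, 40))
# https://example.com/blog/all
print(list_page_canonical("https://example.com/blog", 3, 5000))
# https://example.com/blog?page=3
```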

18 December 2017 Update for home page
I have also updated the Perch code I used, as there was an issue. The home page was outputting '/index', so I have added that into the replace statement, as it was canonicalising the home page to a URL that didn't exist, and that is a bad thing! Apologies to anyone who had used the code prior to today.

