Do invisible characters in your code cause issues with search?

HTML code showing red dots where there is a hidden character, and behind this, to the left and right, are near-invisible shady characters in trench coats.

A "Hang on a minute" moment - what are those?

Today I ran a crawl on my site using Sitebulb, and I was reviewing a medium issue by digging into the source code that Sitebulb had collected. There were red dots all over the content - what were these, I thought? I had my suspicions.

Screenshot from Sitebulb Live html view of one of the URLs showing invisible characters as red dots littered throughout the content.

I have come across invisible characters before

A long time ago, in a galaxy far far away, I came across an issue when we uploaded content that had been created in Microsoft Word into our new CMS. Word had been adding invisible control characters into the content. You could not see these on the page, but in the source code they rendered as Å - so we called them evil Å’s because they were a pain and caused upsets. As a result, we changed our content building processes to remove unseen control characters before content reached the final stage, and we set guidelines for content creation.

Since then I have always been aware, at the back of my mind, of unseen characters in code and the problems they can cause, so I was very surprised to find them in my own website's code - I must be losing my edge. I’m not sure how or when they got in there, as this site has been through a few rebuilds in its time, but the new posts I have written since the migration don’t have any. When I moved the content out of the last CMS I scraped the site - it was the easiest way, as that CMS, good as it was, had no export function at the time. I'm not sure the scrape added them, so they might have come from a previous build.

How I fixed the invisible characters

The fix was reasonably easy. I copied the character from the Sitebulb code, shown as a red dot, and in my code editor, Visual Studio Code, I did a search and replace on one of the articles and checked it.
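If you would rather not rely on copying the character out of a crawler, a small script can report where invisible characters sit in a piece of text. This is only a sketch: I never pinned down exactly which character mine was, so it assumes the culprits belong to the Unicode "format" category (Cf), which covers things like the zero width space and the soft hyphen. The function name `find_invisible` is made up for this example.

```python
import unicodedata


def find_invisible(text):
    """Return (offset, code point, name) for each Unicode format (Cf) character.

    Assumption: the hidden characters are in category Cf. Other invisible
    characters (for example the no-break space) are NOT caught by this check.
    """
    hits = []
    for offset, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "UNNAMED")
            hits.append((offset, f"U+{ord(ch):04X}", name))
    return hits


# Example text with a zero width space and a soft hyphen hidden inside words.
sample = "invis\u200bible char\u00adacters"
for hit in find_invisible(sample):
    print(hit)
```

Knowing the code point also means you can search for it in a code editor with a regular expression instead of pasting the character itself.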

Visual Studio Code showing the found invisible characters ready for replacement.

All was ok so I went through about 36 others one by one - though I think I could have done them all in one go. Fortunately it only affected my main article posts, not the new short articles, and none of the new posts I had created since the last migration.
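Doing them all in one go could look something like the sketch below. It is not what I actually did (I used search and replace in Visual Studio Code), and it makes a few assumptions: that the posts are UTF-8 Markdown files, that the unwanted characters are all in the Unicode format category (Cf), and that simply deleting them is the right fix. The names `strip_invisible` and `clean_tree` are made up for this example. Be aware that category Cf also includes the zero width joiner used in some emoji, so check the results before publishing.

```python
import unicodedata
from pathlib import Path


def strip_invisible(text):
    """Remove every Unicode format (Cf) character from the text."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def clean_tree(root, pattern="*.md"):
    """Clean every matching file under root and return the paths that changed.

    Assumption: files are UTF-8 encoded. Files with nothing to strip are
    left untouched so their modification times do not change.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        original = path.read_text(encoding="utf-8")
        cleaned = strip_invisible(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `clean_tree("src/posts")`, where `src/posts` is a placeholder for wherever your posts live, returns the list of files it rewrote, which makes it easy to see which URLs to resubmit. Run it on a copy or a committed working tree so the changes can be reviewed and undone.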


Testing to see if this makes a difference in the SERPs

I then published the site and submitted all the changed pages to GSC URL inspection so they get crawled. I then set up a test on SEOtesting.com to compare the next two weeks of GSC data on these URLs with the previous two weeks. If there is a significant change, then my conclusion will be that these invisible spaces are breaking up the words so that they make no sense to Google. If it makes no difference, then the conclusion will be that Google ignores these characters even though they break up words in the content.

My gut feeling is that this will not make a difference. I am sure the Google engineering team will have seen similar issues many years ago and will strip these characters out of the content during indexing or analysis, because this is going to be a fairly common problem.

Anyway, I will update this post after the test has completed.

By Simon Cox
