Learn why your search page shouldn't be indexed by search engines and how to fix it.

Always remember that user input should be treated as malicious. That's an assumption you need to keep in mind whenever you develop anything that accepts form data anywhere in your web project. Thanks to a great article on the matttutt website about how to deal with indexed spam, or injected SEO spam content, I discovered that something horrible was happening on Our Code World.

After reading about and understanding this spam method, I decided to check whether Our Code World was a victim of injected SEO spam content. To my surprise, the search module was the favorite target for this trick:

Injected SEO Spam Content Our Code World

As you can see, Google had indexed about 29,400 results for the search page with spam search terms. The search terms are awful; all of them are related to adult content, which is of course not the niche of my blog.

How did this happen?

The search module on the blog allows users to filter the content of Our Code World by keywords. When someone types something and searches, the URL https://ourcodeworld.com/search?q=some%20text is generated, so every new search is theoretically a new page that Google may index. Remember this, as it's important: every URL with new GET parameters, if handled inappropriately, counts as a new page on your website. As I use Google Analytics, if someone decided to spam such content, Google would unfortunately end up indexing the page.
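
To make the "every search is a new page" point concrete, here is a minimal Python sketch. It only illustrates how distinct search terms turn into distinct URLs; the helper name search_url and the sample terms are made up for this example, and it is not the code that runs on Our Code World.

```python
from urllib.parse import quote, urlencode

# Search endpoint taken from the article; everything else is illustrative.
SEARCH_ENDPOINT = "https://ourcodeworld.com/search"


def search_url(term: str) -> str:
    """Build the URL a search form would generate for a given term (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20, matching the URL shown in the article.
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': term}, quote_via=quote)}"


# Three arbitrary terms produce three different URLs, each of which
# a crawler may treat as a separate page.
terms = ["some text", "another query", "any spam keyword"]
urls = {search_url(term) for term in terms}

print(search_url("some text"))  # https://ourcodeworld.com/search?q=some%20text
print(len(urls))                # 3
```

A spammer doesn't need access to the site at all: linking to URLs like these from other pages is enough for a crawler to discover them.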

Google probably doesn't want to rank your blog's internal search results pages, and you're wasting a lot of Googlebot's energy, if you will, on pages that may not do your blog much good. However, they were indexed anyway, and therefore they're harming you in some way.

What are the consequences?

As shown in the image, they insert keywords on the blog, such as drug-related terms and adult content. This is a black-hat SEO technique, also known as the pharma hack.

Solution in my case

Fortunately, in my case, I don't need the /search URLs of the blog to be indexed by Google, as the search is supposed to be personal. The first thing to do is to prevent such pages from being indexed by using the following meta tags on the page:

<meta name="robots" content="noindex">
<meta name="googlebot" content="noindex">
<meta name="googlebot-news" content="nosnippet">

In addition, add the X-Robots-Tag header to the response of the search page to prevent indexing:

X-Robots-Tag: noindex
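
How you attach this header depends on your stack. As a rough sketch, assuming a Python WSGI application (the function search_app and its path check are hypothetical and are not how Our Code World is implemented), the header could be added only to search responses like this:

```python
def search_app(environ, start_response):
    """Hypothetical WSGI app that marks /search responses as noindex."""
    path = environ.get("PATH_INFO", "")
    headers = [("Content-Type", "text/html; charset=utf-8")]

    # Only the search page gets the header; other pages stay indexable.
    if path == "/search" or path.startswith("/search/"):
        headers.append(("X-Robots-Tag", "noindex"))

    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example page</title>"]
```

To try it locally, the app can be served with wsgiref.simple_server.make_server from the standard library. The same idea applies to any framework or web server: match the search route and append the header to its response.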

In some cases, you will need the search page to be indexed, but without the query string. You can allow indexing and still prevent the injected SEO spam problem by using a canonical URL in your markup:

<link rel="canonical" href="https://ourcodeworld.com/search" />
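
If the page is rendered dynamically, the canonical URL can be derived from the requested URL by dropping the query string. The following Python sketch shows one way to do that; canonical_link is a hypothetical helper written for this example, not part of any framework:

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(request_url: str) -> str:
    """Return a canonical <link> tag for request_url without its query string or fragment."""
    parts = urlsplit(request_url)
    # Keep scheme, host and path; discard the query string and the fragment.
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Escape the value before placing it inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


print(canonical_link("https://ourcodeworld.com/search?q=some%20text"))
# <link rel="canonical" href="https://ourcodeworld.com/search" />
```

Whatever the search term is, every variation of the search URL then points to the same canonical page.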

Cleaning the mess

Now, the only thing left is to remove all the injected SEO spam content that is already on Google under my domain name. I use Google Search Console (formerly Google Webmaster Tools), which fortunately allows me to remove such indexed content under the Removals area:

Submit URL Removals Google

In this case, submitting a request to remove all the URLs with the prefix https://ourcodeworld.com/search should remove all the injected SEO spam on my blog.

Happy coding ❤️!


Senior Software Engineer at Software Medico. Interested in programming since he was 14 years old, Carlos is a self-taught programmer and founder and author of most of the articles at Our Code World.
