This is the sixth article in a series on the perils of Duplicate Content. We have previously covered the problems of repeating Title tags and Meta tags across your site, using standard product descriptions and bad URL habits.
Having looked at the problems duplicate content causes, both on your own site and across other sites, it’s now time to check your site for it. More importantly: does Google think that you have duplicate content?
Canonical Issues
Canonical issues most commonly arise when the WWW and non-WWW versions of your site URLs both serve identical content. If your site is located at www.myflowershop.com, try typing just myflowershop.com into your browser. If your site is configured correctly, you will be redirected to www.myflowershop.com (or the WWW version will redirect to the non-WWW version). If neither URL redirects to the other, you have duplicate content.
Another cause of canonical problems is having a secure section of your site for collecting sensitive personal information from your customers. Secure URLs begin with “https” whereas normal pages begin with “http” only. Try visiting https://www.myflowershop.com – if the URL is not redirected to the http version, you have a duplicate content problem.
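If you would rather not type each variation into your browser, the same check can be sketched in a few lines of Python. This is only a rough illustration, not a definitive tool: the function names (`variants`, `final_url`, `canonical_report`, `has_canonical_problem`) are made up for this example, and it assumes your home page answers on all four variations. It follows the redirects for each variation and reports whether they all land on a single address.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def variants(domain):
    """Return the four common URL variations for a domain such as 'myflowershop.com'."""
    bare = domain[4:] if domain.startswith("www.") else domain
    return [
        f"{scheme}://{host}/"
        for scheme in ("http", "https")
        for host in (bare, "www." + bare)
    ]


def final_url(url, timeout=10):
    """Follow any redirects and return the address that finally served the page."""
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()


def canonical_report(domain, resolve=final_url):
    """Group the variations by the (scheme, host) they end up on after redirects."""
    groups = {}
    for url in variants(domain):
        parts = urlsplit(resolve(url))
        groups.setdefault((parts.scheme, parts.netloc.lower()), []).append(url)
    return groups


def has_canonical_problem(domain, resolve=final_url):
    """True when the variations settle on more than one scheme/host combination."""
    return len(canonical_report(domain, resolve)) > 1
```

Calling `has_canonical_problem("myflowershop.com")` makes four live requests. A result of `True` means at least two variations serve pages without redirecting to each other, and `canonical_report` shows which ones.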
Search Thy Self
There are a few advanced search parameters you can use in Google to identify duplicate content on your own site. Using the “site:” command, we can instruct Google to return search results only from one site.
- Start at the Google home page: www.google.com
- In the search box, type “site:myflowershop.com” (using your domain name) and click the Search button.
- Scroll down through the results from your site and look for the following indicators:
- Do all the results have the same title?
- Do all the results have the same description?
- Do you see only one result?
- Do you see the message: “In order to show you the most relevant results, we have omitted some entries very similar to the (x) already displayed.”
These are all indications that content on your site is being filtered, or at risk of being filtered, as duplicate content.
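The checklist above is easy to turn into a small script. The sketch below is illustrative only: it does not fetch anything from Google, and the helper names (`site_query_url`, `duplicate_indicators`) are invented for this example. It assumes you have copied the titles and descriptions from the results page by hand, and optionally the page text so the “omitted” note can be spotted.

```python
from urllib.parse import urlencode

# The wording Google shows when it filters near-identical results.
OMITTED_NOTE = "we have omitted some entries very similar to"


def site_query_url(domain):
    """Build the search address for a 'site:' query on the given domain."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


def duplicate_indicators(results, page_text=""):
    """Check a list of (title, description) pairs for the four warning signs."""
    titles = {title.strip().lower() for title, _ in results}
    descriptions = {desc.strip().lower() for _, desc in results}
    return {
        "same_title": len(results) > 1 and len(titles) == 1,
        "same_description": len(results) > 1 and len(descriptions) == 1,
        "only_one_result": len(results) == 1,
        "omitted_note": OMITTED_NOTE in page_text.lower(),
    }
```

Any `True` value in the returned dictionary matches one of the indicators in the list.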
Clone Wars
Duplicate content on different sites happens for a variety of reasons. Sometimes an author syndicates content; some sites aggregate content from a number of sites; and some spammers build “scraper” sites that swipe content from other sites. In the floral industry, duplicate content on different sites is most commonly caused by template sites (ex: sites from FTD/TF/Media99/lots of others) that come with stock text and pages. Florists don’t bother to update this generic content, and the result is 20,000+ florist websites that are virtually indistinguishable from each other. It’s a lousy user experience for the customers, and Google knows it. That’s why they filter duplicate content!
To search for other sites with the same content as your site, try this:
- Browse to a page on your site. In this example, we’ll use the About Us page from a flower shop in Texas that is using a TF template site.
- Highlight a string of text – at least one complete sentence. Right-click and choose “copy.”
- Go to www.google.com and put two quotes (“”) inside the search box.
- Right-click in between the quotes and choose “paste.” You should wind up with the copied text appearing between the quotes. This tells Google to search for the exact string of text, not just pages that contain those words in any order.
Added bonus: Searching with the quotes will cause Google to search in their data archive, not in their search index. This means the results will include pages that might not have been indexed to appear in the regular search results.
- Click the Search button.
- The first results page we see shows only four results, and not one of them is the site where we copied the text! Scroll down to the note from Google that they have omitted similar results and click “repeat the search with the omitted results included.”
Now we see that Google is reporting approximately 50,600 matches. That number may be accurate, or subject to some estimation – we’ll never know, as Google will only display the first 1,000 results for a search. The point is that, out of more than 50,000 pages with the same content, only four are deemed worth displaying by Google.
You can repeat these steps with different pages from your site – product pages, About Us, policies, guarantees, every section of your site. If you are finding your pages filtered out – omitted – by Google, you have a serious problem.
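If you plan to repeat the copy-and-paste search for many pages, a short script can prepare the quoted searches for you. This is a minimal sketch under a few assumptions: you paste each page’s text in yourself, the helper names (`pick_sentences`, `exact_phrase_url`) are invented for illustration, and sentences are split naively on end punctuation. It picks a few complete sentences and builds a search address for each, with the sentence wrapped in quotes.

```python
import re
from urllib.parse import urlencode


def pick_sentences(text, min_words=8, limit=3):
    """Return up to `limit` complete sentences that have at least `min_words` words."""
    normalized = " ".join(text.split())  # collapse line breaks and repeated spaces
    sentences = re.split(r"(?<=[.!?])\s+", normalized)
    return [s for s in sentences if len(s.split()) >= min_words][:limit]


def exact_phrase_url(sentence):
    """Wrap the sentence in straight double quotes so it is searched as an exact string."""
    return "https://www.google.com/search?" + urlencode({"q": f'"{sentence}"'})


def searches_for_page(text):
    """Build one exact-phrase search address per selected sentence."""
    return [exact_phrase_url(s) for s in pick_sentences(text)]
```

Open each address in your browser and check the results the same way as in the steps above, including the search with the omitted results included.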
Stick around, or better yet, subscribe to this blog, to learn some ideas on how to get unique content for your site.
Bonus: Fun with duplicate content! Want to find out how many Teleflora sites there are in your city? Try this nifty little search that uses some advanced Google commands. Florists in Brooklyn, NY using TF template sites. Follow the link and edit the query to replace the NY and Brooklyn with your city and state.
I think that the duplicate content issue may make the use of WordPress seem even more relevant. In fact, WordPress takes care of the canonical issues automatically.
However, as you pointed out, duplicate content is more than having canonical issues.
Today, I think that Google’s new updates may lead webmasters to be more cautious. I would not be surprised if it was shown that people used less duplicate content now on their main websites.
I’ve been using the “site:myflowershop.com” type of search from year one of my internet marketing journey. It is a wonderfully simple way to see interesting things about the content on your site, how many of your pages are in Google’s index and so on.
Something I did not know was that “searching with the quotes will cause Google to search in their data archive, not in their search index. This means the results will include pages that might not have been indexed to appear in the regular search results.” I had no idea about this even though I’ve been searching with “” all this time.