Google Is Indexing Site-Search Results Pages

Google Analytics is broken (like PageRank is broken), and leaking my data into the index. Site searches performed here on SEO ROI are producing site-search results pages that end up in G’s index. How is this happening?

Final Update: This has been disproven as being the source of the site-search-results appearing in Google’s search results. I had good reason to believe that Google Analytics was the source of this (you can see below for my original thoughts on the matter), but there’s now a clarification. My apologies to Google and to my readers for the mistake.

A while back I saw a video about using Google Analytics to track what searches people perform on your site (note: the video seems to be gone, but you can read Avinash’s explanation of how to set up the site search tracking). It seemed like something worth tracking, in order to find out what I might write to meet my visitors’ needs, and perhaps get some keyword research done too. So I set it up.

I forgot about the whole thing until I did a site: search on SEO ROI in order to do some research on how the big G was liking my pages. Well lookee here! SEO ROI Site SERPs in Google site: SERPs!

Naturally, I wanted to see whether this was just affecting me. Guess what? Brian Chappell set up the tracking too, and here’s what a site search on brianchappell.com turns up:

Google Analytics Leaking Brian Chappell’s site search results.

If I were a blackhat this would be great: I’d automate bots to search for keywords on my sites, have Google Analytics push it all into the index, then linkspam my SERPs. But I’m a white-ish SEO with the coding skills of a luddite. I can do HTML reasonably well, but programming is waaay over my head.

More importantly: This proves that Google Analytics is integrated with Google’s organic search, and not just AdWords. I remember reading Winning Results with Google AdWords, in which Andrew highlighted a debate as to whether or not Google Analytics “gives your sales receipts to the landlord,” to reprise his metaphor. Here we have another problem with Google Analytics, quite evidently.

This is another reason to make the switch to Conversion Ruler (hat tip to Andrew for the reference), Click Audit for very easy to set up and use click tracking (you'll notice me using it instead of Feedburner/Google Analytics for tracking subscriptions), Mongoose for call tracking, or any of the other analytics tools on the market. On a related note, if you found this material interesting, why not subscribe to my RSS feed? There’s lots more where this comes from.

Update: Matt Cutts and Mike VanDeMar have anecdotal evidence (which I verified) that this happens on sites not using GA too. Mike’s site had this without ever using Google Analytics, apparently (he’s since developed a neat bit of code to drop those pages out of the index). So GA isn’t necessarily involved. It may, however, be one of several possible causes.
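For readers who want to keep their own site-search pages out of the index, here is a minimal sketch of one way to do it. This is not Mike's code (I haven't reproduced it here). It assumes your site search puts the query in a URL parameter such as `s` (the WordPress default) or `q`, and that your server lets you attach an `X-Robots-Tag` response header. The names `SEARCH_PARAMS`, `is_site_search_url` and `robots_headers` are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: the site-search query travels in one of these URL parameters.
# "s" is the WordPress default; adjust for your own CMS.
SEARCH_PARAMS = {"s", "q"}


def is_site_search_url(url):
    """Return True if the URL looks like a site-search results page."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(name in SEARCH_PARAMS for name in query)


def robots_headers(url):
    """Return extra response headers telling crawlers not to index search pages."""
    if is_site_search_url(url):
        # "noindex, follow": keep the page out of the index but let links be followed.
        return {"X-Robots-Tag": "noindex, follow"}
    return {}
```

Calling `robots_headers("http://example.com/?s=weight")` returns the noindex header, while an ordinary URL such as `http://example.com/about/` gets no extra headers. A robots meta tag in the search template would do the same job if you can't set headers.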

Update 2: Matt emailed me and he “will try” to get a post up on Google’s Webmaster blog about this.

Update 3: Both Google’s official post and Matt’s announcement are now live. Turns out Google itself is doing a new form of crawling, submarine crawling, which includes querying forms and discovering links from JavaScript.
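To make the form-querying idea concrete, here is a rough sketch of how any crawler could turn a site's search form into crawlable URLs. It is not Google's implementation, just an illustration under my own assumptions: the crawler only handles forms submitted via GET, fills every text box with a word it has picked up somewhere, and requests the resulting URL. The names `GetFormFinder` and `candidate_urls` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urlsplit, urlunsplit


class GetFormFinder(HTMLParser):
    """Collects the action and text-input names of every GET form on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the GET form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            method = (attrs.get("method") or "get").lower()
            # Assumption: POST forms are ignored entirely.
            if method == "get":
                self._current = {"action": attrs.get("action") or "", "inputs": []}
            else:
                self._current = None
        elif tag == "input" and self._current is not None:
            input_type = (attrs.get("type") or "text").lower()
            if input_type in ("text", "search") and attrs.get("name"):
                self._current["inputs"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            if self._current is not None and self._current["inputs"]:
                self.forms.append(self._current)
            self._current = None


def candidate_urls(page_url, html, words):
    """Build the URLs a crawler would request by submitting each word to each GET form."""
    finder = GetFormFinder()
    finder.feed(html)
    urls = []
    for form in finder.forms:
        target = urlsplit(urljoin(page_url, form["action"]))
        for word in words:
            params = {name: word for name in form["inputs"]}
            # A GET submission replaces any query string already on the action URL.
            urls.append(urlunsplit(target._replace(query=urlencode(params), fragment="")))
    return urls
```

Fed a typical WordPress-style search form with `action="/"` and a text input named `s`, plus the words "weight" and "advisable", this yields `http://example.com/?s=weight` and `http://example.com/?s=advisable`. Those are ordinary URLs, which is why such pages can end up in the index without anyone linking to them.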


Comments

  1. Hmmm, very interesting. Wonder how this will be "explained".

    Comment by DazzlinDonna - March 27, 2008 @ 8:40pm
  2. One problem with your conclusion... I had the exact same issue on Smackdown... and I do not have GA set up on it at all. Problem is, I don't know exactly what is causing it. I had asked Matt Cutts about it, and someone else had asked John Mueller on my behalf, because in my case there is suspicious activity surrounding the existence of the pages. As of yet neither has replied. Well, John did, but hadn't looked into it yet.

    Comment by Michael VanDeMar - March 27, 2008 @ 9:28pm
  3. Michael, I had a look at your site and couldn't see any such results. Can you show me some? Perhaps I'm not doing the right search... Interesting to know that you don't have GA set up - I'd be happy to hear that Google Analytics itself isn't broken. That said, I still don't see why Google is showing site SERPs for pages that obviously were never targeted with links etc.

    Comment by Gabriel Goldenberg - March 27, 2008 @ 10:29pm
  4. I can verify that Michael VanDeMar contacted me about the existence of these search result urls on his site on Feb. 26th. Looking at Michael's site it's clear that Michael doesn't have any Google Analytics code on his site. So that disproves the "Google Analytics is leaking urls into the index" claim. Michael, I think I might know how these search results showed for you and there's a good/non-conspiracy reason. Since you wrote me about it a month earlier and I'd been meaning to write you back anyway (it was starred in my inbox, I promise :), I dropped you a note to find out more details from you.

    Comment by Matt Cutts - March 27, 2008 @ 11:42pm
  5. Gabriel, drop me a quick email please...

    Comment by Michael VanDeMar - March 28, 2008 @ 12:17am
  6. Done :).

    Comment by Gabriel Goldenberg - March 28, 2008 @ 12:40am
  7. Gabs, check your Junk folder... you should have received 2 emails from me by now.

    Comment by Michael VanDeMar - March 28, 2008 @ 1:59am
  8. Good to know it's not GA, but it is curious how this happens...as well as, how does an addon domain get indexed when it doesn't have links to it. (i.e. site1.com/site2 is the folder for site2.com and site1.com/site2 gets indexed for no reason). Somehow G is getting this information when it shouldn't be. Would love to know what is sending G the info.

    Comment by DazzlinDonna - March 28, 2008 @ 7:45am
  9. What's in it for Google and its users? Wouldn't this bloat its index? I know I can't stand when I do a search in Google and I land on a thin affiliate site that happens to have search ads displaying with my search query pre-typed into their directory search box. Do you know what I mean? If I find an example I can post it, but usually what happens is I get to some useless site that's like a really crappy search engine, it shows up in Google serps, there's nothing about my keyword there but there's a bunch of ads. Spam pages, right? Useless to search engine users - UNLESS they click on the Google ads ;-) But I don't think Google means to do this, maybe it's just a bug. Or could it be that this is useful: you don't have an optimized page for a term but you have a lot of content in aggregate, and users on your site search for xyz terms, meaning there's a good shot your site is relevant for the search - get what I'm saying? Maybe Google is testing whether that theory is true. Trying to match relevance to queries by site topic/visitor profile rather than page content? Scary about the Google-sharing-data thing. Another reason not to opt into Benchmarking either...

    Comment by Linda Bustos - March 28, 2008 @ 10:29am
  10. I can confirm I am seeing this issue as well since I set up Google site search in GA. I'm not sure whether this is a good or bad thing. In analyzing one of my sites, C28.com, G seems to have indexed searches that occur more frequently than most. I seem to remember a while back Google being worried about indexing site searches, due to an unlimited amount of potential duplicate content. I guess they don't care anymore. Great observation Gabriel.

    Comment by Palmer Web Marketing - March 28, 2008 @ 10:44am
  11. Mike, I've responded to your emails. Donna, I'm as anxious as you are to see what's going on! Mike's got a solution if you're interested, so keep an eye on his blog or drop him a line. As to what's in it for G, you make some good guesses Linda. It could be trying to deliver content rich pages and see what the result is for users. That said, many of the pages indexed are for gibberish keywords (weight? advisable?) that don't return great site-SERPs. Half-baked experiment if that's really what's going on.

    Comment by Gab - March 28, 2008 @ 12:31pm
  12. Linda, I'm pretty sure on this one it's not something that G is doing on purpose, and I'm more conspiracy-theory-istic than most you might meet. :D

    Comment by Michael VanDeMar - March 28, 2008 @ 2:37pm
  13. This is not new! It has been around for a couple of months! Google is performing automatic searches, using the internal search function of some CMS (I have only noticed it on WordPress so far). They even perform searches for keywords that are absolutely unique and nonsense (like nicknames mentioned somewhere on your blog) and these internal search pages can rank in the SERP!

    Comment by Malte Landwehr - March 29, 2008 @ 5:40am
  14. So, is anyone ready to say anything yet? What's the cause?

    Comment by Tim Dineen - March 29, 2008 @ 4:13pm
  15. Malte - right you are. Mike noticed this in October, and I wrote about it in a prior Scratchpad which only a few people noticed (I'm telling you guys you need to pay more attention to those columns! Some of my juiciest bits go in there ;D!) Tim, Malte's answered your question.

    Comment by Gabriel Goldenberg - March 29, 2008 @ 10:14pm
  16. No offense to Malte, who I don't know, but I was hoping Matt or Michael would provide details since they seem closest to this issue (and one obviously is closer than all).

    Comment by Tim Dineen - March 30, 2008 @ 10:45am
  17. Well, I've been reading the article, and hope to find out what's really the cause?

    Comment by amelia - March 30, 2008 @ 10:58pm
  18. Maybe it's the Google toolbar installed by users who perform searches on your site? The toolbar sends every url to Google, and there are indications that Google toolbar traffic data influences SERPs, e.g. the sitelinks: http://googlesystem.blogspot.com/2008/03/google-sitelinks-using-traffic.html

    Comment by Pascal Van Hecke - April 9, 2008 @ 8:34am
  19. Hi again, Apparently this is what's happening: http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html-forms.html Google is now crawling forms, by submitting and entering data into the fields, including search forms...

    Comment by Pascal Van Hecke - April 13, 2008 @ 4:18am
  20. Tim, see Pascal's link. Also I've updated the post, Pascal.

    Comment by Gabriel Goldenberg - April 13, 2008 @ 12:45pm
  21. Hey well an interesting post and a serious issue too. Well keeping a track on updates, thanks for letting this post on. Great work

    Comment by Paintworkz Web Design - May 3, 2008 @ 11:39am
  22. I think the problem remains as to where Google is getting its search queries.... If it is getting those search queries from Google analytics, it makes no difference from using Google Analytics data to crawl the site.

    Comment by Sign - August 1, 2008 @ 7:50am
  23. Well, I've been reading the article, and hope to find out what's really the cause?

    Comment by perde - December 8, 2008 @ 4:05pm
