Why Google is Broken

Google is broken because PageRank no longer models what it did in Brin and Page’s foundational paper.

Google’s PageRank algorithm originally worked well because it modeled user behaviour, not webmaster behaviour (aka “the web’s democratic nature,” per some self-righteous Wikipedia editor). It estimated how likely a visitor was to reach a site, based on linking patterns external to that site. PageRank viewed links as navigation. Today, per Matt Cutts, PageRank views links as “votes.” Links are counted as an editorially given seal of approval, and that makes all the difference.
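
To make the “links as navigation” reading concrete, here is a minimal sketch of the random-surfer model: with probability d the surfer clicks one of the current page’s links at random, and otherwise jumps to a random page. It uses d = 0.85, the value Brin and Page report using, in the normalized form where the scores form a probability distribution. Treating pages with no outgoing links as linking everywhere is one common convention, not necessarily the paper’s, and the three-page graph is made up for illustration. This is the textbook idea, not Google’s production algorithm.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Random-surfer PageRank by power iteration.

    links maps each page to the pages it links to. The result is a
    probability distribution: the chance the surfer is on each page.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform surfer

    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            # Otherwise they click one of the page's links uniformly at random;
            # a page with no outgoing links is treated as linking to every page.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    web = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

Page b, which nothing links to, keeps only the random-jump share, while c, linked from both other pages, ends up most likely to be visited. Every number here is a click probability, which is why this version of PageRank can be computed without judging anyone’s intent.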

What are the practical implications of Google’s two conceptions of PageRank?

Different things need to be measured depending on whether links are understood as navigation or seen as votes.

When you stick to measuring probability, such as the probability that a link will send visitors, you can rely on math formulas and computer calculations to do the work. Even the on-page evaluation of how relevant a link is to the receiving site can largely be done algorithmically. That’s the way Google used to do business.

Nowadays, Google’s algorithms are supposed to measure the intangibles of intent, sincerity and authenticity. The problem is that computer programs are not yet sophisticated enough to understand these qualities. So, as the recent paid-links dustup and notably the -70 penalty attributed to Text Link Ads show, manual evaluation is de rigueur. The automated calculations that discount paid links try to footprint them by looking for telltale nearby text such as “Sponsors,” “Ads,” etc.
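
The kind of footprinting described above can be sketched in a few lines. This is a hypothetical illustration, not Google’s method: the marker words beyond “Sponsors” and “Ads,” the 150-character window, and the regex-based link matching are all assumptions made for the example (a real crawler would use a proper HTML parser).

```python
import re

# Hypothetical marker words; the post only names "Sponsors" and "Ads" as examples.
PAID_MARKERS = re.compile(
    r"\b(sponsors?|sponsored|ads|advertisements?|paid links?)\b", re.IGNORECASE
)
# Crude link matcher for illustration only; it captures the href value.
LINK = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)


def flag_paid_links(html: str, window: int = 150) -> list[str]:
    """Return hrefs whose nearby text contains a paid-link marker.

    window is the number of characters inspected on each side of the
    matched link; both it and the marker list are assumptions.
    """
    flagged = []
    for match in LINK.finditer(html):
        context = html[max(0, match.start() - window): match.end() + window]
        if PAID_MARKERS.search(context):
            flagged.append(match.group(1))
    return flagged


if __name__ == "__main__":
    page = (
        '<div><h4>Sponsors</h4><a href="http://paid.example/">Cheap widgets</a></div>'
        + "<p>" + "filler " * 60 + "</p>"
        + '<p>I liked <a href="http://editorial.example/">this article</a>.</p>'
    )
    print(flag_paid_links(page))
```

Note how cheap this is to evade: rename the heading to “Friends” and the same paid link passes. That gap is why anything mildly sophisticated falls back to human review.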

What are the consequences of Google’s broken state?

  • Google’s model is having difficulty scaling, particularly in moderately competitive, average-Joe niches where link buys are less likely to be noticed (some might say that makes it less of a problem, but that’s mere bigotry from those in the more competitive industries). Paying humans to review intent isn’t as cost-effective as running a computer. Expanding the definition of spam to include paid links means the system as a whole is burdened with policing something that, for anything even mildly sophisticated, by definition requires human evaluation.
  • I confused myself into thinking that publishing ads aggressively should cut a site’s PageRank, since AdSense ads by definition send visitors off a site rather than inward to its other pages (tip: use your highest-CTR ad copy for anchor text ideas, to get the most out of your links). I say confused because ads obviously are not votes, which is how Google supposedly understands links today.
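
Under the navigation reading, the intuition in that last point is easy to check. The sketch below reuses the random-surfer model and measures how much of the surfer’s probability stays on a small site, with and without one outbound ad. Everything here is hypothetical: the three-page site, the “advertiser” page, and the simplification that an ad behaves like an ordinary outbound link.

```python
def site_share(links: dict[str, list[str]], site: set[str],
               damping: float = 0.85, iterations: int = 100) -> float:
    """Probability that a random surfer is on one of the given site pages.

    Same random-surfer model as textbook PageRank; pages with no outgoing
    links are treated as linking to every page (an assumed convention).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(iterations):
        new_rank = dict.fromkeys(pages, (1.0 - damping) / n)
        for page in pages:
            targets = links.get(page) or pages
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return sum(rank[p] for p in site)


# Hypothetical three-page site; "advertiser" stands in for an off-site ad destination.
SITE = {"home", "about", "blog"}
without_ad = {"home": ["about", "blog"], "about": ["home"], "blog": ["home"], "advertiser": []}
with_ad = {**without_ad, "home": ["about", "blog", "advertiser"]}

if __name__ == "__main__":
    print(f"site share without ad link: {site_share(without_ad, SITE):.3f}")
    print(f"site share with ad link:    {site_share(with_ad, SITE):.3f}")
```

In this toy graph the site’s share drops from about 0.95 to about 0.80 once the home page links out to the advertiser. Under the “votes” reading that leak has no obvious meaning, which is the point of the bullet above.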

How does Google right itself?

I’m still debating whether to answer this, though I’m certain that the stratospheric IQs of many of my readers have led some to the conclusion already.

There was a time when I liked Google, when its results seemed relevant on a regular basis. But the sandbox filters have this site firmly stuck at #10 for searches on “SEO ROI,” while both MSN and Yahoo are savvy enough to rank it #1. And Matt Cutts, while claiming to want to improve communication with webmasters, is ducking my tough questions (and no, they’re not about paid links; I’m as sick of that useless debate as anyone). (Edit: That’s a criticism of Matt’s behaviour, not of Matt himself, whom I respect as a highly intelligent and capable guy. Just so we’re clear.)

(Aside: Sandboxing relevant content through an explicit algorithm or as a side effect of a set of filters has got to be one of the worst ideas ever, incidentally. It’s like saying that Shakespeare’s plays would not have been great until they were recognized as such by external references. This gets things backwards. The plays got recognition for their excellence; they weren’t excellent because of their recognition. Yet that is what the sandbox does.)

Depending on how this plays out in the comments here and on Sphinn, I’ll decide whether to share my idea on how Google goes from being broken to once again having the best algorithm around. (The solution is the same for all the engines, incidentally. Wouldn’t want to make the folks at MSN think I’ve given up trying to help them gain their share of the search market.) I’d also like to hear your answers and ideas on what Google can do to solve its problem. Perhaps you guys have a better solution than I do? I’ll definitely be spreading the link love if that’s the case.


Comments

  1. So you're saying it's a good thing to give people with deep pockets even more of an advantage over smaller folks because they can buy more links? I thought this blog was nice and I added it to Google Reader, but this piece of shit post made me unsubscribe again real fast. Did you even think about the shit you wrote or did you just write down whatever came out of your ass? And do you REALLY think it's the SANDBOX that's keeping you at position TEN? That's such an arrogant and moronic statement at the same time, it's ridiculous.

    Comment by Icheb - January 2, 2008 @ 6:23am
  2. I'm not saying it's a good thing, which you would know if you took the time to read first and comment second. I'm just commenting on the impossibility of assessing a person's mindset algorithmically. I'm not at all bothered by the fact that you've unsubscribed from my blog. Someone who makes ad hominem attacks when they disagree with an idea is not welcome in my circle. I think it's evident that the primary difference between Yahoo's or MSN's algorithms and Google's is that Google trusts older sites more. It might be arrogant to say that I deserve to be number one, and if that's how it came across I apologize. That said, it is logical, given that "SEO ROI", the name of my company/brand/site, has turned at least partly into a brand search. I think you're a Google fan who is merely hurt that I pointed out that the Emperor has no clothes.

    Comment by Gabriel Goldenberg - January 2, 2008 @ 3:24pm
  3. Aww jeezz dood… I don’t know if I can stomach another Google theory today. I have already ranted about 2 ‘theories’ floating about. I think the main problem for me is folks that seem to think about the ancient PR algo with no regard to the many layers that have been applied to that core over the years. The SERPs we see are not definitively crafted from PR. It is an onion with PR at the core. As such, discussing how to achieve better rankings by talking about the PR algo is like living in the last century when it was born. But hey, if you want to give it a go… feel free, I would just not be myopic in looking at PR as the fundamental ranking behemoth that it once was; this is no longer the case. BTW…. Friendly freakin fella up there…. Yeeeshhh…. Talk soon (haven’t forgotten my question/answers for ya) Dave

    Comment by theGypsy - January 2, 2008 @ 7:12pm
  4. I remember a time when Google was my only search engine. A few years ago I could find just about anything I needed on the first page. I now find myself needing to Yahoo! some things here and there. I threw up a new site and managed in less than a month to get it into the top five on Google by updating content like a madman. As I slow down a bit and the site matures, the Google ranking slips, but the Yahoo ranking is increasing. Now it's in the top five on Yahoo and on page 3 of Google. Time to build links to get it back up. I think Google favors fast-moving pages a bit too much. I'm curious to hear your idea to fix it, as anything I can come up with probably wouldn't work any better for the average Joe.

    Comment by Paul Dillinger - January 2, 2008 @ 11:08pm
  5. @ Dave: And here I was thinking I'd made a cutting critique. You've delivered a pretty intelligent rebuttal and it pains me to admit that my whole post above might be seriously flawed, given the obvious fact that PageRank is no longer the essential ingredient in Google's algorithm. That said, if you rephrased the title as "Why PageRank is Broken," I'd say my criticism is still valid. And I think that Google's alteration of their evaluation of a "link" is the cause of their own woes. Consider how much of the text link market features crappy sites that no one ever goes to yet can be used to manipulate the algos. Hell, the tide of form-completing spam could probably be eliminated at its core if trackbacks stopped counting for links (when was the last time you clicked one?) and ditto guestbook submissions and items on dead forums. @ Paul: Constantly refreshing content is very important. It gets spiders returning regularly, and it will help you control link flow on your site to get more pages indexed and gain more long tail traffic. As to getting outside links easily, look at the sources listed towards the bottom of this post: http://seoroi.com/algorithms/supplemental-index/

    Comment by Gabriel Goldenberg - January 3, 2008 @ 12:27am
  6. Well, we can say ‘broken’ if we want, but either way Google knew the limitations of a PageRanked system and has been steadily working on other parameters. There is simply so much to be dealt with, though, from the engineering dept to the boardroom. There are issues of infrastructure, scalability and plain old trial and error to contend with, yada yada yada…. No easy task. Rushing headlong into weighting other systems may just create even more loopholes for spammers to get into, and then we’d all be complaining about that. While it is not perfect (and little else in this world is), I do have faith in many of the directions they and other search engines are taking things. I think SEOs need to stop fixating on PR as THE ALGO, because Google stopped doing so years ago. Seems many folks are just starting to notice… hehe… Tanx fer the Sphinn BTW…. :0) …. Feel like we’re dancing… ( I better be leading… he he)

    Comment by theGypsy - January 3, 2008 @ 1:00am
  7. "Paying humans to review intent isn’t as cost-effective as running a computer." This kind of takes me back to June, when Matt wrote http://www.mattcutts.com/blog/the-role-of-humans-in-google-search/ and, strangely enough, commented: "PageRank is fundamentally about the hyperlinks that people on the web create. All those people creating links help Google formulate an opinion of how important a page is." What I do believe is that PR needs to be ignored for the present (it isn't a dynamic indicator anymore), and I believe that G has another card up its sleeve, which may have something to do with its introduction of KNOL. That's one area where they have full control over the value of a site, and considering that it's going head to head with Wikipedia and Squidoo, I think Google has decided it's tired of showing content that games its results. They will show their "own" content in top results via KNOL, where the community votes or whatever on good articles. Basically they have found a way to scale the human element of search IMHO. Owning the system that hosts the content gives it full control over spam techniques. Vertical integration? I think so.

    Comment by Rishil - January 3, 2008 @ 5:10am
  8. @ Dave the gypsy: Granted, PR is no longer the only part of their algo, but I firmly believe that it is still the foundation upon which everything else builds. I'm not sure what you're getting at regarding the infrastructure etc. Implementing new algos is no easy task? OK, but that doesn't change the fact that their link analysis is screwed up. You can lead the dance all you want; I'm still homecoming King :P! Np on the Sphinn; it was a worthy article. Matt's language there is ambiguous, somewhere between votes and navigation. @ Rishi: Knol is, imho, an insurance policy against people making Wikipedia their starting search engine. Vertical integration: you're bang on. Whether Knol proves the solution to scaling human review, we'll have to wait and see. Certainly it can help them improve quality and control some of the most significant content in their index. Thanks for the bright comments, you two!

    Comment by Gabriel Goldenberg - January 3, 2008 @ 11:28am
  9. If by now PageRank is only a Google marketing gimmick, the sizzle on the steak, and no longer figures in any major way in the real keyword search ranking algorithms, who would know? That's my view and I wait to hear from anyone who can prove me wrong.

    Comment by Barry Welford - January 4, 2008 @ 12:16am
  10. Good point, Barry. You wouldn't be able to tell if that were the case, so I don't see anyone proving you wrong any time soon. That said, when was the last time you saw a site with little or no PR (on the root domain, at least) ranking? It's pretty rare, if we're honest with ourselves. Not necessarily a causal link, but I'd argue there's something to it...

    Comment by Gabriel Goldenberg - January 4, 2008 @ 1:36am
  11. The problem with Google's links = votes setup is that ordinarily any kind of voting system has a start and stop period. If we're talking Dancing with the Stars or American Idol, voting is over around 30 minutes after the show. If we're talking politics, we start votes at zero in Iowa on Jan 3 and close them off whenever. Google's voting system, on the other hand, doesn't have an end point. It's as if George Bush ran again and started not with 0 votes but with whatever he received back in 2004. That's an unfair advantage, I'd say, but that's how Google operates. If that's not broken, what is?

    Comment by Halfdeck - January 4, 2008 @ 5:26am
  12. Interesting point. I wonder whether Google's considered this problem of the perpetual poll... Definitely, another argument showing why Google is broken!

    Comment by Gabriel Goldenberg - January 4, 2008 @ 11:46am
  13. I think the most fundamental problem is believing there is only one #1, #2, #3, etc. And this goes along with the never-ending vote described above. The fact is there are plenty of sites in any given niche or discipline that have worthy content. Most of them aren't seen only because, when you get right down to it, they don't have enough "friends" to give them a vote. A rotating ranking system would be much more logical, I believe. There are certainly cases where that probably isn't true, like a search for Pepsi-cola. That should probably return the home page of Pepsi as its first result. But if I type "where can I find the cheapest Pepsi", there are a number of sites that could provide that information. Maybe the "strongest" site could only hold position 1 up to, let's say, 50% of the time. Maybe because it is the "strongest" site, it should never fall from page 1, but other "strong" sites are rotated into the top spots. So there is still an incentive to become the "strongest" or at least "stronger". Your site will still gain more visibility than sites that aren't as "strong". But other sites that provide worthy information and content would also be given visibility. The algorithm could then take into account time on the page, how many pages were visited, etc. to determine if that page is worthy of continuing to receive visibility. If only I had the time to create a search engine.

    Comment by ChrisCD - March 27, 2009 @ 5:01pm
