Matt Cutts Fumbles Tweet – Penguin 2.0 Leaked

I spend a lot of time thinking and even more time testing. In the world of SEO, you can never rest – especially these days.

The folks at Google seem arrogant at times, and with good reason. For years they’ve gotten away with lie after lie, pulling the wool over the eyes of thousands of people who should know better.

If you know you can tell someone a lie and they won’t question it, there is a lot of room for temptation – and they’ve put their hand in the cookie jar time and time again.

Up until recently, the biggest lie they told was concerning link building and how they were treating it. For years they convinced people that link spam couldn’t work, wouldn’t work and shouldn’t work.

Meanwhile, thousands of talented black-hat SEOs laughed in their faces and made enough money to fill Fort Knox three times over.

With that having been said, I’m almost certain I just caught Matt Cutts with his hand in the cookie jar yet again.

My Research Leading Up To Yesterday

Penguin 1.0 rolled out and knocked some of my sites out of the mix. Well played Matt.

I rebuilt smarter and more quickly. The Penguin refreshes came and I wasn’t affected. (I’ll write a post on the metrics I believe are important at a future time.)

Penguin 2.0 rolled out and ‘they got me’ again. Not all of my sites, but a large enough portion to irritate me and make me curious. I had figured out Penguin 1.0 the first time around, why not tackle this?

I began looking over the obvious factors – anchor text ratio, link distribution, link sources, link velocity, tiered link mass, etc.

Note: Tiered link mass is an easy calculation. It is simply the sum of all links below tier 1. Don’t over-complicate it. It’s just what it sounds like…
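To make that concrete, here’s a minimal sketch. The per-tier counts and the dict layout are made up for illustration – the point is only that everything below tier 1 gets summed:

```python
# Hypothetical per-tier link counts for one money site.
# Tier 1 links point at the site itself; tier 2 links point at tier 1, and so on.
tier_links = {1: 50, 2: 1200, 3: 25000}

# Tiered link mass: the sum of all links below tier 1.
tiered_link_mass = sum(count for tier, count in tier_links.items() if tier > 1)

print(tiered_link_mass)  # 26200
```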

But when looking at all of these various factors, I still couldn’t come up with an answer that satisfied me.

So I did what I always do. I bought new shared hosting accounts on fresh IP’s (in case the algo is IP related). I bought 40 new domains to have plenty of room to test without having to report into the registrar every time I want to try something new.

Initial Conclusions

Penguin 1 was almost purely centered on anchors and commercial intent after the fact. Penguin 2 (or subsequent algo updates) was much more efficient at handling new inbound link spam. To that I say, well done Matt.

I had difficulty getting the small domain/link spam model to work overall. Frustrated, I changed directions and began to examine on-site metrics.

Obviously, small domains weren’t going anywhere. So I began to analyze larger domains.

As testing would later prove, larger domains – domains that had 20 or 30 pages a day added to them – could absorb link spam much more readily and were ranking.

Furthermore, domain size was also pulling certain domains out of the grave – *BUT* – only domains built after Penguin 2.

The evidence had become clear: domain size was now an extremely relevant anti-spam metric that Google had taken targeted action against.

I was able to repeat my success multiple times – and success in both ways. Pulling old domains out of the grave and ranking new domains. I wasn’t using the ‘old way’ of link spamming anymore, but a more evolved methodology to further simulate what Google might like to see.

What Does That Mean?

Simply put, previous to Penguin 2 we would build tiers and blast the tiers. It worked, and I’ve still seen some success with it, but not with the same regularity as before.

So I evolved my approach to be less static. I would create seemingly predictable spikes in inbound link velocity immediately after publishing content, and if I didn’t publish content for a while, I didn’t build links for a while.

This to me seems to be the more natural simulation of a real event.
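A minimal sketch of that schedule, with made-up numbers: link volume is keyed to publish dates, decays quickly over the following days, and stays at zero when nothing is published. The burst size and decay rate here are illustrative, not values I actually tested:

```python
# Hypothetical link-building schedule: a burst of links right after each
# publish day, decaying over ~3 days, and nothing built otherwise.
def link_schedule(publish_days, total_days, burst=100, decay=0.4):
    """Return links-to-build per day, spiking only after publish events."""
    links = [0] * total_days
    for day in publish_days:
        for offset in range(3):  # a burst lasts ~3 days, like a real story
            if day + offset < total_days:
                links[day + offset] += int(burst * (decay ** offset))
    return links

# Publish on day 0 and day 7 of a 10-day window:
print(link_schedule([0, 7], 10))  # [100, 40, 16, 0, 0, 0, 0, 100, 40, 16]
```

Quiet stretches with no content get no links at all – which is the whole point of the simulation.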

The Common Sense

To me this all seemed like a logical train of events, even if I couldn’t pin down the exact metrics or threshold used.

Let’s look at the internet ‘In The Wild.’

When a new story is published, email subscribers, RSS feed subscribers, fan pages, etc. are all normally notified. This produces a huge amount of inbound traffic to that story, which is often followed by immediate sharing, meaning:

  • Social Signals/Likes/Tweets
  • Links From Bloggers, Journalists, Sister Publications

For example – when Matt Cutts tweets something, it gets posted on blogs like Search Engine Land, talked about on forums like SEOSUnite, and oftentimes retweeted several hundred times.

But after that initial burst that normally lasts 2-3 days, the tweet is only rarely brought up.

Looking at that in terms of link velocity, it builds, peaks, and normally quickly falls off with very little following that.

Take that in stark contrast to how we would link spam. Build tiers, blast them for days, weeks, months – constantly, and at high velocity.
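Those two shapes are easy to tell apart numerically. Here’s a toy comparison with invented daily link counts – one crude way (my construction, not Google’s) to ask how much of a profile’s total arrives in its peak few days:

```python
# Two hypothetical daily inbound-link profiles over two weeks.
natural = [5, 80, 120, 60, 15, 5, 2, 1, 1, 0, 0, 1, 0, 0]  # builds, peaks, dies off
spam = [200] * 14                                           # constant high velocity

def peak_share(profile, window=3):
    """Fraction of all links that land in the busiest `window` days."""
    total = sum(profile)
    best = max(sum(profile[i:i + window]) for i in range(len(profile) - window + 1))
    return best / total

print(round(peak_share(natural), 2))  # most links land in one short burst
print(round(peak_share(spam), 2))     # links spread flat across the period
```

The natural profile concentrates almost all of its links in a short burst; the spam profile never does.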

But You Can’t Just Hammer Everyone

Remember that big sites are going to experience regular, high velocity inbound links because they are normally regularly posting stories and content that get people excited and sharing.

In effect, because they are constantly publishing new content, people constantly have a reason to be linking to that domain. There’s stimulus that creates the atmosphere for it.

So Google couldn’t blindly punish all domains that run at high inbound link velocity all of the time.

By contrast, a smaller domain with very little new content doesn’t create any kind of required stimulus for a large volume of new inbound links.

It’s the same old information that’s been around for weeks, months or years. Why would it be getting 100,000 links a day for months on end? It wouldn’t. Unnatural.
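If that’s the signal, a toy version of the check looks something like this. To be clear, this is entirely my guess at the logic – the function, the ratio, and the threshold are all invented:

```python
# Hypothetical spam heuristic: a domain earning far more daily links than
# its publishing activity justifies looks unnatural.
def looks_unnatural(daily_new_links, daily_new_pages, ratio_limit=500):
    """Flag domains whose link velocity dwarfs their content velocity."""
    stimulus = max(daily_new_pages, 1)  # a stale site still gets a tiny allowance
    return daily_new_links / stimulus > ratio_limit

print(looks_unnatural(100_000, 0))  # stale small site, huge velocity -> True
print(looks_unnatural(15_000, 30))  # big site adding 30 pages/day -> False
```

Same inbound link counts, totally different verdicts – because the content stimulus is what makes the velocity plausible.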

Well Done Google

This was a well thought out course of action and created some very simple metrics for them to analyze to determine if a page/domain should be punished.

BigTable is what I believe Google is still using as their means of managing data. The best explanation of BigTable I found is here: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf

While that might seem like a lot of reading to get done if you want to see what Google has to work with, you have to look at how they store and manipulate their data.

Approaching the problem of link spam, BigTable provides Google with easily referenced material to conduct the site and link analysis that I’m proposing they do.

Mind you, this link<->content/age metric is only an additional layer on top of others, such as the anchor and commercial intent metrics from Penguin 1.

So in my opinion, it seems as if they came up with a simple solution to a complex problem – and for that they do deserve some recognition, even from the ‘other side.’

Additional Signal Created

Around the same time that Penguin 2.0 was released, Google began heavily favoring sites with a lot of new content.

In fact, they began to take it too far as referenced in: http://www.viperchill.com/new-seo/

So fresh content and large domains now mean a lot to Google apparently… Let’s put it to you this way, they mean enough to Google that they will both get you ranked regularly AND will even pull you out of punishments.

So Why Did I Write This?

-IMPORTANT-

I hate, and I mean HATE algo analysis posts. I cringe every time I see them, but read every one I come across.

First of all, most of the posts out there seem to use poor data – and not just in the sense of it normally being incorrect to begin with – but they never go out and then TEST their conclusions. They just talk about the past and are too scared to address the future.

Making matters worse, most analysis posts are done on the websites of vendors who will then pitch you on how their service ‘Avoids These Problems.’ In other words, they have a vested interest in fitting their conclusion into their product niche.

BUT IF THAT WASN’T BAD ENOUGH…

You get some analysis posts that use ‘hard’ numbers as the basis for their exploration.

Look – you can’t use hard numbers in SEO analysis. You just can’t.

Well, I suppose you can because people do it – but it makes them look like assholes. WHY?

Because nobody has the exact same data Google is working with. They are normally extracting their data from a variety of much weaker crawlers.

In other words, you can’t conduct meaningful analysis based on numbers that are wrong to begin with.

But people do it, and others on forums link to it and share it, fight over it and it’s all for nothing. The data was wrong before the analysis even started!

The Point Is

I hate algo analysis posts. I swore I would never write one – and I didn’t for years.

But Then Google Thought They Could Be Clever

See, people haven’t been talking about Penguin 2.0 with any regularity for several weeks now – maybe even months depending on the forum.

The buzz has died down and been drowned out by other, less meaningful banter. Just look on any forum – how often is Penguin 2 or the Payday algo referenced in new forum posts?

Virtually never…

Yesterday, I saw this: https://twitter.com/mattcutts/status/372801217727979520


I Lost It…

After all the testing and analysis it took to conclude that domain size was a meaningful metric, Matt tweeted that!

It was obvious to me….

Google is looking to further refine the system of punishment that they were so careful not to allude to immediately following the major anti-spam updates.

Of course, they couldn’t crowd source this close to the updates because they would have tipped their hand to all of the SEO’s that domain size was now a relevant factor in determining if your site would suffer a link related punishment.

So they waited until the buzz died down.

Well played.

Unfortunately while they were waiting for the buzz to die down some of us were working on the problem they presented to us with the latest updates.


I Wouldn’t Have…

I wouldn’t have published the rough outline of my findings had Google not gotten what I feel is arrogant, again.

They have absolutely no need to crowd source information like this with all of the information under their control.

It’s intelligent, because it means they don’t have to comb through it all themselves to find examples they can use to refine their process – but it’s insulting at the same time, because their initial actions created the conditions under which those smaller domains don’t rank.

To Matt

We should talk.

Special Thanks

I did want to take a moment to say thanks to one highly influential blogger in particular. Matthew Woodward has really been getting me more interested in SEO/Internet Marketing blogging lately, and it’s that ‘push’ that started pulling me out of my forum shell and getting me more involved with GODOVERYOU.COM - So thanks Matthew :)

Take it easy guys,

GOY