January 15, 2008

Google Public Search: still not the freshest...

You do it, you know you do!  Moving aside the gallons of milk at the front to find one at the back of the shelf with a longer expiration date.

I find myself doing that with Google's public search quite a bit.  Sure we all use Google's public search... but they STILL haven't sorted out the "dates thing".  We first debated this with one of the Google founders back in 2000.  Yes, it's hard to do dates "perfectly", I get that, but there's certainly room for improvement, at least tracking when a page was *first* *seen*, so you can tell it's at least N years old.

Context: I did these searches on Tues Jan 15, 2008, after the Iowa caucuses and NH primaries, with California to vote in a few weeks.  Macworld is today, and it's the start of awards season in Hollywood.

Check out this lame-ness: (one example courtesy of Miles)

Look for: steve jobs keynote time
Top result: Live from WWDC 2006: Steve Jobs keynote - Engadget
He'll be speaking later today, 1/15/08, but this is from 2 years ago.

Look for: california propositions
Top result: decent, 2007, 2008
Second result is from 2005:
http://www.smartvoter.org/2005/11/08/ca/state/prop/

Look for: election results
First result: Virginia State Board of Elections : View Election Results
But I'm in California... no disrespect to the fine folks in Va, but their local elections are probably not what the average surfer is looking for.
Second result: CNN.com Election 2004
3 to 4 years old, amazing.

Look for: java for palm os
First site is pretty good, pointing to the IBM WebSphere site.
But the second result is from 2002!  (http://www.javaworld.com/javaworld/jw-05-2002/jw-0531-palm.html)

Search for: new england patriots score
First result OK, but second is from December 2007, and of course they have been playing in the post season since then.

To be fair, Google does get some other items spot on:

Look for: CES
Good, all from 2008

Look for: new season of lost
Good, most are recent

Look for: Iraq
Good, Wikipedia, CIA World Factbook, etc.

Look for: golden globe
Good, mostly points to main web sites.

I will say it again:

Yes, it is difficult when parsing random web text and HTTP headers to know with 100% certainty when the content was authored, for various technical reasons.  For example, a date that appears in the text might belong to a discussion of past events rather than tell you when the page was written.

BUT you can certainly figure out when your spider first saw that content.  You might not know whether it was authored in February or June of 2007, but when it's 2009 you'll know it's at least 2 years out of date.  This isn't quite as easy as it sounds, as the text on web pages changes slightly from crawl to crawl, so raw "checksums" won't cut it.  But I'm sure some smart guys from Stanford could figure *something* out.
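
To make the "first seen" idea concrete, here is a minimal sketch, not a claim about how any real engine does it.  It swaps the raw checksum for word shingles compared by Jaccard similarity, so a page that only changed a little keeps its original first-seen date.  The class name FirstSeenIndex, the shingle size of 4 and the 0.5 threshold are all assumptions I made up for illustration.

```python
from datetime import date


def shingles(text, k=4):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class FirstSeenIndex:
    """Hypothetical crawler-side store of when each URL's content was first seen."""

    def __init__(self, threshold=0.5):
        # threshold is an illustrative guess, not a tuned value
        self.threshold = threshold
        self._pages = {}  # url -> (first_seen date, shingle set of latest crawl)

    def record_crawl(self, url, text, crawl_date):
        """Record a crawl and return the first-seen date for this content."""
        new = shingles(text)
        old = self._pages.get(url)
        if old is not None and jaccard(old[1], new) >= self.threshold:
            first_seen = old[0]      # roughly the same page: keep the old date
        else:
            first_seen = crawl_date  # new URL or substantially new content
        self._pages[url] = (first_seen, new)
        return first_seen

    def min_age_days(self, url, today):
        """Lower bound on the content's age, in days."""
        return (today - self._pages[url][0]).days
```

With something like this, a page first crawled in 2006 that merely picked up a few extra words by 2008 still reports its 2006 first-seen date, and min_age_days gives the "at least N years old" lower bound a ranker could use.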

And when 4 digit years are part of the URL, along with other numbers that look like dates, that would often be another good hint.
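
The URL hint is easy to sketch too.  The function below, date_hint_from_url, is a hypothetical helper of my own, and the three patterns it tries (year/month/day, month-year, bare year) are assumptions about what "looks like a date"; a real crawler would need more care, since a 4 digit number in a path is not always a year.

```python
import re
from datetime import date
from typing import Optional
from urllib.parse import urlsplit

_SEP = r"[/\-_.]"
_YEAR = r"((?:19|20)\d{2})"
_MONTH = r"(0[1-9]|1[0-2])"
_DAY = r"(0[1-9]|[12]\d|3[01])"

# e.g. /2005/11/08/
_YMD = re.compile(rf"(?<!\d){_YEAR}{_SEP}{_MONTH}{_SEP}{_DAY}(?!\d)")
# e.g. jw-05-2002
_MY = re.compile(rf"(?<!\d){_MONTH}{_SEP}{_YEAR}(?!\d)")
# e.g. /2004/
_Y = re.compile(rf"(?<!\d){_YEAR}(?!\d)")


def date_hint_from_url(url: str) -> Optional[date]:
    """Guess a date from the URL path; a hint only, never proof of authorship date."""
    path = urlsplit(url).path
    m = _YMD.search(path)
    if m:
        y, mo, d = (int(g) for g in m.groups())
        try:
            return date(y, mo, d)
        except ValueError:
            pass  # e.g. 02/31 is not a real date; fall through to coarser hints
    m = _MY.search(path)
    if m:
        return date(int(m.group(2)), int(m.group(1)), 1)
    m = _Y.search(path)
    if m:
        return date(int(m.group(1)), 1, 1)
    return None
```

On the two stale results quoted above, the smartvoter.org URL yields 2005-11-08 and the javaworld.com URL yields May 2002 (reported as the first of the month), which is exactly the sort of signal that could have pushed them down.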

Web content now goes back more than ten years.  All engines need to keep this in mind.

Comments

hey there.. check out the relatively new date freshness parameters on GOOG Advanced Search... here are two blogs that reference some of the details:

Useful Google feature: better date search
http://www.mattcutts.com/blog/useful-google-feature-better-date-search/

Google Updates Date Search on Advanced Search Page
http://www.researchbuzz.org/wp/2007/09/05/google-updates-date-search-on-advanced-search-page/

Stay in touch,
John
