Google Public Search: still not the freshest...
You do it, you know you do! Moving aside the gallons of milk at the front to find one at the back of the shelf with a longer expiration date.
I find myself doing that with Google's public search quite a bit. Sure, we all use Google's public search... but they STILL haven't sorted out the "dates thing". We first debated this with one of the Google founders back in 2000. Yes, it's hard to do dates "perfectly", I get that, but there's certainly room for improvement, at least tracking when a page was *first* *seen*, so you can tell it's at least N years old.
Context: I did these searches on Tuesday, Jan 15, 2008, after the Iowa caucuses and NH primaries, with California to vote in a few weeks. Macworld is today, and it's the start of awards season in Hollywood.
Check out this lame-ness: (one example courtesy of Miles)
Look for: steve jobs keynote time
Top result: Live from WWDC 2006: Steve Jobs keynote - Engadget
He'll be speaking later today, 1/15/08, but this is from 2 years ago.
Look for: california propositions
Top result: decent, 2007, 2008
Second result is from 2005:
http://www.smartvoter.org/2005/11/08/ca/state/prop/
Look for: election results
First result: Virginia State Board of Elections : View Election Results
But I'm in California... no disrespect to the fine folks in Va, but their local elections are probably not what the average surfer is looking for.
Second result: CNN.com Election 2004
3 to 4 years old, amazing.
Look for: java for palm os
First site is pretty good, pointing to the IBM WebSphere site.
But the second result is from 2002! (http://www.javaworld.com/javaworld/jw-05-2002/jw-0531-palm.html)
Search for: new england patriots score
First result OK, but the second is from December 2007, and of course they have been playing in the postseason since then.
To be fair, Google does get some other items spot on:
Look for: CES
Good, all from 2008
Look for: new season of lost
Good, most are recent
Look for: Iraq
Good, Wikipedia, CIA World Factbook, etc.
Look for: golden globe
Good, mostly points to main web sites.
I will say it again:
Yes, it is difficult when parsing random web text and HTTP headers to know with 100% certainty when the content was authored, for various technical reasons. Dates mentioned in the text might refer to past events rather than to when the page was written, etc.
BUT you can certainly figure out the first time your spider saw that content. You might not know whether it was authored in February or June of 2007, but when it's 2009 you'll know it's at least 2 years out of date. This isn't quite as easy as it sounds, as the text on web pages changes slightly over time, so raw "checksums" won't cut it. But I'm sure some smart guys from Stanford could figure *something* out.
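To make the "first seen" idea concrete, here is a rough sketch of one way it might work. This is my own illustration, not anything Google is known to do: instead of a raw checksum, it compares overlapping word windows ("shingles") between crawls and treats a page as the same content when the overlap is high. The names FirstSeenIndex, observe, and min_age_years, and the 0.8 threshold, are all made up for the example.

```python
from datetime import date


def shingles(text, k=4):
    """Return the set of overlapping k-word windows in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Set overlap: 1.0 means identical, 0.0 means nothing in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class FirstSeenIndex:
    """Hypothetical store of when each URL's content was first crawled."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cutoff for "same content"
        self._pages = {}            # url -> (first_seen, shingle set)

    def observe(self, url, text, crawl_date):
        """Record a crawl and return the first-seen date for this content."""
        current = shingles(text)
        known = self._pages.get(url)
        if known is not None and jaccard(known[1], current) >= self.threshold:
            first_seen = known[0]    # small edits: keep the original date
        else:
            first_seen = crawl_date  # new URL or substantially new content
        self._pages[url] = (first_seen, current)
        return first_seen

    def min_age_years(self, url, today):
        """Lower bound on content age in whole years (365-day years)."""
        return (today - self._pages[url][0]).days // 365
```

A spider would call observe on every crawl, and the ranker could then ask min_age_years to learn that a page is at least N years old, even without knowing the true authoring date. One choice to note: the stored shingles are refreshed on each crawl, so a page that drifts a little at a time keeps its original date. Whether that is the right call is debatable.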
And when four-digit years are part of the URL, along with other numbers that look like dates, that would often be another good hint.
Web content now goes back more than ten years. All engines need to keep this in mind.
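As a quick illustration of the URL hint, here is a small sketch that pulls a plausible date out of a URL. The function name url_date_hint and the specific patterns are my own assumptions. It only looks for years starting with 19 or 20, and it ignores anything dated in the future. It is a heuristic and will certainly misfire on some URLs (product numbers, port numbers, and so on).

```python
import re
from datetime import date

# Year/month/day, e.g. ".../2005/11/08/..."
_YMD = re.compile(
    r"(?<!\d)((?:19|20)\d{2})[/\-_.](0[1-9]|1[0-2])[/\-_.](0[1-9]|[12]\d|3[01])(?!\d)"
)
# Month then year, e.g. "jw-05-2002"
_MY = re.compile(r"(?<!\d)(0[1-9]|1[0-2])[/\-_.]((?:19|20)\d{2})(?!\d)")
# A bare four-digit year
_Y = re.compile(r"(?<!\d)((?:19|20)\d{2})(?!\d)")


def url_date_hint(url, today=None):
    """Return (year, month, day) guessed from the URL; unknown parts are None.

    Returns None when the URL contains nothing that looks like a past date.
    """
    today = today or date.today()

    m = _YMD.search(url)
    if m:
        y, mo, d = (int(g) for g in m.groups())
        try:
            if date(y, mo, d) <= today:
                return (y, mo, d)
        except ValueError:
            pass  # e.g. February 30: fall through to the coarser patterns

    m = _MY.search(url)
    if m:
        mo, y = int(m.group(1)), int(m.group(2))
        if date(y, mo, 1) <= today:
            return (y, mo, None)

    for m in _Y.finditer(url):
        y = int(m.group(1))
        if y <= today.year:
            return (y, None, None)
    return None
```

Run against two of the stale results above, this finds a full date in the smartvoter.org URL and a month and year in the javaworld.com one. A ranker could treat the result as one weak signal among several, not as proof of when the page was written.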
Hey there... check out the relatively new date freshness parameters on Google's Advanced Search. Here are two blog posts that cover some of the details:
Useful Google feature: better date search
http://www.mattcutts.com/blog/useful-google-feature-better-date-search/
Google Updates Date Search on Advanced Search Page
http://www.researchbuzz.org/wp/2007/09/05/google-updates-date-search-on-advanced-search-page/
Stay in touch,
John
Posted by: John Kane | January 15, 2008 at 08:47 PM