
April 21, 2009

Web Search and Dates, again....

Miles likes to brag that our first argument with the Google founders goes back to 2000/2001, about dates, and he's blogged about date issues before.

I was reminded of this old argument today.  We had a power glitch in Cupertino a little while ago, so I went online to see if there was any news - after Twittering about it of course!

Google brought back a top result with a promising title... from 2004:
http://www.sfgate.com/cgi-bin/article.cgi?f=/g/archive/2004/10/01/power01.DTL

Whatever... the other search portals had the same or similarly old results, so I'm not gonna sit and bash Google.  In all fairness this is a tough problem to fix 100%, and all search engines have issues with bad dates.

AND, if no paper or blog talked about it, then there's nothing to "find" anyway.  Google can't find what isn't there.

Going to Twitter search, I found my own posting, and then some guy asking about driving times to Cupertino.

But freshness of content remains a problem.

I have a compromise suggestion for "the powers that be".  How about, when a spider makes a guess as to the proper date of a story, it also attaches a "confidence" to that guess?  For example, if the URL encodes a date, then I'd call that high confidence.  The same goes for a newspaper byline.  At the other end is the case where a web server gives the current date and time every time a page is fetched: that date is clearly not connected to the content, so it gets a lower confidence.  And then there's some default weight for the date a spider first encounters a piece of content.
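To make that a bit more concrete, here's a rough sketch in Python of what "a date plus a confidence" might look like.  Everything in it is an assumption for illustration: the names (guess_date, DateGuess), the numeric weights, and the simple /YYYY/MM/DD/ URL pattern are all made up, and a real spider would need far more careful parsing.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative weights only; these numbers are assumptions, not tuned values.
CONF_URL = 0.9         # date encoded in the URL path
CONF_BYLINE = 0.9      # date taken from a newspaper byline
CONF_FIRST_SEEN = 0.5  # default: the day the spider first saw the content
CONF_HEADER = 0.2      # server-reported date, often just "now"

# Matches path segments such as /2004/10/01/
URL_DATE = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})(?:/|$)")


@dataclass
class DateGuess:
    value: date
    confidence: float
    source: str


def guess_date(url: str,
               byline: Optional[date] = None,
               first_seen: Optional[date] = None,
               header: Optional[date] = None) -> Optional[DateGuess]:
    """Return the most confident date guess, or None if there is no evidence."""
    guesses = []
    match = URL_DATE.search(url)
    if match:
        year, month, day = (int(part) for part in match.groups())
        try:
            guesses.append(DateGuess(date(year, month, day), CONF_URL, "url"))
        except ValueError:
            pass  # digits that look like a date but are not a valid one
    if byline is not None:
        guesses.append(DateGuess(byline, CONF_BYLINE, "byline"))
    if first_seen is not None:
        guesses.append(DateGuess(first_seen, CONF_FIRST_SEEN, "first-seen"))
    if header is not None:
        guesses.append(DateGuess(header, CONF_HEADER, "http-header"))
    if not guesses:
        return None
    return max(guesses, key=lambda g: g.confidence)


best = guess_date(
    "http://www.sfgate.com/cgi-bin/article.cgi?f=/g/archive/2004/10/01/power01.DTL",
    header=date(2009, 4, 21),
)
print(best.source, best.value)  # url 2004-10-01
```

For the SFGate link above, the date in the URL wins over whatever the server reports at fetch time, so a ranking step could treat the story as a 2004 item with high confidence rather than as fresh news.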

Comments

Even if a page doesn't have a date a spider can derive a "bare minimum" date based on the dates of other pages that link to it.
===
Great point Mark! We often point out that the 'first seen on' date is good, but the 'date of first link into' is a good hint as well. Often spiders do not maintain much in the way of page-to-page state, but those that do can certainly add this as a validation!
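A minimal sketch of that idea, in Python and under stated assumptions: the function name latest_possible_date is hypothetical, and it assumes the spider already has trustworthy dates for the linking pages and that each link was present when the linking page was published.  Under those assumptions, a page can be no newer than the earliest dated page that links to it, or the day the spider first saw it, whichever comes first.

```python
from datetime import date
from typing import Iterable, Optional


def latest_possible_date(first_seen: Optional[date],
                         inlink_dates: Iterable[date]) -> Optional[date]:
    """Upper bound on when a page appeared: the earliest evidence that it existed."""
    evidence = list(inlink_dates)
    if first_seen is not None:
        evidence.append(first_seen)
    # With no evidence at all, there is nothing to bound the date with.
    return min(evidence) if evidence else None


bound = latest_possible_date(
    first_seen=date(2009, 4, 21),
    inlink_dates=[date(2006, 3, 2), date(2004, 10, 5)],
)
print(bound)  # 2004-10-05
```

This only gives a bound, not the real publication date, so it works best as a sanity check on other guesses: a page that claims to be from today but has inbound links from 2004 is probably not fresh.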
