
September 11, 2008

Google, dates, and UAL

In case you didn't hear, UAL stock tanked over the last two days. The Wall Street Journal reports that the parent company for United Airlines saw its share price fall from $12 a share at Monday's opening to a low of $3 a share by Tuesday afternoon.

The cause? A Google 'news' story reporting an impending bankruptcy filing for UAL.

Only there was no impending bankruptcy: Google picked up a story in the South Florida Sun-Sentinel, reprinted from the Chicago Tribune and dated December 2002. Today they have an interesting description of the sequence of events that led up to the fire sale on UAL shares.

We've written before about the problem with information systems like Google Alerts. If you only look for a story new to your crawler, you risk believing you've discovered something new when in fact it's ancient. Verity, in its Topic Real Time, addressed these problems in the late 80s, but Google's philosophy that dates are not really important leaves us open to more such stories, more such panics, and more such headlines.

When you're dealing with spiders and freshness, it's never been sufficient to trust the web server date. But spidering technology can parse the article, looking for datelines and the like. And, if you've indexed the world's content, you can certainly look in your archives to confirm that the story isn't virtually identical to a story you found years before.
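As a rough illustration of the dateline idea, here is a minimal Python sketch. It assumes datelines written like "December 10, 2002" or "Dec. 10, 2002"; the function names, the sample text, and the seven-day threshold are all made up for this example, and a real spider would need to handle many more date formats.

```python
import re
from datetime import date

# Month abbreviations, in calendar order, used both for matching and lookup.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Assumed dateline shape: "<Month> <day>, <year>", month full or abbreviated.
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")[a-z]*\.?\s+(\d{1,2}),\s+(\d{4})\b"
)

def extract_dateline(text):
    """Return the first date found in the article text, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    month = MONTHS.index(match.group(1)) + 1
    try:
        return date(int(match.group(3)), month, int(match.group(2)))
    except ValueError:  # e.g. "Feb 31, 2002"
        return None

def looks_stale(text, crawl_date, max_age_days=7):
    """True when the article's own dateline is much older than the crawl."""
    dateline = extract_dateline(text)
    if dateline is None:
        return False  # no evidence either way
    return (crawl_date - dateline).days > max_age_days

# Made-up sample text, crawled years after its dateline.
sample = "SPRINGFIELD, Dec. 10, 2002 - Sample story text."
print(looks_stale(sample, date(2008, 9, 8)))
```

The point is only that the article body often carries better evidence of its age than the web server does.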

In your organization, you can have your spider generate a checksum 'fingerprint' to tell genuinely new content apart from new postings of old content. You can also store a 'first seen on' date so you can identify new content, even on those systems where the web server lies about the freshness.
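A minimal sketch of the fingerprint and 'first seen on' idea might look like the following. The names (fingerprint, FirstSeenIndex) are hypothetical, the store is just an in-memory dictionary, and SHA-256 over whitespace-normalized, lowercased text is one assumed choice of checksum among many.

```python
import hashlib
import re
from datetime import date

def fingerprint(text):
    """Checksum of the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class FirstSeenIndex:
    """Remembers the first crawl date for each content fingerprint."""

    def __init__(self):
        self._first_seen = {}  # fingerprint -> date first crawled

    def record(self, text, crawl_date):
        """Return (first_seen_date, is_new) for this piece of content."""
        key = fingerprint(text)
        if key in self._first_seen:
            return self._first_seen[key], False
        self._first_seen[key] = crawl_date
        return crawl_date, True

index = FirstSeenIndex()
story = "Sample airline story text."  # made-up content
print(index.record(story, date(2002, 12, 10)))
# Same story reposted years later, with different spacing and case.
print(index.record("sample   airline story TEXT.", date(2008, 9, 8)))
```

An exact checksum like this only catches copies that are identical after normalization. Catching a story that is merely 'virtually identical' would need a fuzzier comparison, which this sketch does not attempt.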

The end of the WSJ article raises a frightening possibility. To quote:

"Amid serious storms in Florida and on the East Coast, Web surfers checking for news about travel delays may have stumbled onto the old UAL story by mistake, and a small number of fresh hits may have been enough to drive it onto the list. A Tribune spokesman declined to say how many hits the article received but said there was no indication of fraud."

Well, good. NO fraud, just one of those things. Still, if this is all it takes, a small group of hackers around the world can decide to make an old story popular by viewing it enough, personally or even programmatically. Google thinks the story is hot and fresh and publishes it in an alert. Investment bankers pick it up and sell-sell-sell. If the hackers get greedy, they'll likely be found out in the subsequent investigation. Or not. Mad money, anyone?
