
2 posts from February 2011

February 13, 2011

Humans versus Watson on Jeopardy Feb 14-16 2011

This week is a big one in search technology. Well, sort of - if you liked seeing IBM's 'Deep Blue' beat Garry Kasparov back in 1997.

For several years, a team at IBM has been working on a computer system - dubbed 'Watson' - that will be one of the featured players this week on the game show Jeopardy.

According to NOVA, Watson has passed the screening interview required of all players, and this week - Monday the 14th through Wednesday the 16th - Watson will take on the two best human players in Jeopardy history, Ken Jennings and Brad Rutter, in a historic match. The NOVA special, 'The Smartest Machine on Earth', tells the story in a captivating way without too much hand-waving. It takes us through the low points and the ultimate high point, when, in a test round a few months ago, Watson soundly defeated two human players.

Watson is not connected to the Internet, so it's on its own at air time. The system is not voice-driven, so for input it receives the question as a text stream when the director clicks the magic button to reveal the question. Watson can buzz in like the human players, and it speaks the 'question' in a synthesized human voice. Because it cannot listen to the other players' wrong answers, the IBM support engineers 'notify' Watson when another player has answered incorrectly, so it can use that information in its determination.

Watching the practice round linked above is interesting: they've overlaid Watson's answers even when it did not buzz in first; and it is uncanny how often Watson was right - just too late to buzz in.

This doesn't apply to search engines just yet; Watson is programmed for the nuances of the game show and isn't billed as an AI device. Still, it's interesting to see the work the IBM team put into getting Watson ready, and we'll see how it does this week.

Man versus machine: sounds like something right out of the Firesign Theatre's 'I Think We're All Bozos on This Bus', when 'Ah Clem' takes on the President and wins. Except that this time it might be a chance for revenge if Watson can pull it off: check it out this week, Monday through Wednesday!



February 02, 2011

Make your search engine seem psychic

People tell us that Google just seems to know what they want - it's almost psychic sometimes. If only every search engine could be like Google. Well, maybe it can.

Over the years, the functions performed by the actual 'search engine' have grown. At first, it was simply a search for an exact match - probably using punch card input. Then, over time, new and expanded capabilities were added, including stemming... synonyms... expanded query languages... weighting based on fields and metadata... and more. But no matter what the search technology provided, really demanding search consumers pushed it further, often by wrapping extra processing around it at both index time and query time. This let the most innovative search-driven organizations stay ahead of the competition. Two great examples today: LexisNexis and Factiva.

In fact, the magic that makes public Google search so good - and so much better than even the Google Search Appliance - comes from the armies of specialists who analyze query activity and add specialized actions 'above' the search engine.

One example of this many of us know well: enter a 12-digit number. If the format of the number matches the algorithm FedEx uses to create tracking numbers, Google will offer to let you track that package directly from FedEx. For example, search for 796579057470 and you see a delivery record; change the last digit, and you get no hits. How do they know?

The folks at Google must have noticed lots of 12-digit numbers as queries; and being smart, they realized that many were FedEx tracking numbers. I imagine that Google, working in conjunction with FedEx, implemented the algorithm - what makes a valid FedEx tracking number - and boosted the tracking link as a 'best bet'.
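Here is a minimal sketch of that kind of 'best bet' rule in Python. It is not Google's or FedEx's actual logic: the validity check only tests for exactly 12 digits, the real tracking-number algorithm is not reproduced here, and the function names and the tracking.example.com URL are made up for illustration.

```python
import re
from typing import Callable, Dict, Optional

# Format check only: exactly 12 ASCII digits.
TWELVE_DIGITS = re.compile(r"[0-9]{12}")


def looks_like_tracking_number(query: str) -> bool:
    """Stand-in validator (an assumption, not the carrier's real rule).

    A production version would plug the carrier's own validity
    algorithm in here instead of a bare format check.
    """
    return TWELVE_DIGITS.fullmatch(query.strip()) is not None


def best_bet(
    query: str,
    is_valid: Callable[[str], bool] = looks_like_tracking_number,
) -> Optional[Dict[str, str]]:
    """Return a promoted result for the query, or None if the rule does not fire."""
    if not is_valid(query):
        return None
    number = query.strip()
    return {
        "title": f"Track package {number}",
        # Placeholder URL; a real deployment would link to the carrier.
        "url": f"https://tracking.example.com/?id={number}",
    }


if __name__ == "__main__":
    print(best_bet("796579057470"))
    print(best_bet("jeopardy"))
```

The validator is passed in as a parameter so the format check can be swapped for a stricter rule without touching the code that builds the promoted result.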

Why is this important to you? Well, first it shows that Google.com is great in part because of the army of humans who review search activity, likely on a daily basis. Oh, sure, they have automated tools to help them out - with maybe 100 million queries every day, you'd need to automate too. They look for interesting trends and search behavior that let them provide better answers.

Secondly, you can do the same sort of thing at your organization. Autonomy, Exalead, Microsoft, Lucene, and even the Google Search Appliance can all be improved with some custom code that runs after the user submits a query but before the results show up. Did the user type what looks like a name? Check the employee directory and suggest a phone number or an email address. Is the query a product name? Suggest the product page. You can make your search psychic.
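A rough sketch of that pre-processing step might look like the following. The directory and catalog are hard-coded dictionaries with invented entries; in practice they would be lookups against your own employee directory and product data, and the function name is hypothetical.

```python
from typing import Dict, List

# Hypothetical sample data standing in for real directory and catalog lookups.
EMPLOYEES: Dict[str, Dict[str, str]] = {
    "jane doe": {"phone": "555-0100", "email": "jane.doe@example.com"},
}
PRODUCTS: Dict[str, str] = {
    "widget pro": "/products/widget-pro",
}


def suggestions_for(query: str) -> List[str]:
    """Return suggestions to show above the normal result list."""
    # Normalize case and whitespace so 'Jane   Doe' matches 'jane doe'.
    key = " ".join(query.lower().split())
    found: List[str] = []

    person = EMPLOYEES.get(key)
    if person:
        found.append(f"{key.title()}: {person['phone']}, {person['email']}")

    page = PRODUCTS.get(key)
    if page:
        found.append(f"Product page: {page}")

    return found


if __name__ == "__main__":
    print(suggestions_for("Jane  Doe"))
    print(suggestions_for("widget pro"))
    print(suggestions_for("quarterly report"))
```

The suggestions go alongside the engine's own results, not in place of them, so a wrong guess costs the user nothing.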

Finally, does the query return no hits? You can tell what form the user was on when the search was submitted, so use that to offer something more helpful than a generic 'No Hits' page. Was the query more than a single term? Look for any of the words, rather than all; make a guess at what the user wanted, based on the search form, previous searches, or whatever context you can find.
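The 'any of the words, rather than all' fallback can be sketched like this. The tiny in-memory toy_search function and its two sample documents are stand-ins for whatever engine you actually run; only the retry logic in search_with_fallback is the point, and all names here are made up for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sample index: document id -> text.
DOCS: Dict[str, str] = {
    "doc1": "vacation policy for employees",
    "doc2": "expense report form",
}


def toy_search(terms: List[str], require_all: bool) -> List[str]:
    """Stand-in engine: match whole words, either all terms or any term."""
    match = all if require_all else any
    wanted = [t.lower() for t in terms]
    return [
        doc_id
        for doc_id, text in DOCS.items()
        if match(t in text.split() for t in wanted)
    ]


def search_with_fallback(
    query: str,
    search: Callable[[List[str], bool], List[str]],
) -> Tuple[List[str], str]:
    """Try an all-terms search first; on no hits, retry with any term.

    Returns the hits plus the mode that produced them ('all' or 'any'),
    so the results page can tell the user the query was relaxed.
    """
    terms = query.split()
    hits = search(terms, True)
    if hits or len(terms) < 2:
        return hits, "all"
    return search(terms, False), "any"


if __name__ == "__main__":
    print(search_with_fallback("expense report", toy_search))
    print(search_with_fallback("vacation form", toy_search))
```

Returning the mode along with the hits lets the page say something like 'no documents matched all of your words, so here are documents matching any of them' instead of silently changing the query.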

So how do you make your search engine seem psychic? Learn about query tuning and result list pre-processing; we've written a number of articles about query tuning in our newsletter alone.

But most importantly, mimic Google: work hard at it every day.