
8 posts from March 2010

March 31, 2010

Google Exploring how to Provide "Education Search"

Google is exploring "how can we support people who are looking for not just an answer in five minutes," for example by organizing search results into concepts and ideas. Google mulls blend of education, search

March 30, 2010

Forecasting when Enterprise Search Products might become Abandoned

Microsoft recently announced that it will stop offering new versions of FAST for Linux/UNIX after one final release later this year. This wasn't a surprise, given Microsoft's focus on Windows Server. The following article uses that example as one illustration in explaining why vendors sometimes partially abandon their acquisitions. Forecasting Software Product Abandonment

March 29, 2010

Using Solr Search with RDBMS

The article discusses how Solr can offer a quicker and easier way to run targeted searches over data held in a large-scale RDBMS. Using Solr Search with RDBMS

March 25, 2010

The "Sliced Raw Fish Shoes it Wishes" - the Google Green Onion thing!

Google Translate is one example of how Google is successfully using brute-force computing power on complex problems. Google's Computer Might Betters Translation Tool

March 24, 2010

Indexing Text and HTML with Solr

Lucid Imagination has a tutorial on how to use Tika, Solr Cell, and curl to index HTML and plain text files. Indexing web pages with Solr, like magic

March 23, 2010

Enterprise Search Packaging Trends

Lynda Moulton discusses whether the search industry is moving towards search engines being embedded in suites, and no longer available as separate products. Search Industry in 2010

Note: Miles Kehoe described the importance of packaging in 2009 Overview of the Enterprise Search Market

March 22, 2010

An Exploratory Search Demo Built on Top of a General Web Search Engine

Daniel Tunkelang discusses the benefits of exploratory search (combining querying and browsing strategies) as he describes Eric Iverson's Itty Bitty Search demo built upon Yahoo BOSS. Guest Demo: Eric Iverson’s Itty Bitty Search

March 09, 2010

Enterprise search engines: They're *not* all the same

We're in the process of doing a search engine evaluation for a large customer. That, by itself, isn't news: we do those quite a bit for companies large and small. No, what makes this project most interesting is that we are doing side-by-side comparisons of three leading search technologies using industry-standard data sets.

Our assumption going in was that, for simple out-of-the-box searches, all three engines would return pretty much the same set of results: after all, if TF/IDF (term frequency/inverse document frequency) was at the core of these technologies, they should be getting roughly the same result sets. Much to our surprise, if we look at the top 10 search results from each engine for a simple search, we get only about 15% overlap.
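As a side note on that assumption, "TF/IDF at the core" does not by itself pin down a single ranking. The sketch below is only an illustration, not a description of what any of the engines in our evaluation actually do: it scores a tiny made-up corpus with two common textbook TF/IDF variants (raw term frequency versus log-scaled term frequency) and shows that the order of the top results can change. The documents, the query, and the function name `rank` are all hypothetical.

```python
import math
from collections import Counter

def rank(docs, query, tf_weight):
    """Rank document names by a simple sum of tf_weight(tf) * idf over query terms."""
    n = len(docs)
    counts = {name: Counter(tokens) for name, tokens in docs.items()}
    scores = {}
    for name, tf in counts.items():
        score = 0.0
        for term in query:
            # Document frequency: how many documents contain the term at all.
            df = sum(1 for c in counts.values() if c[term] > 0)
            if df == 0 or tf[term] == 0:
                continue
            score += tf_weight(tf[term]) * math.log(n / df)
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical four-document corpus (already tokenized).
docs = {
    "d1": ["search"] * 8,          # repeats a common term many times
    "d2": ["search", "engine"],    # matches both query terms once
    "d3": ["search", "index"],
    "d4": ["index"],
}
query = ["search", "engine"]

raw_order = rank(docs, query, lambda tf: tf)                # raw term frequency
log_order = rank(docs, query, lambda tf: 1 + math.log(tf))  # log-scaled term frequency

print("raw tf:", raw_order)
print("log tf:", log_order)
```

With raw counts, the document that repeats "search" eight times comes first; with log scaling, the document matching both query terms comes first. Real engines layer further choices on top (length normalization, field weights, tokenization), so this is at most one candidate factor among many.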

Let me explain it this way: take the ten results one search engine returns for a specific query and compare them with the twenty results the other two engines return for the same query. Only 3 of those twenty (15%) match. Put another way, in a typical list of 10 results, only 3 show up in more than one engine. We were especially amazed because we are going out of our way to use default parameters as much as possible: no entity extraction, no search tuning, no special synonyms or thesaurus terms.
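To make the arithmetic concrete, here is a minimal sketch of one way to compute that figure. It assumes the "twenty" refers to the pooled top-10 lists of the other two engines, and it uses made-up document IDs rather than our actual test data; the function name `overlap_with_others` is hypothetical.

```python
def overlap_with_others(reference, others):
    """Count how many results from the other engines also appear in the reference list.

    Returns (shared, total), where total is the pooled size of the other lists.
    """
    ref = set(reference)
    pooled = [doc for results in others for doc in results]
    shared = sum(1 for doc in pooled if doc in ref)
    return shared, len(pooled)

# Hypothetical top-10 lists for one query (IDs are invented for illustration).
engine_a = [f"d{i:02d}" for i in range(1, 11)]                    # d01 .. d10
engine_b = ["d01", "d02"] + [f"b{i:02d}" for i in range(3, 11)]   # shares 2 with A
engine_c = ["d03"] + [f"c{i:02d}" for i in range(2, 11)]          # shares 1 with A

shared, total = overlap_with_others(engine_a, [engine_b, engine_c])
print(f"{shared} of {total} = {shared / total:.0%}")
```

Other overlap measures (for example, Jaccard similarity between pairs of lists) would give different numbers for the same data, so the metric should be stated alongside any percentage.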

We're still too early in the process to understand what's behind this surprising situation: it's always possible the results are too tentative to make any judgments, or we could find an error in our methodology.  We're working on it, and we'll get back with any findings that we can share. If you have any explanations, leave a comment - we'd love to hear what you think.