
October 19, 2007

Is Gartner missing a trend?

The new Gartner 'Magic Quadrant' report for Information Technology, released last month, shows few surprises in the actual vendor chart. But the report goes on to claim that, of the open-source search engines, "none of them are significant enough to threaten the commercial market". It singles out Lucene, saying "enterprises don't consider it a significant alternative". We beg to differ.

Gartner does talk about IBM's strong support of Lucene, and they do say that, if IBM invests substantially in the technology, Lucene may reach its potential. However, we see a number of companies already placing their bets on Lucene, although here we are treating the Apache 'Lucene-Solr-Nutch' franchise as a single, related set of tools. The list of Lucene users we know includes start-up vertical search companies that don't have much money, but we also see well-funded and growing public companies choosing to build their skills in-house for total control over their own search destiny. Netflix, Monster.Com, and Pearson Scott Foresman are just a few of the companies that use Lucene-Solr-Nutch and are incredibly happy with their choice. And more are looking every day.

The open source path may not be right for every company. Lucene is a toolkit, and we tell our customers that "some assembly is required". It is still weak on filters for document formats, offers weak stemmer support, and has no integrated support for document security. Lucene and Solr don't include a spider/crawler, although Nutch is always available for that. And while there are wrappers for other popular languages, you will probably want some developers who know Java pretty well. But once you have the right skills in-house, Lucene provides pretty good search in a lightweight, portable application.

We agree with Gartner when they say support from a major vendor like IBM would be a major benefit to the Lucene franchise, but we don't think it's necessary. Think about this: Lucene included a parametric search capability months before the Google Search Appliance did. And the Lucene franchise features search term highlighting; completely tunable relevance and a transparent relevance algorithm; and the ability to fine-tune just about everything to work exactly as you want it. It may be a toolkit, but it sure is a pretty good one for many environments.

It's not like Gartner to miss the wave completely; maybe they are just not listening to the same people we've been talking with in the corporate world.




I agree that Gartner missed this one. Lucene and Solr are great. They have limitations, as you pointed out, but the core features, scaling, and community support are as good as any product and better than many.

Lucene made commercial search libraries uneconomical a few years ago, so it is actually a lower-risk option than a commercial library. The commercial libraries can't bring in enough revenue to keep up with Lucene. Compare the latest release notes from Lucene and another library. Look at the changes. Are they only bug fixes or are there big improvements?

Solr adds XML config files and a web API with clients in Java, C#, Python, and Ruby. Or you can write your own, like I did. It's just XML over HTTP.
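To make "XML over HTTP" concrete, here is a minimal sketch of what a hand-rolled client might look like in Python. It is not the commenter's code. The base URL assumes a local Solr instance on its default port, the function names are hypothetical, and the sample response is hand-written in the general shape of a Solr XML reply rather than captured from a real server. The sketch only builds the request URL and parses a response; it does not open a network connection.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: a local Solr instance on its default port and select handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(query, rows=10, base_url=SOLR_SELECT_URL):
    """Return the GET URL for a search request (hypothetical helper)."""
    return base_url + "?" + urlencode({"q": query, "rows": rows})


def parse_response(xml_text):
    """Return (hit count, list of documents) from a Solr-style XML response.

    Only single-valued fields (<str>, <int>, ...) are handled here;
    multi-valued <arr> fields would need extra code.
    """
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = [
        {field.get("name"): field.text for field in doc}
        for doc in result.findall("doc")
    ]
    return int(result.get("numFound")), docs


# Hand-written sample for illustration, not real server output.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">doc-1</str>
      <str name="title">Is Gartner missing a trend?</str>
    </doc>
  </result>
</response>"""

if __name__ == "__main__":
    print(build_query_url("title:lucene", rows=5))
    hits, docs = parse_response(SAMPLE_RESPONSE)
    print(hits, docs)
```

A real client would fetch the URL with an HTTP library and pass the response body to `parse_response`; everything else is ordinary XML handling, which is why writing a client from scratch is practical.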

Our back end needs to handle 7-10 million queries per day. A big load, but Solr can do it and deliver quality results.
