May 14, 2013

Open Source Search Myth 5 - Total Cost of Ownership

This is part of a series addressing the misconception that open source search is too risky for companies to use. You can find the introduction to the series here; and Part 4, Features and Capabilities, here.

Part 5: Total Cost of Ownership

Total cost of ownership, TCO, is a big deal to large users of search technology. Usually, the component of TCO people focus on with respect to search is the license fee; enterprise search has historically been an expensive proposition. But there are other major components of TCO: implementation and operations, hardware, and ongoing support all come to mind.

Walter Underwood, one of the key developers at Ultraseek and later the guy who did the Netflix relevancy contest, once explained the difference between commercial and open source search. Let me paraphrase: 

"With commercial search, you spend a lot of money to license it; then you spend a lot of money to implement it.

With open source search, you download the software for free; then you spend a lot of money implementing it."

But there is another big element: how much iron do you need? A few years ago we helped a company switch search platforms. Their business was search-enabling small-town newspaper archives going back to the 1890s, via OCR'd content. They add tens of thousands of documents - historical newspaper articles - every day.

The commercial platform they replaced required major expense in new servers as their content grew. Every year.

As it turns out, the ROI case for swapping out their old search engine was easy to make: they needed less new hardware every year than with the old engine. So much less, in fact, that the payback period was under a year.
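The payback arithmetic behind a claim like that is simple enough to sketch. The function below is a hypothetical illustration, not a model we used with that customer: it assumes you know the one-time cost of switching engines and the yearly cost (new servers, energy, maintenance) of running each platform, and it reports how many months of savings it takes to recover the switch. All numbers in the usage line are made up for illustration.

```python
def payback_months(switch_cost: float, old_annual_cost: float, new_annual_cost: float) -> float:
    """Months until the one-time cost of switching search engines is recovered
    by lower recurring costs. All inputs are estimates you supply; nothing here
    is real customer data."""
    annual_savings = old_annual_cost - new_annual_cost
    if annual_savings <= 0:
        # The new platform is not cheaper to run, so the switch never pays back.
        raise ValueError("no annual savings; the switch never pays back")
    # Multiply before dividing to keep round inputs exact in floating point.
    return switch_cost * 12 / annual_savings


# Hypothetical figures: $80K to switch, $200K/year to keep growing the old
# platform versus $80K/year for the new one.
print(payback_months(80_000, 200_000, 80_000))  # 8.0 months
```

Anything under 12 months is the "ROI period of less than a year" described above.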

A different project we did when we were still doing business as New Idea Engineering involved a comparison between Microsoft SharePoint 2010 and search with Solr. Our customer wanted to know if the switch would, indeed, require fewer servers to do the job. It turns out that it was quite reasonable to replace the 12 servers Microsoft FAST required with 6 or fewer servers running Solr. Half the cost of servers; half the cost of energy; half the cost of maintenance. Like the concept?

Now, I'll acknowledge that LucidWorks - my employer - markets a proprietary search platform based on Solr. And we do not license the product for free. But compared to most commercial platforms, LucidWorks Search is pretty darned reasonable. And you still get the cost savings in energy, iron, and scalability.

Less hardware. Better search. How is the TCO of open source a liability compared to most commercial search platforms?


April 23, 2013

Open Source Search Myth 4 - Features and Capabilities Lag

This is part of a series addressing the misconception that open source search is too risky for companies to use. You can find the introduction to the series here; and Part 3, Skills Required In-House, here.

Part 4: Features and Capabilities Lag

Keeping up with the latest and greatest technology is important, especially when there is a great deal of innovation in a field. Enterprise search is one such field.

In this post I'll address the claim that "Production functionality may trail in specific features relative to commercial search firms".

First, let me remind you that many of the coolest advanced capabilities in modern search platforms are delivered using third-party products integrated into the actual search product. Examples:

Entity extraction: Cool stuff, and part of many search platforms. Often implemented using technology from companies like Basis Technology, Pingar, and others.

Non-English support: Required for any large-scale enterprise. Think Basis Technology again; or pretty darned good open source filters.

Document format support: Leaders here were smaller companies that were eventually purchased by larger search companies: KeyView (now Autonomy); Stellent (now Oracle); ISYS (now Lexmark). In open source, there's Apache Tika.

Sentiment Analysis: Identify 'positive' versus 'negative' sentiment, using products from Lexalytics, Attensity, SAS, LingPipe and others. 

My point is not that large enterprise search platform companies do not include some cool new technologies in their products: it's just that the 'cool' usually comes from a third party that can be licensed for use in any platform, not just "commercial" ones. 

And, when you use open source platforms, you always have the option of doing a feature yourself - either in-house, or using a consulting firm.

And you might not be aware of capabilities where open source Solr is ahead of many commercial vendors. For example, consider geo search, which lets you easily search for 'documents' relevant to a particular location. It can even be used to answer questions like "what managers are on duty on Saturday night at the LA store?"
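To make the geo search point concrete, here is a minimal sketch of building such a request against Solr's standard select handler. It uses Solr's spatial query parameters: the {!geofilt} filter, which reads sfield (the spatial field), pt (latitude,longitude) and d (radius in kilometers) from the request, plus sort=geodist() asc to put the nearest matches first. Everything else is hypothetical: the core name "stores", the spatial field "location" (which would need a spatial type such as LatLonType in the schema), and the role/shift fields in the example query.

```python
from urllib.parse import urlencode


def build_geo_query(base_url: str, core: str, text: str,
                    lat: float, lon: float, radius_km: float,
                    location_field: str = "location") -> str:
    """Build a Solr select URL that matches `text` only among documents
    within radius_km of (lat, lon), nearest first.
    Core and field names are hypothetical placeholders."""
    params = {
        "q": text,
        # {!geofilt} picks up sfield, pt and d from the request parameters below.
        "fq": "{!geofilt}",
        "sfield": location_field,  # hypothetical spatial field in the schema
        "pt": f"{lat},{lon}",      # center point as "lat,lon"
        "d": str(radius_km),       # radius in kilometers
        "sort": "geodist() asc",   # nearest documents first
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# "What managers are on duty Saturday night near downtown LA?"
# (role and shift are hypothetical fields in a hypothetical staffing index)
print(build_geo_query("http://localhost:8983/solr", "stores",
                      "role:manager AND shift:saturday_night",
                      34.05, -118.24, 10))
```

Issuing that URL with any HTTP client returns the matching staff documents, closest store first.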

I will say that Microsoft, in SharePoint 2013, has implemented a very nice query boosting tool that, as far as I can tell, was created in-house - I doubt it was in the FAST pipeline at the time of the acquisition.

But given that caveat, I'd ask: with all the recent acquisitions and mergers, has any 'enterprise search' company implemented a major new capability like pivot facets, entity extraction and more, without licensing the technology from an outside company?


March 20, 2013

Open Source Search Myth 3: Skills Required In-House

This is part of a series addressing the misconception that open source search is too risky for companies to use. You can find the introduction to the series here; this is Part 3 of the series; for Part 2 click Potentially Expensive Customizations.

Part 3: Skills Required In-House

One of the hallmarks of enterprise software in general is that it is complex. Yet people in large organizations who manage enterprise search are no less likely than their non-technical peers to believe that "if Google can make search so good on the internet, enterprise search must be trivial". Sadly, that is the killer myth of search.

Google on the internet - or Bing or Baidu or whichever site you use and love - is good because of the supporting technology, NOT simply because of search. I'd wager that most of what people like about Google et al has very little to do with search and a great deal to do with constant monitoring and tweaking of the platform.

Consider: at the Google 'command line' (the search box), you can type in an arithmetic expression such as "2+3" and get 5. You can enter a FedEx tracking number and get a suggestion to link to FedEx for information. It's cool that Google provides those capabilities and others; but those features are there because Google has programs looking at search behavior for all of its users every day in order to understand user intent. When something unusual comes up, humans get involved and make judgments. When it makes sense, Google implements another capability - in front of the search engine, not within it.
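Here is a toy sketch of what "in front of the search engine, not within it" means. This is not how Google does it; it is a hypothetical query router that tries a couple of special-purpose handlers first and only falls through to the search engine when none applies. The 12-digit "tracking number" pattern and the search callable are illustrative assumptions; an enterprise team would add handlers as they learn what their own users type.

```python
import ast
import operator
import re
from typing import Callable, List

# Only plain + - * / on numeric literals; anything else is not "arithmetic".
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

# Hypothetical rule: a bare 12-digit number is treated as a package tracking number.
TRACKING_RE = re.compile(r"^\d{12}$")


def _eval_arith(node: ast.AST) -> float:
    """Safely evaluate a parsed arithmetic expression (no eval())."""
    if isinstance(node, ast.Expression):
        return _eval_arith(node.body)
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_arith(node.left), _eval_arith(node.right))
    raise ValueError("not simple arithmetic")


def route_query(query: str, search: Callable[[str], List]) -> dict:
    """Answer special queries directly; send everything else to the engine."""
    q = query.strip()
    if TRACKING_RE.match(q):
        return {"type": "tracking", "suggestion": f"Track package {q} with the carrier"}
    try:
        return {"type": "answer", "value": _eval_arith(ast.parse(q, mode="eval"))}
    except (SyntaxError, ValueError, ZeroDivisionError):
        pass  # not arithmetic; fall through to the real search engine
    return {"type": "search", "results": search(q)}


fake_engine = lambda q: [f"result for {q}"]
print(route_query("2+3", fake_engine))
print(route_query("open source search", fake_engine))
```

The search engine itself never changes; the "smarts" live in the layer in front of it, which is exactly the layer that needs someone watching the query logs.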

Enterprise search is the same - except that very few companies invest money in managing and running their search; so no matter how well you tune it at the beginning, quality deteriorates over time. Enterprise search is not 'fire and forget'.

Any company that rolls out a mission-critical application and does NOT have its own skilled team in-house is going to pay a consulting firm thousands of dollars a day forever. 'Nuff said.

 

March 18, 2013

Solr 4 Training 3/27 in Northern Virginia/DC area

Interrupting my series on whether open source search is a good idea in the enterprise to tell you about an opportunity to attend LucidWorks' Solr Bootcamp in Reston, Virginia on Wednesday, March 27. Lucid staff and Lucene/Solr committers Erick Erickson and Erik Hatcher will be there, along with Solr pro Joel Bernstein. Heck, I'll even be there!

The link is here; readers of our blog can use discount code SOLR4VA-5OFF.

Course Outline:

  • What's new in Solr 4
  • Solr 4 Functional Overview
  • Solr Cloud Deep Dive
  • Solr 4 Expert Panel Case Studies
  • Workshop and Open lab

And ask the guys how you can get involved in Solr as a contributor or committer!

 

March 15, 2013

Open Source Search Myth 2: Potentially Expensive Customizations

This is part of a series addressing the misconception that open source search is too risky for companies to use. You can find the introduction to the series here; this is Part 2 of the series; for Part 3 click Skills Required In House.

Part 2: Potentially Expensive Customizations

Which is more expensive: open source or proprietary search platforms?

Commercial enterprise search vendors often quote man-years of effort to create and deploy what, in many cases, should be relatively straightforward site search. Sure, there are tough issues: unusual security; the need to mark up content as part of indexing; multi-language issues; and vaguely defined user requirements.

Not to single them out, but Autonomy implementations were legendary for taking years. Granted, these were usually eDiscovery search projects, so the sponsor - often a Chief Risk Officer - had no worries about budget. Anything that would keep the CRO and his/her fellow executives out of jail was reasonable. But even easier tasks, such as search-enabling an intranet site, took more time and effort than they needed to because no one scoped out the work. This is one reason so many IDOL projects hired large numbers of IDOL contractors for so long.

FAST was also famous for lengthy engagements. 

FAST once quoted a company we later worked with a one-year, $500K project to assist in moving from ESP Version 4.x to ESP Version 5.x. The two versions had, for all practical purposes, the same user interface, the same API, and the same command line tools. Really? One year?

True story: I joked with one of the sales guys that FAST even wanted 6 months to roll out a web search for a small intranet; I thought two weeks was more like it. He put me on the spot a year later and challenged me to help one of his customers, and sure enough, we took almost a month to bring up search! But we had a constraint: the new FAST search had to be callable from the existing custom CMS, which had hard-coded calls to Verity K2 - the customer did not have time to re-write the CMS.

Thus, part of our SOW was to write a front-end that would accept search requests made through the Verity K2 DLL, intercept each call, and perform the search in FAST ESP. Then, by intercepting the K2 results-list processing calls, it delivered the FAST results to a CMS that thought it was talking with Verity. And we did it in less than 20% of the time FAST wanted to index a generic HTML-based web site.
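The trick in that engagement is the classic adapter (or shim) pattern. The sketch below is illustrative only: it does not reproduce the real Verity K2 API, and every method and field name in it is hypothetical. The idea is that the legacy caller keeps using the same open / count / fetch / close call sequence it always did, while the shim quietly runs the query on a different engine and maps that engine's fields back to the names the caller expects.

```python
from typing import Callable, Dict, List, Optional


class LegacySearchShim:
    """Hypothetical shim: presents the call pattern an old CMS expects
    (open a search, count hits, fetch results by index, close) while the
    query actually runs on a different search engine."""

    def __init__(self, new_engine_search: Callable[[str], List[Dict]]):
        self._search = new_engine_search          # the replacement engine
        self._open: Dict[int, List[Dict]] = {}    # handle -> cached results
        self._next_handle = 1

    def open_search(self, query: str) -> int:
        """Intercept the legacy 'run a search' call; return an opaque handle."""
        handle = self._next_handle
        self._next_handle += 1
        self._open[handle] = list(self._search(query))
        return handle

    def result_count(self, handle: int) -> int:
        return len(self._open[handle])

    def fetch_result(self, handle: int, index: int) -> Optional[Dict]:
        """Return one result, renamed to the fields the legacy caller expects."""
        results = self._open[handle]
        if not 0 <= index < len(results):
            return None
        hit = results[index]
        # Field mapping is hypothetical: new engine 'id'/'title' -> legacy 'url'/'title'.
        return {"title": hit.get("title", ""), "url": hit.get("id", "")}

    def close_search(self, handle: int) -> None:
        self._open.pop(handle, None)


# The CMS code keeps calling the same four methods; only the engine changed.
shim = LegacySearchShim(lambda q: [{"id": "/docs/intro.html", "title": "Intro"}])
h = shim.open_search("benefits")
print(shim.result_count(h), shim.fetch_result(h, 0))
shim.close_search(h)
```

The design choice is what saved the schedule: nothing in the CMS had to be rewritten, so the work was scoped to the thin layer between the old API and the new engine.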

On the other hand, at LucidWorks we frequently have 5-day engagements to set up Solr and LucidWorks Search, index the user's content, and integrate results into the end-user application. I think that for most engagements, other Solr and open source implementations are comparable.

Let me ask: which was the more "expensive" implementation?