33 posts categorized "Open Source"

March 15, 2013

Open Source Search Myth 2: Potentially Expensive Customizations

This is part of a series addressing the misconception that open source search is too risky for companies to use. You can find the introduction to the series here; this is Part 2 of the series; for Part 3 click Skills Required In House.

Part 2: Potentially Expensive Customization

Which is more expensive: open source or proprietary search platforms?

Commercial enterprise search vendors often quote man-years of effort to create and deploy what, in many cases, should be relatively straightforward site search.  Sure, there are tough issues: unusual security; the need to mark-up content as part of indexing; multi-language issues; and vaguely defined user requirements.

Not to single them out, but Autonomy implementations were legendary for taking years. Granted, this was usually eDiscovery search, so the sponsor - often a Chief Risk Officer - had no worries about budget. Anything that would keep the CRO and his/her fellow executives out of jail was reasonable. But even easier tasks, such as search-enabling an intranet site, took more time and effort than they needed because no one scoped out the work. This is one reason so many IDOL projects hire large numbers of IDOL contractors for such long projects.

FAST was also famous for lengthy engagements. 

FAST once quoted a company we later worked with a one-year, $500K project to assist in moving from ESP Version 4.x to ESP Version 5.x. These two versions had, for all practical purposes, the same user interface, the same API, and the same command line tools. Really? One year?

True story: I joked with one of the sales guys that FAST even wanted 6 months to roll out a web search for a small intranet; I thought two weeks was more like it. He put me on the spot a year later and challenged me to help one of his customers, and sure enough, we took almost a month to bring up search! But we had a constraint: the new FAST search had to be callable from the existing custom CMS, which had hard-coded calls to Verity K2 - the customer did not have time to re-write the CMS.

Thus, part of our SOW was to write a front-end that would accept search requests using the Verity K2 DLL; intercept the call; and perform the search in FAST ESP. Then, intercepting the K2 results list processing calls, it would deliver the FAST results to the CMS, which thought it was talking with Verity. And we did it in less than 20% of the time FAST wanted to index a generic HTML-based web site.
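For readers who have not built this kind of shim, the general shape is the classic adapter pattern. What follows is only a minimal sketch in Python, not the code we wrote for that project: every name in it (`LegacySearchAdapter`, `LegacyHit`, `fake_new_engine`, the 0-100 score scale) is hypothetical, and it does not reflect the real Verity K2 or FAST ESP APIs. The idea is simply that the caller keeps using the old call signature and result shape while a different engine does the work.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LegacyHit:
    """Result shape the legacy caller expects (hypothetical)."""
    doc_key: str
    score: int  # assumption: the legacy caller expects integer scores 0-100


class LegacySearchAdapter:
    """Looks like the old engine to the caller, but queries the new one."""

    def __init__(self, new_search: Callable[[str, int], List[dict]]):
        # new_search(query, limit) returns dicts with "id" and a
        # "relevance" float in [0, 1]; this contract is an assumption.
        self._new_search = new_search

    def search(self, query_text: str, max_docs: int = 10) -> List[LegacyHit]:
        # Forward the legacy-style request to the new engine...
        raw = self._new_search(query_text, max_docs)
        # ...then translate each result into the legacy result shape.
        return [
            LegacyHit(doc_key=str(r["id"]), score=round(r["relevance"] * 100))
            for r in raw
        ]


def fake_new_engine(query: str, limit: int) -> List[dict]:
    """Stand-in for the new engine's client; returns canned results."""
    corpus = [
        {"id": 1, "relevance": 0.92, "text": "intranet search setup"},
        {"id": 2, "relevance": 0.40, "text": "cafeteria menu"},
    ]
    return [d for d in corpus if query.lower() in d["text"]][:limit]


adapter = LegacySearchAdapter(fake_new_engine)
hits = adapter.search("search")
```

The calling application never learns that the engine behind `search` changed, which is the whole point when the caller cannot be rewritten.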

On the other hand, at LucidWorks we frequently have 5-day engagements to set up Solr and LucidWorks Search; index the user's content; and integrate results in the end user application. I think for most engagements, other Solr and open source implementations are comparable. 

Let me ask: which was the more "expensive" implementation?

March 13, 2013

Open Source Search Myth 1 - Enhancements by Committee

This is part of a series addressing the misconception that open source search is too risky for companies to use. You can find the introduction to the series here; this is Part 1 of the series; for Part 2, click Potentially Expensive Customizations.

Part 1: Enhancements Subject to Committee

The original article back on LinkedIn called out one 'flaw' of open source search: the belief that updates and improvements are made only on a timetable selected by the community - presumably the committers.  

One of the hallmarks of Apache open code projects is that, when you make a change or make an enhancement to the code, you submit the changes back to the Apache project. 

My employer, LucidWorks, enhances Solr, and we push back changes we make for consideration of the entire Solr community. Many of these changes are accepted and become part of Solr - almost all because of demand/need we see in our commercial customers, which helps everyone. 

Occasionally, a customer has a specific need and asks us to develop capabilities that are not part of the standard release. Sometimes the enhancements are on the Apache project plan; and sometimes they are unique. In any case, we create the enhancements and submit them for consideration in the standard Solr trunk. Once we’ve done so, anyone can download our enhancements and use them. And, as we do at LucidWorks, anyone can write enhancements for themselves and make them available.

Compare this to commercial search vendors who update on their own (typically unpublished) schedule; and no one can add a feature on their own. The vendor decides, and the consumer can only hope. And you pay upward of 20% of the list price every year in anticipation of the change you hope for but cannot add on your own.

And no matter what happens to Solr, our customers have the source code to self-support forever - no involuntary forced conversion. 

March 11, 2013

Open source search engine - a good idea?

My transition to LucidWorks has been a busy one with little time for other interests and hobbies (like flying!). But a week or so back I spotted a November 2012 post on the LinkedIn Enterprise Search Engine Professionals group asking the question in this post's title. 

Of course, LucidWorks provides deep support for Solr, and markets a Solr-based enterprise search product; but it's my work with enterprise search technology for the last 20+ years that really drives my response. Sadly, my reply was longer than LinkedIn allows, so I posted a shorter link there and have come back here to reply in full. It's going to take a few posts though, so bear with me if you will. 

First, my response to the poster: Eight years ago open source was cool, but was probably not 'enterprise ready'.  Enterprise search is hard, but years ago the Apache projects (Lucene and Solr) began working to solve the tough issues - ones that were not commercially worth it for the 8 to 10 major commercial enterprise search companies.

Then a funny thing happened: Solr got better and better; and the commercial vendors started merging. Verity got sucked into Autonomy, which got sucked into HP. FAST got sucked into Microsoft. Vivisimo got sucked into IBM. And with every acquisition, the time and money that enterprises had invested in commercial search became totally wasted - when the platform you based your search on got acquired, you had to move to the new engine. A painful, expensive and long process.

As I blogged just a few weeks ago, open source search is now the default SAFE choice for enterprises that need search. You may have to do some coding, or find a skilled expert/team to help; but you own your destiny. Lucid (my company) does sell support for Solr; there are other fine companies, large and small, that do so. We're fortunate enough to employ a good number of the committers - no majority, which is probably best for the community. 

The original poster, an employee of a proprietary search vendor, may have had his reasons. Nonetheless, he listed five reasons he felt that open source search for enterprises was a bad idea - based on a three-year-old report by my friend Hadley Reynolds - taken a bit out of context. These 'disadvantages' are listed and linked below.  

* Enhancements on community timetable only

* Potentially expensive customization

* Requirement for search development skills in-house or ready-to-hand

* Production functionality may trail in specific features relative to commercial search firms

* Maintenance/system life costs can become significant

In the next several posts, I'm going to address and refute these one at a time. Stay tuned.


February 14, 2013

A paradigm shift in enterprise search

I've been involved in enterprise search since before the 'earthquake World Series' between the Giants and the A's in 1989. While our former company became part of LucidWorks last December, we still keep abreast of the market. But being a LucidWorks employee has brought me to a new realization: commercial enterprise search is pretty much dead.

Think back a few years: FAST ESP, Autonomy IDOL (including the then-recently acquired Verity), Exalead, and Endeca were the market. Now, every one of those companies has become part of a larger business. Some of the FAST technology lives on, buried in SharePoint 2013; Autonomy has suffered as part of HP because - well, because HP isn't what it was when Bill and Dave ran it. Current management doesn't know what they have in IDOL, and the awful deal they cut was probably based on optimistic sales numbers that may or may not have existed. Exalead, the engine I hoped would take the place of FAST ESP in the search market, is now part of Dassault and is rarely heard of in search. And Endeca, the gem of a search platform optimized for the lucrative eCommerce market, has become one of three or four search-related companies in the Oracle stable. 

Microsoft is finally taking advantage of the technology acquired in the FAST acquisition for SharePoint 2013, but as long as it's tied to SharePoint - even with the ability to index external content - it's not going to be an enterprise-wide distribution - or a 'big data' solution. SharePoint Hadoop? As long as you bring SQL Server. Mahout? Pig? I don't think so. There are too many companies that want or need Linux for their servers rather than Windows.

Then there is Google, the ultimate closed-box solution. As long as you use the Google search button/icon, users are happy – at least at first. If you have sixty guys named Sarah? Maybe not.

So what do we have? A few good options generally from small companies that tend to focus on hosted eCommerce - SLI Systems and Dieselpoint; and there’s Coveo, a strong Windows platform offering.

Solr is the enterprise search market now. My employer, LucidWorks, was the first, and remains the primary, commercial driver of the open source Apache project. What's interesting is the number of commercial products based on Solr and its underlying platform, Lucene.

Years ago, commercial search software was the 'safe choice'. Now I think things have changed: open source search is the safe choice for companies where search is mission-critical. Do you agree?

I'll be writing more about why I believe this to be the case over the coming weeks and months: stay tuned.



December 18, 2012

Last call for submitting papers to ESS NY

This Friday, December 21, is the last day for submitting papers and workshops to ESS in NY on May 21-22. See the information site at the Enterprise Search Summit Call for Speakers page.

If you work with enterprise search technologies (or supporting technologies), chances are the things you've learned would be valuable to other folks. If you have an in-depth topic, write it up as a 3 hour workshop; if you have a success story, or lessons learned you can share, submit a talk for a 30-45 minute session.

I have to say, this conference has enjoyed a multi-year run in terms of quality of talks and excellent Spring weather. See you in May?



December 03, 2012

Why LucidWorks? And Why Now?

Big news for us here at New Idea Engineering. After 16 years as an independent search technology consulting company, we've become part of LucidWorks effective December 1, 2012.

For years we've focused on both the business and technology of search.  We've provided vendor neutral consulting services to large and small organizations. We've worked with search platform companies to help tune their product capabilities and their message; we've helped companies implement enterprise search from 'the usual (vendor) suspects'. We've provided business best practices, data audits, and implementation overview for dozens of companies for most of our time as an independent company.

As you know, the market for enterprise search has changed over the last several years. Verity then Autonomy, FAST, Endeca, Exalead, ISYS, and more have been acquired by large companies with varying levels of success. With these acquisitions the products have morphed to fit into the new owners' world view; we've politely referred to this shift in focus as being "distracted". Google, one of the few non-acquired engines, got into the market with a low-cost entry which has enjoyed great acceptance; but as the market changed, Google has started raising its prices for the nifty yellow box. And while they pursue laudable offerings like phones, tablets, Google Glass, self-driving cars, and cloud computing, and simultaneously retool their ad model for the mobile world, it's fair to say that even their enterprise offerings are potentially distracted at times.

Sure, if your company has a typical use case for search, there's an engine or appliance for you.  But so many complex projects we've seen are atypical, almost by definition.  These high-end projects are no longer efficiently served by the commercial sector.  Many projects have turned to Open Source offerings, not for the cost savings as you might think, but out of a desire to have extreme control and flexibility, and not be tied down by vendor meddling and license nit-picking.

Over the same period, more and more people have realized that the need to understand and manage 'big data' is taking off. In fact, search is the interface of choice to find content in big data repositories. 

It's been about 10 years since we did our first project based on Lucene, the basis for nearly all modern open source search engines today. Since then, the capabilities of open source search have increased to the point where we honestly think Solr may be the best search platform available on the market today. 

We didn't call what we did with Lucene back then 'big data', but that's really what it was. Scalable, controllable, flexible, powerful... and open! And free for the taking - and modifying. Just add programmers.

A few years back, Lucid Imagination was started to provide that support, along with training and an easy to use interface that lets business owners - not just developers - use Solr search.  We've called them "the RedHat of Open Source Search".  Now, Lucid Imagination has become LucidWorks, and it is set to be the best way to search web, file, and database content, with extreme control, and of course with big data.

A few months ago we spoke with Lucid CEO Paul Doscher about upping our contract with them, and about where they were going, and it just made sense to us at that time to join a bigger team.

While we're committed to success at LucidWorks, we'll continue to use our blog to discuss all aspects of enterprise search – vendors, tools, technologies, events, and trends.  Unlike our days at past search companies, this one is based on an open platform so we'll be able to share a lot more as we move forward.

We hope you'll find our posts interesting, helpful, and engaging. Let us know how we're doing.


October 30, 2012

Link to cool story of Lucene/Solr 4.0's new fast Fuzzy Search

Interesting article with lots of links to other good resources.  Tells the story of a lot of open source cross pollination and collaboration, automatons, Levenshtein, and even a dash of Python - thanks Mike!
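If the Levenshtein reference is unfamiliar: it is the edit distance between two strings, meaning the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. The sketch below, in Python, shows only that definition using the textbook dynamic-programming approach. It is not the automaton-based technique the article describes, which exists precisely to avoid comparing the query against every term this way. The helper `fuzzy_matches` and its `max_edits` parameter are hypothetical names used for illustration.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed row by row."""
    # previous[j] holds the distance between the prefix of a processed
    # so far and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute (or keep) ca
                )
            )
        previous = current
    return previous[-1]


def fuzzy_matches(term: str, vocabulary: List[str], max_edits: int = 2) -> List[str]:
    """Brute-force fuzzy lookup: keep words within max_edits of term."""
    return [w for w in vocabulary if levenshtein(term, w) <= max_edits]
```

Scanning the whole vocabulary like this costs one distance computation per term, which is fine for a toy list and painful for a large index.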


July 06, 2012

Search appliance 'blues'

Over the US Independence Day holiday many of us learned that Google is dropping its entry-level search box, the Google Mini. This comes as part of 'summer cleaning', the Mini being dropped with a number of other services and products that are just not hot enough to support the effort. (The one I'll really miss? iGoogle.) Google hasn't provided much information on how successful the GSA 'Blue' has been, but with a price point between $3K US and $10K US I imagine they moved a bunch of them to customers with simple search requirements. 

I think it may have been Steve Arnold who said recently that the Google public web search and its advertising sales account for something like 96% of the company's revenue, so I don't think too many Googlers are upset about losing a small slice of a small slice of revenue. Heck, Mini profits probably don't even pay the fuel bills for a weekend flight to Europe for the Google 767.

The impact? Well, back then the Mini was new and it was big news. Heck, the bigger  models were even better at not too much more money. Still, enterprise search was an expensive proposition then. Lucene was pretty new and quite rough around the edges; FAST, Exalead and Endeca were selling for upwards of $250K, and needed at least that amount of money to actually get them to work. Google Site Search was there; but not many other enterprise search products were around for that price.

A funny thing happened in the new century. Now enterprise customers are more demanding about search. The GSA - even the larger models - is generally well-received at first. At least as long as the 'Powered by Google' icon is visible. We had one customer tell us that just licensing the Google icon would solve most of his user complaints. And Verity's Andy Feit proved it statistically a year or two later. (Have a look at our post last year 'It's not Google unless it says it's Google'.)

But over time, even when content and user query activity remains about the same, people become increasingly frustrated using the GSA. But will Google abandon the color yellow too? Steve Arnold has wondered on LinkedIn whether the larger Google appliances are going to see the same fate soon. 

The problem isn't that it's an appliance. It's the closed system that people are turning away from. In the enterprise, you can't use the cool techniques that Google uses to generate psychic results on the internet. In the enterprise, managers know what content to boost; Metadata? Fielded search? Boost based on content? Not in the blue (or yellow) world. 

Still, I think Google and the GSA provide pretty darned good value for a certain part of the market. If your data is pretty decent; if you're serving highly interlinked web and PDF content; if your data needs are not too demanding - GSA may be the solution you want. But before you spend money blindly, do what you do with any product you buy - verify it works in your environment. And as with any enterprise search platform, allocate a budget to run it properly after roll-out.

Yes, search has changed. Really good low-cost options are available. Where? Well, in addition to Google's site search offerings, there's Lucid Imagination's cloud and on-premises solutions; and some other darned good offerings based on open source: Flax - SearchBlox - and more.

What do you think? Is the loss of the Mini giving you the blues? 


(With thanks to Karan!)

May 10, 2012

Lucene Revolution: MS talks of being more open

At yesterday’s kickoff of Lucene Revolution 2012, Lucid CEO Paul Doscher introduced Gianugo Rabellino, Microsoft's Director of Open Source Communities. Gianugo said little about search per se, but he did confess to having been a fan of Lucene and Solr for a while now. In his talk, he told the audience that Microsoft has changed with respect to open source, and he went on to tell everyone how they have become more involved in open standards like HTML5 and CSS3, and in hardware specifications like USB. He went so far as to say 'Microsoft's survival depends on open source software'.

News to me, and perhaps to others in the room, was the extent to which Microsoft is supporting a number of open source products and languages. Gianugo reported that Linux is now a 'first-class guest operating system' on Microsoft Hyper-V, and that Microsoft provides support for PHP, Ruby on Rails, node.js and other projects on Azure (and presumably for 'on premises' systems).

A number of folks from large commercial organizations seemed to appreciate the news about Microsoft's shift towards supporting open source; but a number of the open-source folks in the room felt this offered little new, and some even felt it was an unrelated 'sales pitch'. Even though we are Microsoft partners, I'm glad to see more support for open source products like PHP and Linux.

The funniest part of the talk came as Gianugo was describing how SharePoint data was easily accessible to other non-Microsoft search platforms. An attendee asked if he felt there was a role for other platforms to be used as the primary engine for search in SharePoint; as he paused to craft a reply, Paul Doscher (loudly) pronounced his belief that there was, much to the pleasure of the crowd.

There was not much else in the way of Microsoft news; but it was interesting to see how many people and how much effort Microsoft is putting into open source projects.



April 30, 2012

Is Microsoft joining the Lucene/Solr dance?

Lucene Revolution is only 10 days away, and if you're not already planning on being in Boston, today's a great time to register.

Why be at the 3rd annual Lucene Revolution, Lucid Imagination's open source conference? Several reasons:

  • Open source search is hot, and Lucene/Solr is better than ever;
  • Lucid Imagination is just introducing their LucidWorks Enterprise 2.1 release;
  • Paul Doscher, recently of Exalead, is the new CEO and keynote speaker; and
  • Microsoft's Gianugo Rabellino is speaking about Lucene, Azure, and OSS.

Yes, you saw it here. A Microsoft Azure guy is speaking right after Paul Doscher Wednesday morning at Lucene Revolution. Has Microsoft caught the drift of the market towards Lucene/Solr in search, big data, and the cloud? Even search pundit Steven Arnold posted a few days back about Microsoft and Linux. Strange bedfellows perhaps, but there it is. 

So yes, I think if you can find any way to get to Boston in a week, I'd say do it. See you there!