August 25, 2014

Is Elasticsearch really enterprise search?

Not too long ago, Gartner released its 2014 Magic Quadrant, which I’ve written about here and which has generated a lively discussion on the Enterprise Search Engine Professionals group over on LinkedIn.

Much of the discussion I’ve seen about this year's MQ deals with the omission of several platforms that most people think of as 'enterprise search'. Consider that MQ alumni Endeca, Exalead, Vivisimo, Microsoft FAST, and others don’t even appear this year. Over the last few years larger companies acquired most of these players, but in the MQ it's as if they simply ceased to exist.

The other name I've heard mentioned, besides these previous MQ alumni, is Elasticsearch, a relatively new start-up. Elasticsearch, based on Apache Lucene, recently had a huge round of investment by some A-List VCs. What's the deal, Gartner?

Before I share my opinion, I have to reiterate that, until recently, I was an employee of Lucidworks, which many people see as a competitor to Elasticsearch. I believe my opinions are valid here, and I believe I’m known for being vendor-neutral. I think the best search platform for a given environment is a function of the platform and the environment - which data sources, security, management and budget requirements apply for any given company or department. 'Search engine mismatch' is a real problem and we’ve written about it for years.

Given that caveat, I believe I’m accurately describing the situation, and I encourage you to leave a comment if you think I've lost my objectivity!

OK, here goes. I don't believe Elasticsearch is in the enterprise search space. For that reason, if for no other, it doesn’t belong on the Gartner Magic Quadrant for search.

You heard it here. It's not that I don't think Elasticsearch is a powerful, cool, and valuable tool. It is all that, and more. As I mentioned, it’s based on Apache Lucene, a fantastic embedded search tool. In fact, it's the same tool Solr (and therefore Lucidworks' commercial products) is based on. But Lucene by itself is a tool more than a solution for enterprise search.

Let me start by addressing what I think Elasticsearch is great for: search-enabled data visualization. The first time I attended an Elasticsearch meet-up, they were showing the product in conjunction with two other open source projects: Logstash and Kibana. The total effect was great and made for a fantastic demo! I was fully and completely impressed, and saw the value immediately - search driving a visualization tool that was engaging, interactive, and exciting! 

Since then, Elasticsearch has apparently hired the guys who created those two open source projects, and has now morphed into a log analytics company - more like Splunk with great presentation capability, and less like traditional enterprise search. Their product is ELK: Elasticsearch, Logstash, Kibana. You can download all of these from GitHub, by the way.

(Lucidworks has also seen the value of Kibana to enterprise search, and has released their own version of Logstash and Kibana integrated with Solr called SiLK: Solr-Integrated Logstash and Kibana.)

Now let me tell you why I do not think of Elasticsearch as an enterprise search solution. First, in my time at Lucid, I'm not aware of any enterprise opportunities that Lucidworks lost to ELK. I could be wrong, and maybe the Elastic guys know of many deals we never saw at Lucid. But with no crawler and other components I consider ‘required’ as part of an enterprise search product, I'm not sure they're interested - yet, at least.

Next, check the title of their home page: "Open Source Distributed Real Time Search". Doesn't scream 'Google Search Appliance replacement', does it? Read Elasticsearch founder Shay Banon on the GSA.

Finally, Wired Magazine has an even more interesting quote: Shay Banon on SharePoint. “We're not doing enterprise search in the traditional sense. We're not going to index SharePoint documents”.

Now, with the growth and the money Elasticsearch has, they may change their tune. But with over $100M in venture capital now, I think their investors are valuing Elasticsearch as a Splunk competitor, and perhaps a NoSQL search product for Hadoop - not a traditional enterprise search engine. 

So the real question is: which space are you in? Enterprise Search with SharePoint and other legacy data sources? Web content and file shares you need a crawler for? Is LDAP or Active Directory security important to you? Well - I won't say 'no way' - but I'd want to see it before I buy.

Do you use Elasticsearch for your enterprise search? Let me hear from you!

 

 

 

August 21, 2014

More on the Gartner MQ: Fact or fiction?

There is a lively discussion going on over in the LinkedIn ‘Enterprise Search Engine Professionals’ group about the recent Gartner Magic Quadrant report on Enterprise Search. Whit Andrews, a Gartner Research VP, has replied that the Gartner MQ is not a 'pay to play'. I confess to being the one who brought the topic up, at least in these threads, and I certainly thank Whit for clarifying the misunderstanding directly.

That said, two of my colleagues who are true search experts have raised some questions I thought should be addressed.

Charlie Hull of UK-based Flax says he's “unconvinced of the value of the MQ to anyone wanting a comprehensive … view of the options available in the search market”. And Otis Gospodnetić of New York-based Sematext asks "why (would) anyone bother with Gartner's reports. We all know they don't necessarily match the reality". I want to try to address those two very good points.

First, I'm not sure Gartner claims to be a comprehensive overview of the search market. Perhaps there are more thorough lists - my friends and colleagues Avi Rappoport and Steve Arnold both have more complete coverage. Avi, now at Search Technologies, still maintains www.searchtools.com with a list that is as much a history of search as a list of vendors. And Steve Arnold has a great deal of free content on his site as well as high quality technology overviews by subscription. Find links to both at arnoldit.com.

Nonetheless, Gartner does have published criteria, and being a paid subscriber is not one of them. Whit's fellow Gartner analyst French Caldwell calls that out on his blog. By the way, I have first-hand experience that Gartner is willing to cut some slack to companies that don't quite meet all of their guidelines for inclusion, and I think that adds credence to the claim that everything.

A more interesting question is one that Otis raises: “why would anyone bother with Gartner's reports”?

To answer that, let me paraphrase a well-known quote from the early days of computers: "No one ever got fired for following Gartner's advice". They are well known for having good if not perfect advice - and I'd suspect that in the fine print, Gartner even acknowledges the fallibility of their recommendations. And all of us know that in real life, you can't select software as complex as an enterprise search platform without a proof of concept in your environment and on your content.

The industry is full of examples where the *best* technology loses pretty consistently to 'pretty good' stuff backed by a major firm/analyst/expert. Otis, I know you're an expert, and I'd take what you say as gospel. A VP at a big corporation who is not familiar with search (or his company's detailed search requirements) may not do so. And anyone on that VP's staff who picks a platform based solely on what someone like you or me says probably faces some amount of career risk. That said, I think I speak for Otis and Charlie and others when I say I am glad that a number of folks have listened to our advice and are still fully employed!

So - in summary, I think we're all right. Whit Andrews and Gartner provide advice that large organizations trust because of the overall methodology of their evaluation. Everyone does know it's not infallible, so a smart company will use the 'trust but verify' approach. And they continue to trust you and me, but more so when Gartner or Forrester or one of the large national consulting companies confirms our recommendation. And if not, we have to provide a compelling reason why something else is better for them. And the longer we're successful with our clients, the more credible we become.

 

 

August 05, 2014

The unspoken "search user contract"

Search usability is a major difference between search that works and search that sucks. If you want a free one-hour usability consultation, let me know.

I recently had lunch with my longtime friend and associate Avi Rappoport from Search Technologies. We had a great time exchanging stories about some of the search problems our clients have. She mentioned one customer to whom she was explaining which best practices to follow when laying out a result list. That brought to mind what I've called the search user contract, which users tacitly expect when they use your search on all of your sites, internally and externally.

If you are responsible for an instance of search running inside a firewall, even if it's outward facing, you have a problem your predecessors of 15 to 20 years ago* didn't have. Back then, most users didn't have experience with search except the one you provided - so they didn't have expectations of what it could be like.

Fast forward to 2014. In addition to your intranet search, virtually everyone in your organization knows, uses, and often loves Amazon, Facebook, Google, Apple, eBay and others. They know what really great search looks like. They expect you to suggest searches (or even products) on the fly! Search today recognizes misspelled words and knows what other products you might like.

But most importantly, almost all of these sites follow the same unspoken user contract:

  • On the result list, the search box goes at the top, either across a wide swath of the browser window, or in a smaller box on the left hand side, near the top.
  • There is no more than one search box on the results page.
  • Search results, numbered or not, show a page title and a meaningful summary of the document. Sometimes the summary is just a snippet. Words that cause the document to be returned are sometimes bolded in the summary.
  • Suggestions for the words and phrases you type show up just below the search box (or up in the URL field).
  • Facets, when available, go along the left hand side and/or across the top, just under the search box. Occasionally they can be on the right of the result list.
  • When facets are displayed down the left or right side of the screen, the numbers next to each facet indicate how many results show when you click that facet.
  • Best bets, boosted results, or promoted results show up at the top of the result list.
  • Advertisements or special announcements appear on the right side of the result list.
  • Links to the 'next' or 'previous' results page appear at the bottom and possibly at the top of the results.
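The facet items in that contract are easy to state and easy to get wrong, so here is a minimal sketch in Python of what users expect. Everything in it is hypothetical: the `visible_facets` function, the `type` field and the sample documents are made up for illustration and are not from any particular search engine. It assumes the count shown next to a facet is the number of results the user gets after clicking it, and that a facet with no matching results is simply not shown.

```python
from collections import Counter


def visible_facets(results, field, configured_values):
    """Return (facet value, result count) pairs for facets worth showing.

    results           -- list of dicts, one per document in the result list
    field             -- the document field the facet is built on (hypothetical)
    configured_values -- facet values the site would like to offer, in display order
    """
    # Count how many results carry each value of the facet field.
    counts = Counter(doc.get(field) for doc in results)
    # Keep the configured order, but drop any facet that would lead nowhere.
    return [(value, counts[value]) for value in configured_values if counts[value] > 0]


# Hypothetical result list for a query such as "policy".
results = [
    {"title": "Expense policy", "type": "Docs"},
    {"title": "Travel policy", "type": "Docs"},
    {"title": "Policy FAQ", "type": "Wiki"},
]

print(visible_facets(results, "type", ["Docs", "News", "Wiki"]))
# [('Docs', 2), ('Wiki', 1)]
```

Note that 'News' is configured but never displayed, because clicking it would return nothing.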

Now it's time to look at your web sites - public facing as well as behind your firewall. Things we often see include:

  • Spelling suggestions in a small, dark font very close to the site background color, at the left edge of the content, just above the facets. Users don't expect to look there for suggestions; and even where they do look, the color has to stand out so they see it.** [Don't make the user think]
  • An extra search form on the page: one at the top as 'part of our standard header block', and one right above the result list to enable drill down. The results you see will differ depending on which field you type in. [The visitor is confused: which search button should be pressed to do a 'drill down' search? Again, don't make the user think]
  • Tabs for drilling into different content areas seem to be facets, but some of the tabs ('News') have no results. [Facets should only display if, by clicking on a facet, the user can see more content]

As I said at the top, we’ve found poor search user experience is a major reason employees and site visitors report that ‘search sucks’. One of the standard engagements we do is a Search Audit, which includes search usability in addition to a review of user requirements and expectations. If you want a free consult on your usability, let me know.

 

/s/Miles

 

*Yes, Virginia, there was enterprise search 20 or more years ago. Virtually none of those names still exist, but their technology is still touching you every day. Fulcrum, Verity, Excalibur and others were solving problems for corporations and government agencies; and of course Yahoo was founded in 1994.

**True story, with names omitted to protect the innocent. On a site where I was asked to deliver a search quality audit, ‘spelling suggestions’ was a top requested feature. They actually had spell suggestions, in grey letters in a dark black field with a dark green background, far to the left of the browser window. No one noticed them. You know who you are; you’re welcome!

 

August 04, 2014

Explaining the 2014 Gartner Enterprise Search MQ

The recent release of the Gartner 2014 MQ for Enterprise Search held a number of surprises, some of which I mentioned in my original post last week. My initial reaction was that at least a couple of the companies new to the MQ this year don't really strike me as 'enterprise search' companies. But as I dig into the MQ text, I can see some logic to their call. I can also see something else in the 2014 Enterprise Search MQ: search is going into a somewhat boring mid-life crisis.

Just a handful of years ago the field was vibrant. We had leaders FAST, Exalead, Endeca, ISYS, Open Text, Omniture and more, none of which have survived to the 2014 MQ. Some were acquired; some faded away into different industries. But of the big names of the recent past, only Coveo, Autonomy and Google survive as 'Leaders'. Mark Logic, Lucidworks, Attivio and IBM fall into the lower half of the 'Ability to Execute' axis.

What's going on here? "Big data" is the new kid on the block, a new toy to play with, and search is fighting an uphill battle against the Apache Zoo-related tools (as well as new ones that seem to be announced daily). Some companies, notably Lucidworks, are doing quite a bit to optimize search of content stored in Hadoop repositories with the natural language interface you expect from web search engines. They also have focused on tools to speed indexing, which are currently in their Lucidworks Search commercial product but will likely find their way into open source.

My take? Don't schedule the wake for enterprise search just yet. Its newer, younger and fresher cousins in big data are getting all the hype for now; but eventually, users have to be able to find content in a way they understand - not SQL, but natural language queries typed by mere mortals on the 9 to 5 shift.

Need advice on your search solution? I'm happy to provide a one-hour consult on your enterprise search questions. Let me know.

July 29, 2014

Big data: Salvation for enterprise search?

Or just another data source?

With all the acquisitions we've seen in enterprise search in the last several years, it's no wonder that the field looks boring to the casual observer. Most companies have gone through two or more complex, costly migrations to a new search platform, users still complain, and in some quarters, there seems to be 'quality of search fatigue'. I acknowledge I'm biased, but I think enterprise search implemented and managed properly provides incredible value to corporations, their employees, and their customers/consumers. That said, a lot of companies seem to treat search as 'fire and forget', and after it's installed, it never gets the resources to get off the ground in a quality way.

It's no surprise then that the recent hype bubble in 'Big Data' has the attention of enterprise search companies as they see a way to convince an entirely new group of technologists that search is the way. 

It's certainly true that Hadoop's beginning was related to search: it started as a repository for the web crawler Nutch, no doubt in preparation for highly scalable indexing in Lucene/Solr. Hadoop and its zoo* of related tools certainly are designed for nerds. At best, it's a framework that sits on top of your physical disks; worst case it's a file system that does support authentication but not really security (in the LDAP/AD sense). And it's a great tool for writing 'jobs' that manipulate content in ways a data scientist finds interesting. How is your Java? Python? Clojure? Better brush up.

The enterprise search vendors of the world certainly see the tremendous interest in Hadoop and 'big data' as a great opportunity to grow their business. And for the right use cases, most enterprise search platforms can address the problem. But remember that, to enterprise search, the content you store in Hadoop is simply content in a different repository: a new data source on the menu.

Keep in mind, too, that big data apps come with all the same challenges enterprise search has faced for years, plus a few more. Users - even if data scientists and researchers - think web-based Google is search; and even though - as a group - this demographic may be more intelligent than your average search users, they still expect your search to 'just know'. If you think babysitting your existing enterprise search solution is tough, wait until you see what billions of documents do for you.

And speaking of billions of records - how long does your current search take to do a full index of your current content? Now extrapolate: how long will it take to index a few billion records? (Note: some vendors can provide a much faster 'pipe' for indexing massive content from Hadoop. Lucidworks and Cloudera are two of the companies I am familiar with; there may be others.)
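If you want to run that extrapolation yourself, a back-of-the-envelope sketch in Python might look like the one below. The function name and every number in it are hypothetical, chosen only to illustrate the arithmetic, and it assumes indexing time grows linearly with document count. Real platforms rarely scale that neatly, so treat the result as a rough starting point rather than a forecast.

```python
def estimate_full_index_hours(current_docs, current_hours, target_docs):
    """Linearly extrapolate full-index time from one measured indexing run.

    Assumes throughput (documents per hour) stays constant as the
    corpus grows, which is a simplification.
    """
    if current_docs <= 0 or current_hours <= 0:
        raise ValueError("current_docs and current_hours must be positive")
    docs_per_hour = current_docs / current_hours
    return target_docs / docs_per_hour


# Hypothetical figures: today a full index of 10 million documents takes 8 hours,
# and the Hadoop repository is expected to hold 3 billion records.
hours = estimate_full_index_hours(10_000_000, 8, 3_000_000_000)
print(f"Estimated full index: {hours:,.0f} hours, or about {hours / 24:,.0f} days")
```

With those made-up numbers the full index works out to 2,400 hours, or 100 days, which is exactly why a faster indexing 'pipe' matters at this scale.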

A failure in search? Well, it depends what you want. If you are going to treat Hadoop as a 'Ha-Dump' with all of your log files, all of your customer transactional data, hundreds of Twitter feeds for ever and ever, and add your existing enterprise data, you're going to have some time on your hands while the data gets indexed.

On the other hand, if you're smart about where your data goes, break it into 'data lakes' of related content, and use the right tool for each type of data, you won't be using your enterprise search platform for use cases better served with analytics tools that are part of the Apache Zoo; and you’ll still be doing pretty well. And in that universe, Hadoop is just another data source for search - and not the slow pipe through which all of your data has to flow.

Do you agree?

 

*If you get the joke, chances are you know a bit about the Apache project and open source software. If not, you may want to hold off and research before you download your first Hadoop VM.