November 05, 2014

Search Owner's Dilemma

In my session today at Enterprise Search & Discovery, I finished up with a rendition of the old rhyme called "The Engineer's Dilemma", updated for the folks who manage enterprise search in large organizations. Folks seemed to like it, so I'll share it here for those who were unable to be at the conference. I call it "The Search Manager's Dilemma".

It's not my job to pick our search
The call's not up to me.
It's not my place to say how much
The cost of search should be.
It's not my place to tune the thing,
Not even do it well,
But let the damn thing miss a page
And see who catches hell!


September 18, 2014

Lucidworks ships Fusion 1.0 - Pretty exciting next gen platform.

OK, I've known about this coming for a while, just didn't know when until this afternoon - so I stayed up late to get the download started after midnight.

Fusion is more than an updated release of Lucidworks Search. It is Solr based, but it's a re-write from top to bottom. And it's not a bare bones search API only a developer can love. Connectors? Check. Security? Check. Analytics? Check. Entity extraction? Check. All included. 

But what it adds is where the real capabilities and contributions are. Machine learning? Check. Admin console? Check. Log analytics? Check. A document pre-processing pipeline? Check. Deep signal processing (think 'automated context processing')? Check.

Even if you think these new unique capabilities are not your style, you can buy Solr support and still get licenses for connectors, entity extraction, and a handful of other formerly 'premium' products. Want it all? License the full product at a per-node price I always thought was underpriced. I'm sure you'll be hearing a lot more in the coming days and weeks, but go - download - try - and see what it does for your sites. Your developers will love it, your business owners will love it, your users will love it, and I bet even your CFO will love it.

Full disclosure: I am a former employee of Lucidworks; but I'd be just as excited even if I were not. Go download it for sure and try it on your content. But be sure to check out the 'search as killer app' video on Lucid's home page.

/s/ Miles



September 09, 2014

Sometimes you're just wrong! (Maybe).

OK, this one falls into the 'eat your own words' category, so I have to come clean. Well, partly clean. Let me explain.

I was out of town last week, but just before I left I wrote an article asserting that Elasticsearch really isn't 'enterprise' search. The article drew a lot of attention and comments from both sides of the argument. I have to say I still think that's the case, but an announcement by Microsoft seems to differ, and ends up a net positive for Elasticsearch. Microsoft tells us that Elasticsearch is the platform under the covers of Microsoft's Azure search offering. It looks like you have a couple of options - as long as you're on Azure:

a) You can download and use the open source Elasticsearch platform available on GitHub; or

b) Use Microsoft's managed service 'Facetflow Elasticsearch' which incorporates (some of) the open source code in various places.

Microsoft calls this "a fully-managed real-time search and analytics service" while, according to ZDNet, it is for 'web and mobile application developers looking to incorporate full-text search into their applications'. 

Either way, it's certainly yet another step forward for Elasticsearch, and is a big step forward in visibility for the company. It's not clear what kind of revenue they will receive from the deal - Microsoft being relatively famous for being quite frugal. And after all, smart search folks like Kevin Green of Spantree Technology Group talk about its strengths and liabilities, saying it *is* fast ('wicked fast'); fault-tolerant; distributed and more. But it is not a crawler; a machine learner; a user-facing front end, and it is not secure. 

So I'll agree a partial 'mea culpa' is in order; adding capabilities to an open source project can make it more enterprise ready. But I think the jury may still be out on the rest of my piece. Stay tuned!

August 25, 2014

Is Elasticsearch really enterprise search?

Not too long ago, Gartner released its 2014 Magic Quadrant, which I’ve written about here and which has generated a lively discussion on the Enterprise Search Engine Professionals group over on LinkedIn.

Much of the discussion I’ve seen about this year's MQ deals with the omission of several platforms that most people think of as 'enterprise search'. Consider that MQ alumni Endeca, Exalead, Vivisimo, Microsoft FAST, and others don’t even appear this year. Over the last few years larger companies acquired most of these players, but in the MQ it's as if they simply ceased to exist.

The name I've heard mentioned besides these previous MQ alumni is Elasticsearch, a relatively new start-up. Elasticsearch, based on Apache Lucene, recently had a huge round of investment by some A-List VCs. What's the deal, Gartner?

Before I share my opinion, I have to reiterate that, until recently, I was an employee of Lucidworks, which many people see as a competitor to Elasticsearch. I believe my opinions are valid here, and I believe I’m known for being vendor-neutral. I think the best search platform for a given environment is a function of the platform and the environment – what data source, security, management and budget apply for any given company or department. ‘Search engine mismatch’ is a real problem and we’ve written about it for years.

Given that caveat, I believe I’m accurately describing the situation, and I encourage you to leave a comment if you think I've lost my objectivity!

OK, here goes. I don't believe Elasticsearch is in the enterprise search space. For that reason, if for no other, it doesn’t belong on the Gartner Magic Quadrant for search.

You heard it here. It's not that I don't think Elasticsearch is a powerful, cool, and valuable tool. It is all that, and more. As I mentioned, it’s based on Apache Lucene, a fantastic embedded search tool. In fact, it's the same tool Solr (and therefore Lucidworks' commercial products) is based on. But Lucene by itself is a tool more than a solution for enterprise search.

Let me start by addressing what I think Elasticsearch is great for: search-enabled data visualization. The first time I attended an Elasticsearch meet-up, they were showing the product in conjunction with two other open source projects: Logstash and Kibana. The total effect was great and made for a fantastic demo! I was fully and completely impressed, and saw the value immediately - search driving a visualization tool that was engaging, interactive, and exciting! 

Since then, Elasticsearch has apparently hired the guys who created those two respective open source projects, and has now morphed into a log analytics company - more like Splunk with great presentation capability, and less like traditional enterprise search. Their product is ELK - Elasticsearch Logstash Kibana. You can download all of these from GitHub, by the way.

(Lucidworks has also seen the value of Kibana to enterprise search, and has released their own version of Logstash and Kibana integrated with Solr called SiLK (Solr-Integrated Logstash and Kibana).)

Now let me tell you why I do not think of Elasticsearch as an enterprise search solution. First, in my time at Lucid, I'm not aware of any enterprise opportunities that Lucidworks lost to ELK. I could be wrong, and maybe the Elastic guys know of many deals we never saw at Lucid. But with no crawler and other components I consider ‘required’ as part of an enterprise search product, I'm not sure they're interested - yet, at least.

Next, check the title of their home page: "Open Source Distributed Real Time Search". Doesn't scream 'Google Search Appliance replacement', does it? Read Elasticsearch founder Shay Banon on the GSA.

Finally, Wired Magazine has an even more interesting quote: Shay Banon on SharePoint. “We're not doing enterprise search in the traditional sense. We're not going to index SharePoint documents”.

Now, with the growth and the money Elasticsearch has, they may change their tune. But with over $100M in venture capital now, I think their investors are valuing Elasticsearch as a Splunk competitor, and perhaps a NoSQL search product for Hadoop - not a traditional enterprise search engine. 

So the real question is: which space are you in? Enterprise Search with SharePoint and other legacy data sources? Web content and file shares you need a crawler for? Is LDAP or Active Directory security important to you? Well - I won't say 'no way' - but I'd want to see it before I buy.

Do you use Elasticsearch for your enterprise search? Let me hear from you!




August 21, 2014

More on the Gartner MQ: Fact or fiction?

There is a lively discussion going on over in the LinkedIn ‘Enterprise Search Engine Professionals’ group about the recent Gartner Magic Quadrant report on Enterprise Search. Whit Andrews, a Gartner Research VP, has replied that the Gartner MQ is not a 'pay to play'. I confess to having been the one who brought the topic up, in these threads at least, and I certainly thank Whit for clarifying the misunderstanding directly.

That said, two of my colleagues who are true search experts have raised some questions I thought should be addressed.

Charlie Hull of UK-based Flax says he's "unconvinced of the value of the MQ to anyone wanting a comprehensive … view of the options available in the search market". And Otis Gospodnetić of New York-based Sematext asks "why (would) anyone bother with Gartner's reports. We all know they don't necessarily match the reality". I want to try to address those two very good points.

First, I'm not sure Gartner claims to be a comprehensive overview of the search market. Perhaps there are more thorough lists - my friends and colleagues Avi Rappoport and Steve Arnold both have more complete coverage. Avi, now at Search Technologies, still maintains a list that is as much a history of search as a list of vendors. And Steve Arnold has a great deal of free content on his site as well as high quality technology overviews by subscription. Find links to both at

Nonetheless, Gartner does have published criteria, and being a paid subscriber is not one of them. Whit's fellow Gartner analyst French Caldwell calls that out on his blog. By the way, I have first-hand experience that Gartner is willing to cut some slack to companies that don't quite meet all of their guidelines for inclusion, and I think that adds credence to the claim that everything.

A more interesting question is one that Otis raises: “why would anyone bother with Gartner's reports”?

To answer that, let me paraphrase a well-known quote from the early days of computers: "No one ever got fired for following Gartner's advice". They are well known for having good if not perfect advice - and I'd suspect that in the fine print, Gartner even acknowledges the fallibility of their recommendations. And all of us know that in real life, you can't select software as complex as an enterprise search platform without a proof of concept in your environment and on your content.

The industry is full of examples where the *best* technology loses pretty consistently to 'pretty good' stuff backed by a major firm/analyst/expert. Otis, I know you're an expert, and I'd take what you say as gospel. A VP at a big corporation who is not familiar with search (or his company's detailed search requirements) may not do so. And anyone on that VP's staff who picks a platform based solely on what someone like you or me says probably faces some amount of career risk. That said, I think I speak for Otis and Charlie and others when I say I am glad that a number of folks have listened to our advice and are still fully employed!

So - in summary, I think we're all right. Whit Andrews and Gartner provide advice that large organizations trust because of the overall methodology of their evaluation. Everyone does know it's not infallible, so a smart company will use the 'trust but verify' approach. And they continue to trust you and me, but more so when Gartner or Forrester or one of the large national consulting companies confirms our recommendation. And if not, we have to provide a compelling reason why something else is better for them. And the longer we're successful with our clients, the more credible we become.



August 05, 2014

The unspoken "search user contract"

Search usability is a major difference between search that works and search that sucks. 

I recently had lunch with my longtime friend and associate Avi Rappoport from Search Tools Consulting. We had a great time exchanging stories about some of the search problems our clients have. She mentioned one customer with whom she was sharing best practices for laying out a result list. That brought to mind what I've called the 'search user contract', which users tacitly expect when they use your search on any site, internal or external.

If you are responsible for an instance of search running inside a firewall, even if it's outward facing, you have a problem your predecessors of 15 to 20 years ago* didn't have. Back then, most users didn't have experience with search except the one you provided - so they didn't have expectations of what it could be like.

Fast forward to the present. In addition to your intranet search, virtually everyone in your organization knows, uses, and often loves Amazon, Facebook, Google, Apple, eBay, and others. They know what really great search looks like. They expect you to suggest searches (or even products) on the fly! Search today knows misspelled words and what other products you might like. And as we start to see more machine learning in the enterprise space, it will get even harder.

But most importantly, almost all of the above sites follow the same unspoken user contract:

  • On the result list, the search box goes at the top, either across a wide swath of the browser window or in a smaller box on the left-hand side, near the top.
  • There is no more than one search box on the results page.
  • Search results, numbered or not, show a page title or product name and a meaningful description of the product or summary of the document. Sometimes the summary is just a snippet.
  • Words that cause the document to be returned are sometimes bolded in the summary.
  • Suggestions for the words and phrases you type show up just below the search box (or up in the URL field).
  • Facets, when available, go along the left-hand side and/or across the top, just under the search box. Occasionally they can be on the right of the result list.
  • Whether facets are displayed on the left or right of the screen, the numbers next to each facet indicate how many results will display when that facet is clicked.
  • Best bets and boosted or promoted results show up at the top of the result list and are generally recognizable as recommended or featured results.
  • Advertisements or special announcements appear on the right side of the result list.
  • Links to the 'next' or 'previous' results page appear at the bottom or, less often, at the top of the result list.
  • Generally, when there is a very long result list, there may be a limited number of results per page with 'Next' and 'Previous' links.
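
A few items in the contract above (facet counts, bolded query words, next/previous links) are mechanical enough to sketch in code. What follows is only a minimal illustration in plain Python, not the way any particular search engine does it. The function names and the shape of the result records (dictionaries with a hypothetical 'type' field) are assumptions made up for this example.

```python
import re
from collections import Counter


def facet_counts(results, field):
    """Count results per facet value; a value with no results never appears."""
    counts = Counter(doc[field] for doc in results if doc.get(field))
    return dict(counts)


def highlight(summary, query):
    """Wrap whole-word matches of the query terms in <b> tags, ignoring case."""
    terms = [re.escape(term) for term in query.split() if term]
    if not terms:
        return summary
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(r"<b>\1</b>", summary)


def page_of(results, page, per_page=10):
    """Return one page of results plus flags for the Previous/Next links."""
    start = (page - 1) * per_page
    return {
        "results": results[start:start + per_page],
        "has_previous": page > 1,
        "has_next": start + per_page < len(results),
    }
```

Because the facet counts are computed from the current result list, the number shown next to a facet is the number of results the user will see after clicking it, and a facet with nothing behind it simply is not offered.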

Now it's time to look at your web sites - public facing as well as behind your firewall. Things we often see on internal or corporate sites include:

  • Spelling suggestions in small, dark font very close to the site background color, at the left edge of the content, just above facets. Users don't expect to look there for suggestions. Even if they do look there, the color should stand out so they see the suggestions**. Don't make the user think!
  • An extra search form on the page; one at the top as 'part of our standard header block'; and one right above the result list to enable drill down. The results you see will differ depending on which field you type in. [The visitor is confused: which search button should be pressed to do a 'drill down' search? Again, don't make the user think]
  • Tabs for drilling into different content areas seem to be facets, but some of the tabs ('News') have no results. [Facets should only display if, by clicking on a facet, the user can see more content]

As I said at the top, we’ve found poor search user experience is a major reason employees and site visitors report that ‘search sucks’. One of the standard engagements we do is a Search Audit, which includes search usability in addition to a review of user requirements and expectations.




*Yes, Virginia, there was enterprise search 20 or more years ago. Virtually none of those names still exist, but their technology is still touching you every day. Fulcrum, Verity, Excalibur and others were solving problems for corporations and government agencies; and of course Yahoo was founded in 1994.

**True story, with names omitted to protect the innocent. On a site where I was asked to deliver a search quality audit, ‘spelling suggestions’ was a top requested feature. They actually had spell suggestions, in grey letters in a dark black field with a dark green background, far to the left of the browser window. No one noticed them. You know who you are; you’re welcome!


August 04, 2014

Explaining the 2014 Gartner Enterprise Search MQ

The recent release of the Gartner 2014 MQ for Enterprise Search held a number of surprises, some of which I mentioned in my original post last week. My initial reaction was that at least a couple of the companies new to the MQ this year don't really strike me as 'enterprise search' companies. But as I dig into the MQ text, I can see some logic to their call. I can also see something else in the 2014 Enterprise Search MQ: Search is going into a somewhat boring mid-life crisis.

Just a handful of years ago the field was vibrant. We had leaders FAST, Exalead, Endeca, ISYS, Open Text, Omniture and more, none of which have survived to the 2014 MQ. Some were acquired; some faded away into different industries. But of the big names of the recent past, only Coveo, Autonomy and Google survive as 'Leaders'. Mark Logic, Lucidworks, Attivio and IBM fall into the lower half of the 'Ability to Execute' axis.

What's going on here? "Big data" is the new kid on the block, a new toy to play with, and search is fighting an uphill battle against the Apache Zoo-related tools (as well as new ones that seem to be announced daily). Some companies, notably Lucidworks, are doing quite a bit to optimize search of content stored in Hadoop repositories with the natural language interface you expect from web search engines. They also have focused on tools to speed indexing, currently in their Lucidworks Search commercial product but which will likely find their way into open source.

My take? Don't schedule the wake for enterprise search just yet. Its newer, younger and fresher cousins in big data are getting all the hype for now; but eventually, users have to be able to find content in a way they understand - not SQL, but natural language queries typed by mere mortals on the 9 to 5 shift.

Need advice on your search solution? I'm happy to provide a one-hour consult on your enterprise search questions. Let me know.

July 29, 2014

Big data: Salvation for enterprise search?

Or just another data source?

With all the acquisitions we've seen in enterprise search in the last several years, it's no wonder that the field looks boring to the casual observer. Most companies have gone through two or more complex, costly search implementations to a new search platform, users still complain, and in some quarters, there seems to be 'quality of search fatigue'. I acknowledge I'm biased, but I think enterprise search implemented and managed properly provides incredible value to corporations, their employees, and their customers/consumers. That said, a lot of companies seem to treat search as 'fire and forget', and after it's installed, it never gets the resources to get off the ground in a quality way.

It's no surprise then that the recent hype bubble in 'Big Data' has the attention of enterprise search companies as they see a way to convince an entirely new group of technologists that search is the way. 

It's certainly true that Hadoop's beginning was related to search - as a repository for the web crawler Nutch, no doubt in preparation for highly scalable indexing in Lucene/Solr. Hadoop and its zoo* of related tools certainly are designed for nerds. At best, it's a framework that sits on top of your physical disks; worst case it's a file system that does support authentication but not really security (in the LDAP/AD sense). And it's a great tool to write 'jobs' that manipulate content in ways interesting to a data scientist. How is your Java? Python? Clojure? Better brush up.

The enterprise search vendors of the world certainly see the tremendous interest in Hadoop and 'big data' as a great opportunity to grow their business. And for the right use cases, most enterprise search platforms can address the problem. But remember that, to enterprise search, the content you store in Hadoop is simply content in a different repository: a new data source on the menu.

Keep in mind, too, that big data apps come with all the same challenges enterprise search has faced for years plus a few more. Users - even if data scientists and researchers - think web-based Google is search; and even though - as a group - this demographic may be more intelligent than your average search users, they still expect your search to 'just know'. If you think babysitting your existing enterprise search solution is tough, wait until you see what billions of documents do for you.

And speaking of billions of records - how long does your current search take to do a full index of your current content? Now extrapolate: how long will it take to index a few billion records? (Note: some vendors can provide a much faster 'pipe' for indexing massive content from Hadoop. Lucidworks and Cloudera are two of the companies I am familiar with; there may be others.)
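
To put a rough number on that extrapolation, here is a back-of-the-envelope sketch in Python. It assumes indexing throughput stays constant as the corpus grows, which is optimistic, and the function name and the figures in the usage note are hypothetical, chosen only to show the arithmetic.

```python
def estimate_index_hours(docs_indexed, hours_taken, target_docs):
    """Linear estimate of full-index time for a larger corpus.

    Assumes throughput measured on the current corpus holds at the
    larger scale; real systems often slow down as the index grows.
    """
    if docs_indexed <= 0 or hours_taken <= 0:
        raise ValueError("need a positive measured baseline")
    docs_per_hour = docs_indexed / hours_taken
    return target_docs / docs_per_hour
```

If a full index of 10 million documents takes 8 hours today, estimate_index_hours(10_000_000, 8, 3_000_000_000) comes out to 2,400 hours, or about 100 days, for 3 billion documents at the same rate.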

A failure in search? Well, it depends what you want. If you are going to treat Hadoop as a 'Ha-Dump' with all of your log files, all of your customer transactional data, hundreds of Twitter feeds for ever and ever, and add your existing enterprise data, you're going to have some time on your hands while the data gets indexed.

On the other hand, if you're smart about where your data goes, break it into 'data lakes' of related content, and use the right tool for each type of data, you won't be using your enterprise search platform for use cases better served with analytics tools that are part of the Apache Zoo; and you’ll still be doing pretty well. And in that universe, Hadoop is just another data source for search - and not the slow pipe through which all of your data has to flow.

Do you agree?


*If you get the joke, chances are you know a bit about the Apache project and open source software. If not, you may want to hold off and research before you download your first Hadoop VM.   


July 22, 2014

The role of professional services in software companies

I just saw a great piece on LinkedIn written by venture investor Mark Suster (@msuster) of Upfront Ventures. His piece is about the importance of professional services to building a VC backed company/product. I can add no more to his take (bolded text is on me):

Have an in-house professional services team that implements your software. It will bring down your overall margins but will produce profitable revenue. Most importantly it ensures long-term success. I wrote about The Importance of Professional Services here. I know, I know. Your favorite investor told you this was a bad idea. Trust me – you’ll thank me a few years from now if you control your own destiny and improve quality through services. If your investor worked inside of a SaaS company for years and disagrees with me then listen to them. If they’re a spreadsheet jockey then on this particular issue I promise you they are FOS. Spreadsheet quant does not equal success, properly implemented software does.

'Nuff said. I've never met Mark, but I like where he's coming from!

/s/ Miles

July 21, 2014

Gartner MQ 2014 for Search: Surprise!

Funny, just last week I tweeted about how late the Gartner Magic Quadrant for Enterprise Search is this year. Usually it's out in March, and here it is, July.

Well, it's out - and boy does it have some surprises! My first take:

Coveo, a great search platform that runs on Windows only, is in the Leaders quadrant, and best overall in 'Completeness of Vision'. Don't get me wrong, it's a great search platform; but I guess completeness of vision does not include completeness of platform. Linux your flavor? Sorry.

HP/Autonomy IDOL is in the upper right quadrant as well, back strong as the top in 'Ability to Execute' and in the top three on 'Completeness of Vision'. IDOL has always reminded me of the reliable old Douglas DC-3, described by aviation enthusiasts as 'a collection of parts flying in loose formation', but it really does offer everything enterprise search needs. And, because it loves big hardware, everything that HP loves to sell.

BA Insight surprised me with their Knowledge Integration Platform at the top of the Visionaries quadrant. It enhances Microsoft SharePoint Search, or runs with a stand-alone version of Lucene. It's very cool, yes. But I sure don't think of it as a search engine. Do you? More on this later.

Attivio comes in solid in the lower right 'Visionaries' quadrant. I'd really expected to see them further along on both measures, so I'm surprised.

I'm really quite disappointed that Gartner places my former employer Lucidworks solidly in the lower left 'Niche players' quadrant. I think Lucidworks has a very good vision of where they want to go, and I think most enterprises will find it compelling once they take a look. I don’t think I'm biased when I say that this may be Gartner's big miss this year. And OK, I understand that, like BA Insight's Knowledge product, Lucidworks needs a search engine to run, but it feels more like a true search platform.

Big surprise: IHS, which I have always thought of as a publisher, has made it to the Gartner Niche quadrant as a search platform. Odd.

Other surprises: IBM in the Niche market quadrant, based on 'Ability to Execute'. Back at Verity, then-CEO Philippe Courtot got the Gartner folks to admit that the big component of Ability to Execute was really about how long you could fund the project, and I have to confess I figured IBM (and Google) as the MQ companies with the best cash position.

If you're not a Gartner client, I'm sorry you won't get the report or the insights of Whit Andrews (@WhitAndrews _), a longtime search analyst who knows his stuff. You can still find the report from several vendors happy to let you download the Gartner MQ Search from them. Search Google and find the link you most prefer, or call your vendor for a full copy.