
November 14, 2018

Do you have big data? Or just a lot of it?

Of course, I’m a search nerd. I've been involved in enterprise search for over 20 years. I see search and big data as related technologies, but in most cases, I do not see them as synonymous.

And I'd also say that, while most enterprises have a lot of data, the term ‘big data’ is not applicable to most organizations.

Consider that Google (and others) define ‘big data’ as “extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions”.

Yes, the data that Amazon, Google, Facebook, and others collect qualifies as big data. These companies mine everything you do when you're using their sites. Amazon wants to be able to report “people like you bought …” to sell more product; Google wants to know what ‘people like you’ look at after a query so it can suggest the same thing to the next person like you; Facebook... well, it wants to know what to sell you as you chat with and about your friends. Is search involved? Maybe, but more often strong machine learning and internal analytics are the key.

Do consulting firms like Ernst & Young or PwC have big data? Well, my bet is they have a lot of information about their clients, business practices, accounting, etc., but is it ‘big data’? Probably not.

Solr, Elastic, and other search technologies can search-enable huge data sets, so big data is often indexed to make it searchable by humans. And both Solr and Elastic come with some great analytical tools: Kibana for Elastic, and Banana, the port of Kibana for Solr-based engines.

But again, is that big data, or just lots of it?

I’d vote the latter.

 

June 28, 2017

Poor data quality gives search a bad rap

If you’re involved in managing the enterprise search instance at your company, there’s a good chance you’ve heard at least some users complain about the poor results they see.

The common lament search teams hear is “Why didn’t we use Google?” Yet we’ve heard the same complaints at sites that actually implemented the Google Search Appliance (GSA) but didn’t use the Google logo and look.

We're often asked to come in and recommend a solution. Sometimes the problem is simply using the wrong search platform: not every platform handles every use case and requirement equally well. Occasionally, the problem is a poorly configured or misconfigured search, or simply an instance that hasn’t been managed properly. Even Google’s renowned public search engine doesn’t run by itself, though it’s a poor comparison anyway: in recent years, Google search has become less of a search platform and more of a big data analytics engine.

Over the years, we’ve helped clients select, implement, and manage intranet search. In my opinion, the real problem with search usually lies elsewhere: poor data quality.

Enterprise data isn’t created with search in mind. Content authors have little incentive to fill in quality metadata in the properties fields of Adobe PDF Maker, Microsoft Office, and other document publishing tools. To make matters worse, there may be several versions of a given document as it goes through creation, editing, reviews, and updates. And often the early drafts, as well as the final version, sit in the same directory or file share. Content on public-facing websites very rarely has such issues.
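To make the versions problem concrete, here's a minimal Python sketch that groups file names from a share by a version-independent key, so drafts and finals of the same document surface together before they're indexed. The helpers `base_name` and `likely_versions` are hypothetical, and the naming conventions they strip (v1, rev2, draft, final, copy, "Copy of") are assumptions; every organization names its drafts differently, and a real audit would compare content too, not just names.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Assumed draft/revision tokens; adjust to the conventions used on your share.
VERSION_TOKEN = re.compile(r"^(?:v\d+|rev\d+|draft\d*|final|copy)$")


def base_name(filename):
    """Reduce a file name to a version-independent key (hypothetical helper)."""
    stem = PurePath(filename).stem.lower()       # drop directory and extension
    stem = re.sub(r"^copy of\s+", "", stem)      # e.g. "Copy of Plan.docx"
    tokens = re.split(r"[^a-z0-9]+", stem)       # split on spaces, _, -, etc.
    kept = [t for t in tokens if t and not VERSION_TOKEN.match(t)]
    return " ".join(kept)


def likely_versions(filenames):
    """Group files that appear to be versions of the same document."""
    groups = defaultdict(list)
    for name in filenames:
        groups[base_name(name)].append(name)
    # Only keys shared by more than one file suggest duplicate versions.
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    share = [
        "Pricing Policy v1.docx",
        "Pricing_Policy_v2.docx",
        "Pricing Policy FINAL.pdf",
        "Copy of Pricing Policy.docx",
        "Travel Guidelines.docx",
    ]
    for key, names in likely_versions(share).items():
        print(key, "->", names)
```

The extension is deliberately ignored, so a PDF exported from the final Word draft lands in the same group; that's often exactly the duplicate you don't want competing in search results.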

Some content management systems make it easy to implement what is really ‘search engine optimization’ (SEO), but all too often the optimization is left to the enterprise search platform to work out.

We have an updated two-part series on data quality and search, starting here. We hope you find it helpful; let us know if you have any questions!

August 25, 2014

Is Elasticsearch really enterprise search?

Not too long ago, Gartner released its 2014 Magic Quadrant (MQ) for enterprise search, which I’ve written about here and which has generated a lively discussion in the Enterprise Search Engine Professionals group over on LinkedIn.

Much of the discussion I’ve seen about this year's MQ deals with the omission of several platforms that most people think of as 'enterprise search'. Consider that MQ alumni Endeca, Exalead, Vivisimo, Microsoft FAST, and others don’t even appear this year. Larger companies acquired most of these players over the last few years, but in the MQ it's as if they simply ceased to exist.

The other name I've heard mentioned, besides these former MQ alumni, is Elasticsearch, a relatively new start-up. Elasticsearch, based on Apache Lucene, recently received a huge round of investment from some A-list VCs. What's the deal, Gartner?

Before I share my opinion, I have to reiterate that, until recently, I was an employee of Lucidworks, which many people see as a competitor to Elasticsearch. I believe my opinions are valid here, and I believe I’m known for being vendor-neutral. I think the best search platform for a given environment is a function of the platform and the environment: what data sources, security, management, and budget apply for any given company or department. ‘Search engine mismatch’ is a real problem, and we’ve written about it for years.

Given that caveat, I believe I’m accurately describing the situation, and I encourage you to leave a comment if you think I've lost my objectivity!

OK, here goes. I don't believe Elasticsearch is in the enterprise search space. For that reason, if for no other, it doesn’t belong on the Gartner Magic Quadrant for search.

You heard it here. It's not that I don't think Elasticsearch is a powerful, cool, and valuable tool. It is all that, and more. As I mentioned, it’s based on Apache Lucene, a fantastic embedded search library. In fact, it's the same library that Solr (and therefore Lucidworks' commercial products) is based on. But Lucene by itself is more a tool than a solution for enterprise search.

Let me start by addressing what I think Elasticsearch is great for: search-enabled data visualization. The first time I attended an Elasticsearch meet-up, they were showing the product in conjunction with two other open source projects: Logstash and Kibana. The total effect was great and made for a fantastic demo! I was completely impressed, and saw the value immediately - search driving a visualization tool that was engaging, interactive, and exciting!

Since then, Elasticsearch has apparently hired the developers who created those two open source projects, and has now morphed into a log analytics company - more like Splunk with great presentation capability, and less like traditional enterprise search. Their product is ELK: Elasticsearch, Logstash, and Kibana. You can download all of these from GitHub, by the way.

(Lucidworks has also seen the value of Kibana to enterprise search, and has released its own version of Logstash and Kibana integrated with Solr, called SiLK: Solr-Integrated Logstash and Kibana.)

Now let me tell you why I do not think of Elasticsearch as an enterprise search solution. First, during my time at Lucid, I wasn't aware of any enterprise opportunities that Lucidworks lost to ELK. I could be wrong, and maybe the Elastic guys know of many deals we never saw at Lucid. But with no crawler, and none of the other components I consider ‘required’ for an enterprise search product, I'm not sure they're interested in that market - yet, at least.

Next, check the title of their home page: "Open Source Distributed Real Time Search". Doesn't scream 'Google Search Appliance replacement', does it? Read Elasticsearch founder Shay Banon on the GSA.

Finally, Wired magazine has an even more interesting quote from Shay Banon on SharePoint: “We're not doing enterprise search in the traditional sense. We're not going to index SharePoint documents”.

Now, with the growth and the money Elasticsearch has, they may change their tune. But with over $100M in venture capital now, I think their investors are valuing Elasticsearch as a Splunk competitor, and perhaps a NoSQL search product for Hadoop - not a traditional enterprise search engine. 

So the real question is: which space are you in? Enterprise search with SharePoint and other legacy data sources? Web content and file shares you need a crawler for? Is LDAP or Active Directory security important to you? If so, I won't say 'no way' to Elasticsearch, but I'd want to see it handle those requirements before I buy.

Do you use Elasticsearch for your enterprise search? Let me hear from you!