November 14, 2018

Do you have big data? Or just a lot of it?

Of course, I’m a search nerd. I've been involved in enterprise search for over 20 years. I see search and big data as related technologies, but in most cases, I do not see them as synonymous.

And I'd also say that, while most enterprises have a lot of data, the term ‘big data’ is not applicable to most organizations.

Consider that Google (and others) define ‘big data’ as “extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions”.

Yes, the data that Amazon, Google, Facebook, and others collect qualifies as big data. These companies mine everything you do when you're using their sites. Amazon wants to be able to report “people like you bought …” to sell more product; Google wants to know what ‘people like you’ look at after a query so they can suggest it to the next person like you; Facebook... well, they want to know what to try to sell you as you chat with and about your friends. Is search involved? Maybe; but more often some strong machine learning and internal analytics are key.

Do consulting firms like Ernst & Young or PwC have big data? Well, my bet is they have a lot of information about their clients, business practices, accounting, etc., but is it ‘big data’? Probably not.

Solr, Elastic and other search technologies can search-enable huge sets of data, so often big data is indexed to be searchable by humans. And both Solr and Elastic come with some great analytical tools: Kibana on Elastic, and Banana, the port of Kibana for Solr-based engines.

But again, is that big data? Or just lots of it?

I’d vote the latter.

 

August 09, 2018

Fake Search?

Enterprise search was once easy. It was often bad - but understanding the results was pretty easy. If the query term(s) were in the document, it was there in the results. Period. The more times the terms appeared, the higher the result appeared in the result list. And when the user typed a multi-term query, documents with all of the terms displayed higher in the result list than those with only some of the terms.

And some search platforms could 'explain' why a particular document was ranked where it was. Those of us who have been in the business a while may remember the Verity Topic and Verity K2 product lines. One of the most advanced capabilities in these was the 'explain' function. It would reverse engineer the score for an individual document and report the critical 'why': the reason the selected document was ranked as it was.
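To make those old-school rules concrete, here is a toy Python sketch. It is not how Verity Topic or K2 (or any real engine) computed scores; it only assumes the three rules described above: a document must contain at least one query term to appear, documents matching more of the query terms rank first, and more occurrences rank higher within that. The names tokenize, score, search, and Hit are made up for illustration.

```python
from collections import Counter
from typing import NamedTuple

PUNCT = ".,;:!?\"'()"


class Hit(NamedTuple):
    doc_id: str
    coverage: int      # how many distinct query terms the document contains
    frequency: int     # total occurrences of those terms
    explanation: str   # human-readable 'why' for this ranking


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def score(query, doc_id, text):
    """Return a Hit for one document, or None if no query term appears."""
    terms = list(dict.fromkeys(tokenize(query)))  # de-duplicate, keep order
    counts = Counter(tokenize(text))
    matched = {t: counts[t] for t in terms if counts[t] > 0}
    if not matched:
        return None
    detail = ", ".join(f"{t} x{n}" for t, n in matched.items())
    why = f"matched {len(matched)} of {len(terms)} query terms ({detail})"
    return Hit(doc_id, len(matched), sum(matched.values()), why)


def search(query, docs):
    """Rank documents: term coverage first, then raw term frequency."""
    hits = [h for h in (score(query, d, t) for d, t in docs.items()) if h]
    return sorted(hits, key=lambda h: (h.coverage, h.frequency), reverse=True)


if __name__ == "__main__":
    docs = {
        "memo": "Enterprise search is hard.",
        "faq": "Search tips: search early, search often.",
        "lunch": "The cafeteria menu for Friday.",
    }
    for hit in search("enterprise search", docs):
        print(hit.doc_id, "-", hit.explanation)
```

The explanation string is the interesting part: because the score is built from simple counts, the 'why' falls out for free, which is exactly what gets harder once machine learning enters the picture.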

"People Like You"

Now, search results are generally better, but every now and then, you’ll see a result that surprises you, especially on sites that are enhanced with machine learning technologies. A friend of mine tells of a query she did on Google while she was looking for summer clothes, but the top result was a pair of shoes. She related her surprise: "I asked for summer clothes, and Google shows me SHOES?".  But, she admitted, "Those shoes ARE pretty nice!"

How did that happen? Somewhere, deep in the data Google maintains on her search history, it concluded that "people like her" purchased that pair of shoes.

In the enterprise, we don't have the volume of content and query activity of the large Internet players, but we do tend to have more focused content and a narrower query vocabulary. ML/AI tools like Apache Mahout and MLlib (Spark's machine learning library) can help our search platforms generate such odd yet often relevant results; but these technologies are still limited when it comes to explaining the 'why' for a given result. And for those of us who still exhibit skepticism when it comes to computers, that capability would be nice.

Are you using (or planning to implement) ML-in-search? Or are you a skeptic? Which camp are you in? Let me hear from you: miles.kehoe@ideaeng.com.

May 03, 2018

Lucidworks expands focus in new funding round

Lucidworks, the commercial organization with the largest pool of Solr committers, announced today a new funding round of $50M US from venture firms Top Tier Capital Partners and Silver Lake Waterman, as well as additional participation from existing investors Shasta Ventures, Granite Ventures, and Allegis Capital.

While a big funding round for a privately held company isn't uncommon here in 'the valley', what really caught my attention is where and how Lucidworks will use the new capital. Will Hayes, Lucidworks' CEO, intends to focus the investment on what he calls "smart data experiences" that go beyond simply artificial intelligence and machine learning. The challenge is to provide useful and relevant results by addressing what he calls "the last mile" problem in current AI:  enabling mere mortals to find useful insights in search without having to understand the black art of data science and big data analysis. The end target is to drive better customer experiences and improved employee productivity.

A number of well-known companies utilize Lucidworks Fusion already, many along with AI and ML tools and technologies. I've long thought that to take advantage of 'big data' like Google,  Amazon, and others do, you needed huge numbers of users and queries to confidently provide meaningful suggestions in search results.  While that helps, Hayes explained that smaller organizations will be able to benefit from the technology in Fusion because of both smaller and more focused data sets, even with a smaller pool of queries. With the combination of these two characteristics, Lucidworks expects to deliver many of the benefits of traditional machine learning and AI-like results to enterprise-sized content. It will be interesting to see what Lucidworks does in the next several releases of Fusion!

April 23, 2018

Poor Data Quality gives Enterprise Search a Bad Rap

If you’re involved in managing the enterprise search instance at your company, there’s a good chance that you’ve experienced at least some users complaining about the poor results they see. A common lament search teams hear is “Why didn’t we use Google?” Even more telling is that many organizations that used the Google Search Appliance on their sites heard the same lament.

We're often asked to help a client improve results on an internal search platform; and sometimes, the problem is the platform. Not every platform handles every use case equally, and sometimes that shows up. Occasionally, the problem is a poor or misconfigured search, or simply an instance that hasn’t been managed properly. The renowned Google public search engine does well not because it is a great search platform. In fact, Google has become less of a search platform and more of a big data analytics engine.

Our business is helping clients select, implement, and manage Intranet search. Frequently, the problem is not the search platform. Rather, the culprit is poor data quality. 

Enterprise data isn’t created with search in mind. There is little incentive for authors to attach quality metadata in the properties fields Adobe PDF Maker, Microsoft Office, and other document publishing tools support. To make matters worse, there may be several versions of a given document as it goes through creation, editing, and updating; and often the early drafts, as well as the final version, are in the same directory or file share. Very rarely will a public facing website have such issues.

We have an updated two-part series on data quality and search, starting here. We hope you find it helpful; let us know if you have any questions!

March 06, 2018

Lucidworks Announces Search as a Service

Not Your Grandfather's Site Search

Some of you may know that New Idea Engineering spun off a company called SearchButton.com in the mid-90s, offering Verity-powered site search for thousands of clients. Sadly, our investors insisted we violate the cardinal rule of business I learned at HP - "be profitable" - so when the "dot-com" bubble exploded, we were back to being New Idea Engineering again!

I remain a fan of hosted search to this day and have been pleasantly surprised to see companies like Algolia; Swiftype - now part of Elastic; and a few other "Search as a service" organizations reinventing the capabilities that we - along with a competitor, Atomz - offered more than 20 years ago! And I include with them the 'cloud-based' search services offered by other established enterprise search companies like Coveo, Microsoft, and until recently, Google.

That said, we've always strived to be fully vendor neutral when it comes to recommending products and services to our clients, and we go out of our way to understand and work with all of the major enterprise search vendors.

Over the last several months I've had the opportunity to use early releases of a product Lucidworks announced this morning: Lucidworks Site Search. As I said, I am a fan of hosted search - or 'search as a service'; and in full disclosure, I was a Lucidworks employee a few years back and yes, a shareholder.

I had an opportunity to talk with Will Hayes, Lucidworks' CEO, about Lucidworks' entry into the hosted search market. Even in its initial release, it looks pretty impressive.

First, Lucidworks Site Search is powered by the newest release of their enterprise product, Fusion 4.0, announced just last week and available for download. One of the exciting new capabilities in Fusion 4 is the full integration with Spark to enhance search with machine learning. It's not quite Google's "people like you" out of the box, but it's a giant step towards AI in the enterprise.

Fusion 4 also provides the ability to create, test, and move into production custom 'portable' search applications. When I first looked at the product last week, I confess to not having the vision to see just how powerful that capability is. It seems that the Fusion Site Search announced this morning is an example of a powerful, custom search app written specifically for site search.

But Lucid has great plans for their Site Search product. It can be run in the cloud, initially on AWS but soon expanding to other cloud services including Azure and Google. And for reliability, you can elect to have Lucidworks Site Search span multiple data centers and even multiple cloud services. As you'd expect in an enterprise product, it supports a wide variety of document formats, security, faceted navigation and a full management console. Finally, I understand that plans are in the works for Lucidworks Site Search to be installed "on-prem" and even to federate results (respecting document security) from the cloud and from your local instance at the same time.

Over the coming weeks and months I'll be writing more about Fusion 4, Lucid Site Search, and search apps. Stay tuned!