7 posts categorized "Machine Learning"

January 14, 2020

Conversational Search

The magic in early instances of what we now call 'enterprise search' was being able to find content by typing in a few keywords. It wasn't as cool as the HAL 9000 computer featured in "2001: A Space Odyssey", but it was good enough to draw a large number of people - myself included - into the business.

Along the way, Google perfected a search platform based on the theory that, at scale, just about any query you could think of had already been used by thousands, if not millions, of other humans. All Google needed to do was keep track of which pages other humans viewed following a query and promote those pages to the top. Essentially, they created 'crowd-sourced search'.
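The core idea is simple enough to sketch. Below is a toy version in Python that assumes nothing more than a log of (query, clicked page) pairs; `ClickLog`, `record` and `rerank` are hypothetical names, and this is not how Google actually ranks. A real system has to handle far more, such as the tendency of users to click whatever happens to be ranked first.

```python
from collections import Counter, defaultdict


def normalize(query: str) -> str:
    """Treat 'Sales  Report' and 'sales report' as the same query."""
    return " ".join(query.lower().split())


class ClickLog:
    """Toy crowd-sourced ranking: count which pages users open after each query."""

    def __init__(self) -> None:
        self._clicks = defaultdict(Counter)  # normalized query -> Counter of urls

    def record(self, query: str, url: str) -> None:
        self._clicks[normalize(query)][url] += 1

    def rerank(self, query: str, results: list[str]) -> list[str]:
        counts = self._clicks.get(normalize(query), Counter())
        # sorted() is stable, so pages with equal click counts keep the engine's order
        return sorted(results, key=lambda url: -counts[url])
```

For example, after recording three clicks on `/q3.pdf` and one on `/q2.pdf` for "sales report", `rerank("sales report", ["/q2.pdf", "/old.pdf", "/q3.pdf"])` returns `["/q3.pdf", "/q2.pdf", "/old.pdf"]`. The sketch also shows why volume matters: with only a handful of clicks per query, the counts say very little.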

The bad news for those of us who work on search designed for use within the enterprise is that there just isn't sufficient content - or query activity - to deliver results as accurate as those we experience on the public web. Consider: Google marketed the Google Search Appliance for the enterprise. It didn't deliver the kinds of results public-facing Google does, and Google pulled the product from the market. For great search, size matters.

Nonetheless, some of the companies that market enterprise search products are now adding elements of machine learning to their products; and while perhaps not as accurate as web-based Google, they do deliver results that start out pretty well and get better with age, as the platforms learn which documents humans view following queries.

And if you haven't noticed, some leading vendors are now integrating - and encouraging - what is known as 'conversational search'. Think about it: when you need to find a document in your organization, you may ask a colleague. But you don't simply say "sales". Chances are you'll ask, "Where is the new sales report?"

It's encouraging to see an increasing number of vendors delivering these capabilities in their commercial products. The most recent to announce conversational search is Algolia, although I have to say I'm quite disappointed in the Wikipedia write-up on them. Should I ever find some spare time, I ought to go make some edits, but spare time is a rare thing for me.

Nonetheless, I'm happy to see more commercial search vendors beginning to integrate these advanced capabilities into their products. Search in the enterprise has its challenges, but hang in there: it's getting better!

Note: How has your experience been with machine learning and AI integrated with your enterprise search? I'd love to hear your experiences - even if under NDA!

 

November 14, 2018

Do you have big data? Or just a lot of it?

Of course, I’m a search nerd. I've been involved in enterprise search for over 20 years. I see search and big data as related technologies, but in most cases, I do not see them as synonymous.

And I'd also say that, while most enterprises have a lot of data, the term ‘big data’ is not applicable to most organizations.

Consider that Google (and others) define ‘big data’ as “extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions”.

Yes, the data that Amazon, Google, Facebook, and others collect qualifies as big data. These companies mine everything you do when you're using their sites. Amazon wants to be able to report “people like you bought …” to sell more product; Google wants to know what ‘people like you’ look at after a query so they can suggest it to the next person like you; Facebook... well, they want to know what to try to sell you as you chat with and about your friends. Is search involved? Maybe; but more often strong machine learning and internal analytics are key.

Do consulting firms like Ernst & Young or PwC have big data? Well, my bet is they have a lot of information about their clients, business practices, accounting, etc., but is it ‘big data’? Probably not.

Solr, Elastic and other search technologies can search-enable huge sets of data, so big data is often indexed to be searchable by humans. And both Solr and Elastic come with some great analytical tools: Kibana for Elastic, and Banana, the port of Kibana for Solr-based engines.

But again, is that big data, or just lots of it?

I’d vote the latter.

 

May 03, 2018

Lucidworks expands focus in new funding round

Lucidworks, the commercial organization with the largest pool of Solr committers, announced today a new funding round of $50M US from venture firms Top Tier Capital Partners and Silver Lake Waterman, as well as additional participation from existing investors Shasta Ventures, Granite Ventures, and Allegis Capital.

While a big funding round for a privately held company isn't uncommon here in 'the valley', what really caught my attention is where and how Lucidworks will use the new capital. Will Hayes, Lucidworks' CEO, intends to focus the investment on what he calls "smart data experiences" that go beyond artificial intelligence and machine learning alone. The challenge is to provide useful and relevant results by addressing what he calls "the last mile" problem in current AI: enabling mere mortals to find useful insights in search without having to understand the black art of data science and big data analysis. The end target is to drive better customer experiences and improved employee productivity.

A number of well-known companies utilize Lucidworks Fusion already, many along with AI and ML tools and technologies. I've long thought that to take advantage of 'big data' like Google, Amazon, and others do, you needed huge numbers of users and queries to confidently provide meaningful suggestions in search results. While that helps, Hayes explained that smaller organizations will be able to benefit from the technology in Fusion because their data sets are both smaller and more focused, even with a smaller pool of queries. By combining these two characteristics, Lucidworks expects to deliver many of the benefits of traditional machine learning and AI-like results to enterprise-sized content. It will be interesting to see what Lucidworks does in the next several releases of Fusion!

March 06, 2018

Lucidworks Announces Search as a Service

Not Your Grandfather's Site Search

Some of you may know that New Idea Engineering spun off a company called SearchButton.com in the mid-90s, offering Verity-powered site search for thousands of clients. Sadly, our investors insisted we violate the cardinal rule of business I learned at HP - "be profitable" - so when the "dot-com" bubble exploded, we were back to being New Idea Engineering again!

I remain a fan of hosted search to this day and have been pleasantly surprised to see companies like Algolia, Swiftype (now part of Elastic), and a few other "search as a service" organizations reinventing the capabilities that we - along with our competitor Atomz - offered more than 20 years ago! And I include with them the 'cloud-based' search services offered by other established enterprise search companies like Coveo, Microsoft, and until recently, Google.

That said, we've always strived to be fully vendor neutral when it comes to recommending products and services to our clients, and we go out of our way to understand and work with all of the major enterprise search vendors.

Over the last several months I've had the opportunity to use early releases of a product Lucidworks announced this morning: Lucidworks Site Search. As I said, I am a fan of hosted search - or 'search as a service'; and in full disclosure, I was a Lucidworks employee a few years back and yes, a shareholder.

I had an opportunity to talk with Will Hayes, Lucidworks' CEO, about Lucidworks' entry into the hosted search market. Even in its initial release, it looks pretty impressive.

First, Lucidworks Site Search is powered by the newest release of their enterprise product, Fusion 4.0, announced just last week and available for download. One of the exciting new capabilities in Fusion 4 is the full integration with Spark to enhance search with machine learning. It's not quite Google's "people like you" out of the box, but it's a giant step towards AI in the enterprise.

Fusion 4 also provides the ability to create, test, and move into production custom 'portable' search applications. When I first looked at the product last week, I confess to not having the vision to see just how powerful that capability is. It seems that the Fusion Site Search announced this morning is an example of a powerful, custom search app written specifically for site search.

But Lucid has great plans for their Site Search product. It can be run in the cloud, initially on AWS but soon expanding to other cloud services including Azure and Google. For reliability, you can elect to have Lucidworks Site Search span multiple data centers and even multiple cloud services. As you'd expect in an enterprise product, it supports a wide variety of document formats, security, faceted navigation and a full management console. Finally, I understand that plans are in the works for Lucidworks Site Search to be installed "on-prem" and even to federate results (respecting document security) from the cloud and from your local instance at the same time.

Over the coming weeks and months I'll be writing more about Fusion 4, Lucid Site Search, and search apps. Stay tuned!

February 22, 2018

Search Is the User Experience, Not the Kernel

In the early days of what we now call 'enterprise search', there was no distinction between the search product and the underlying technology. Verity Topic ran on the Verity kernel and Fulcrum ran on the Fulcrum kernel, and that's the way it was - until recently.

In reality, writing the core of an enterprise search product is tough. It has to efficiently create an index of all the words in virtually any kind of file; it has to scale to index millions of documents; and it has to respect document-level security using a variety of protocols. And all of this has to deliver results in well under a second. Now machine learning is becoming an expected capability as well. And all of it is code that no user will ever see.
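To make the "index plus security" part concrete, here is a deliberately tiny Python sketch of an inverted index (a map from each word to the documents containing it) with group-based document-level security applied at query time. `TinyIndex` and its methods are hypothetical; a real kernel adds file-format parsing, relevance ranking, compression, sharding and real security protocols, which is exactly why writing one is hard.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything other than ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """A minimal inverted index with document-level security (group-based ACLs)."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)  # term -> set of doc ids containing it
        self.acl = {}                     # doc id -> groups allowed to see it

    def add(self, doc_id: str, text: str, allowed_groups: list[str]) -> None:
        self.acl[doc_id] = set(allowed_groups)
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str, user_groups: list[str]) -> list[str]:
        terms = tokenize(query)
        if not terms:
            return []
        # AND semantics: a document must contain every query term
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        groups = set(user_groups)
        # Security trimming: drop documents the user is not allowed to see
        return sorted(d for d in hits if self.acl[d] & groups)
```

With "Q3 sales report" visible to `sales` and "Sales report draft for the board" visible only to `exec`, a `sales` user searching "sales report" sees just the first document, while a user in both groups sees both.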

Hosted search vendor Swiftype provides a rich search experience for administrators and for users, but Elastic was the technology under the covers. And yesterday, Coveo announced that their popular enterprise search product will also be available with the Elastic engine rather than only with the existing Coveo proprietary kernel. This marks the start of a trend that I think may become ubiquitous.

Lucidworks, for example, is synonymous with Solr; but conceptually there is no reason their Fusion product couldn't run on a different search kernel - even on Elastic. However, with their investment in Solr, that does seem unlikely, especially with their ability to federate results from Elastic and other kernels with their App Studio, part of the recent Twigkit acquisition.

Nonetheless, enterprise search is not the kernel: it's the capabilities exposed for the operation, management, and search experience of the product.

Of course, there are differences between Elastic and Coveo, for example, as well as with other kernels. But in reality, as long as the administrative and user experiences get the work done, what technology is doing the work under the covers matters only in a few fringe cases. And ironically, Elastic, like many other platforms, has its own potentially serious fringe conditions. At the UI level, solving those cases on multiple kernels is probably a lot less intense than managing and maintaining a proprietary kernel.

And this may be an opportunity for Coveo: until now, it's been a Cloud and Windows-only platform. This may mark their entry into multiple-platform environments.

June 22, 2017

First Impressions on the new Forrester Wave

The new Forrester Wave™: Cognitive Search And Knowledge Discovery Solutions is out, and once again I think Forrester, along with Gartner and others, misses the mark on the real enterprise search market.

In the belief that sharing my quick first impressions will at least get a conversation going until I can write up a more complete analysis, here are my first thoughts.

First, I am not wild about the new buzzterms 'cognitive search' and 'insight engines'. Yes, enterprise search can be intelligent, but it's not cognitive, which Webster defines as "of, relating to, or involving conscious mental activities (such as thinking, understanding, learning, and remembering)". HAL 9000 was cognitive software; "Did you mean" and "You might also like" are not cognition. And enterprise search has always provided insights into content, so why the new 'insight engines'?

Moving on, I agree with Forrester that Attivio, Coveo and Sinequa are among the leaders. Honestly, I wish Coveo was fully multi-platform, but they do have an outstanding cloud offering that in my mind addresses much of the issue.

However, unlike Forrester, I believe Lucidworks Fusion belongs right up there with the leaders. Fusion starts with a strong open source Solr-based core; an integrated administrative UI; a great search UI builder (with the recent acquisition of Twigkit); and multiple-platform support. (Yep, I worked there a few years ago, but well before the current product was created).

I count IDOL in with the 'Old Guard' along with Endeca, Vivisimo (‘Watson’) and perhaps others - former leaders still available, but offered by non-search companies, or removed from traditional enterprise search (Watson). And it will be interesting to see if IDOL and its new parent, Micro Focus, survive the recent shotgun wedding.

Tier 2, great search but not quite “full” enterprise search, includes Elastic (which I believe is in the enviable position of being *the* platform for IoT), MarkLogic, and perhaps one or two more.

And there are several newer or perhaps less-well known search offerings like Algolia, Funnelback, Swiftype, Yippy and more. Don’t hold their size and/or youth against them; they’re quite good products.

No, I’d say the Forrester report is limited, and honestly a bit out of touch with the real enterprise search market. I know, I know; How do I really feel? Stay tuned, I've got more to say coming soon. What do you think? Leave a comment below!

January 25, 2017

Lucidworks Fusion 3 Released!

Today Lucidworks announced the release of Fusion 3, packed with some very powerful capabilities that, in many ways, set a new standard in functionality and usability for enterprise search.

Fusion is tightly integrated with Solr 6, the newest version of the popular, powerful and well-respected open source search platform. But the capabilities that really set Fusion 3 apart are the tools provided by Lucidworks on top of Solr to reduce the time-to-productivity.

It all starts at installation, which features a guided setup to allow staff who may not be familiar with enterprise search to get started quickly and to build quality, full-featured search applications.

Earlier versions of Fusion provided very powerful ‘pipelines’ that allowed users to define a series of custom steps or 'stages' during both indexing and searching. These pipelines allowed users to add custom capabilities, but they generally required some programming and a deep understanding of search.
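Conceptually, a pipeline is just an ordered list of functions, each taking a document and handing a (possibly modified) document to the next stage. The Python sketch below illustrates only that idea; it is not Fusion's pipeline API, and the stage names `normalize_title` and `add_word_count` are hypothetical.

```python
from typing import Any, Callable

Doc = dict[str, Any]
Stage = Callable[[Doc], Doc]


def run_pipeline(doc: Doc, stages: list[Stage]) -> Doc:
    """Pass a document through each stage in order; each stage returns a new doc."""
    for stage in stages:
        doc = stage(doc)
    return doc


def normalize_title(doc: Doc) -> Doc:
    """Example stage: trim and lowercase the title field."""
    return {**doc, "title": doc.get("title", "").strip().lower()}


def add_word_count(doc: Doc) -> Doc:
    """Example stage: add a field that a later stage or query could use."""
    return {**doc, "word_count": len(doc.get("body", "").split())}
```

Running `{"title": "  Q3 Sales ", "body": "revenue grew ten percent"}` through `[normalize_title, add_word_count]` yields a title of `"q3 sales"` and a `word_count` of 4. Because each stage returns a new dict rather than mutating its input, stages can be reordered or tested in isolation.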

That knowledge still helps, but Fusion 3 comes with what Lucidworks calls the “Index Workbench” and the “Query Workbench”. These two GUI-driven applications let mere mortals set up capabilities that used to require a developer, and enable developers to create powerful pipelines in much less time.

What can a pipeline do? Let's look at two cases.

On a recent project, our client had a deep, well developed taxonomy, and they wanted to tag each document with the appropriate taxonomy terms. In the Fusion 2.x Index Pipeline, we wrote code to evaluate each document to determine relevant taxonomy terms; and then to insert the appropriate taxonomy terms into the actual document. This meant that at query time, no special effort was required to use the taxonomy terms in the query: they were part of the document.
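The post doesn't say how the project's code matched documents to taxonomy terms. A simple keyword-rule approach is one possibility, sketched below in Python; the taxonomy, its trigger phrases and `tag_with_taxonomy` are all hypothetical.

```python
import re

# Hypothetical taxonomy: each term maps to phrases that signal it
TAXONOMY = {
    "Finance/Revenue": ["revenue", "sales figures"],
    "HR/Benefits": ["401k", "health plan"],
    "Legal/Contracts": ["indemnification", "master services agreement"],
}


def tag_with_taxonomy(doc: dict, taxonomy: dict[str, list[str]]) -> dict:
    """Index-time stage: store matching taxonomy terms in the document itself."""
    text = f"{doc.get('title', '')} {doc.get('body', '')}".lower()
    tags = [
        term
        for term, phrases in taxonomy.items()
        # \b...\b keeps a phrase from matching inside a longer word
        if any(re.search(rf"\b{re.escape(p.lower())}\b", text) for p in phrases)
    ]
    return {**doc, "taxonomy": sorted(tags)}
```

A document whose body reads "Revenue rose; see the new health plan." comes out tagged `["Finance/Revenue", "HR/Benefits"]`. Since the tags are stored in a normal field, queries and facets can use them with no extra work at search time, which is the point made above.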

Another common index time task is to identify and extract key terms, perhaps names and account numbers, to be used as facets.
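A minimal Python sketch of that kind of extraction follows, assuming a hypothetical account-number format (`ACCT-` plus six digits) and a known list of names; real projects often use entity-extraction tools instead of plain pattern matching. `extract_facets` and `facet_counts` are illustrative names, not part of any product.

```python
import re
from collections import Counter

# Hypothetical account-number format: "ACCT-" followed by six digits
ACCOUNT_RE = re.compile(r"\bACCT-\d{6}\b")


def extract_facets(doc: dict, known_names: list[str]) -> dict:
    """Index-time stage: copy account numbers and known names into facet fields."""
    body = doc.get("body", "")
    return {
        **doc,
        "account_facet": sorted(set(ACCOUNT_RE.findall(body))),
        "name_facet": sorted(n for n in known_names if n in body),
    }


def facet_counts(docs: list[dict], field: str) -> Counter:
    """What a facet shows the user: how many documents carry each value."""
    return Counter(v for d in docs for v in d.get(field, []))
```

With two documents that both mention `ACCT-123456`, `facet_counts(docs, "account_facet")` reports a count of 2 for that account, which is what a user would see next to the facet value.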

The Index Workbench in Fusion 3 provides a powerful front-end to these capabilities, which have long been part of Fusion but are now much easier for mere mortals to use.

The Query Workbench is similar, except that it operates at query time, making it easy to do what we’ve long called “query tuning”. Consider this: not every term a user enters for search is of equal importance. The Query Workbench lets a non-programmer tweak relevance using a point-and-click interface. In previous versions of Fusion, and in most search platforms, a developer needed to write code to do the same task.
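Under the covers, this kind of tuning often comes down to giving some terms more weight than others. The Python sketch below turns a conversational query into a query string using the standard Lucene `term^boost` syntax that Solr also accepts. The weights and the `boosted_query` helper are hypothetical, and this is not how the Query Workbench itself is implemented.

```python
def boosted_query(user_query: str, weights: dict[str, float], default: float = 1.0) -> str:
    """Turn a user's words into a Lucene-syntax query with a boost on each term.

    A weight of 0 drops the term entirely. Lucene special characters (such as ?
    or :) are NOT escaped here; a real implementation would need to handle them.
    """
    parts = []
    for term in user_query.lower().split():
        w = weights.get(term, default)
        if w > 0:
            parts.append(f"{term}^{w:g}")  # :g prints 3.0 as "3" and 0.5 as "0.5"
    return " ".join(parts)
```

For example, `boosted_query("Where is the new sales report", {"where": 0, "is": 0, "the": 0, "sales": 3, "report": 2})` produces `new^1 sales^3 report^2`: the filler words disappear and "sales" counts most.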

Another capability in Fusion 3 addresses a problem everyone who has ever installed a search technology has faced: how to ensure that the production environment exactly mirrors the dev and QA servers. Doing so was a very detailed and tedious task, and any differences between QA and production could break something.

Fusion 3 has what Lucidworks calls Object Import/Export. This unique capability provides a way to export collection configurations, dashboards, and even pipeline stages and aggregations from a test or QA system; and reliably import those objects to a new production server. This makes it much easier to clone test systems; and more importantly, move search from Dev to QA and into production with high confidence that production exactly matches the test environment.
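Import/Export removes the need to compare environments by hand, but the underlying check is easy to picture. Assuming each environment's configuration has been exported as JSON and loaded into nested Python dicts with string keys (the post doesn't describe Fusion's actual export format), a hypothetical `diff_configs` helper that lists every setting differing between QA and production might look like this:

```python
def diff_configs(qa: dict, prod: dict, path: str = "") -> list[str]:
    """Return dotted paths of settings that are missing or different in either config.

    Assumes string keys, so the combined key set can be sorted.
    """
    diffs = []
    for key in sorted(set(qa) | set(prod)):
        p = f"{path}.{key}" if path else key
        if key not in qa or key not in prod:
            diffs.append(p)  # setting exists in only one environment
        elif isinstance(qa[key], dict) and isinstance(prod[key], dict):
            diffs.extend(diff_configs(qa[key], prod[key], p))  # recurse into sections
        elif qa[key] != prod[key]:
            diffs.append(p)  # same setting, different value
    return diffs
```

An empty list means the two exports match; anything else is a list of exactly the kind of QA-versus-production differences that can break a deployment.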

Fusion 3 also extends the Graphical Administrative User Interface to manage pretty much everything your operations department will need to do with Fusion. Admin UIs are not new; but the Fusion 3 tool sets a new high bar in functionality.

There is one other capability in Fusion 3 enabled by a relatively new capability in Solr: SQL.

I know what you’re thinking: “Why do I want SQL in a full-text application?”

Shift your focus to the other end.

Have you ever wanted to generate a report that shows information about inventory or other content in the search index? Let’s say your business team needs inventory and product reports on content in your search-driven eCommerce data. The business team has tools they know and love for creating their own reports, but those tools operate on SQL databases.

This kind of reporting has always been tough in search, and typically required some custom programming to create the reports. With the SQL querying capabilities in Solr 6, and security provided by Fusion 3, you may simply need to point your business team at the search index, verify their credentials, and connect via ODBC/JDBC, and their existing tools will work.
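Solr 6's SQL support is also reachable over plain HTTP through a `/sql` request handler that takes the statement in a `stmt` parameter. The Python sketch below only builds such a request against an unsecured Solr node; the host, the `products` collection and its `category` field are hypothetical, and going through Fusion's security layer or an ODBC/JDBC driver would look different.

```python
from urllib.parse import urlencode
from urllib.request import Request


def solr_sql_request(base_url: str, collection: str, stmt: str) -> Request:
    """Build (but do not send) a POST to Solr's /sql handler for one collection."""
    url = f"{base_url.rstrip('/')}/{collection}/sql"
    body = urlencode({"stmt": stmt}).encode("utf-8")  # form-encoded, like curl --data-urlencode
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = solr_sql_request(
    "http://localhost:8983/solr",
    "products",
    "SELECT category, count(*) FROM products GROUP BY category",
)
# urllib.request.urlopen(req) would send it to a running Solr node
```

The `GROUP BY` query is the kind of inventory roll-up a business team would otherwise ask a developer to script against the search API.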

What Else?

Fusion 3 is an upgrade from earlier versions, so it includes Spark, an Apache tool with built-in modules for streaming, SQL, machine learning and graph processing. It works fine on SolrCloud, which enables massive indices and query loads, not to mention failover in the event of hardware problems.

I expect that Fusion 3 documentation, and the ability to download and evaluate the product, will be on the Lucidworks site today at www.lucidworks.com. “Try it, you’ll like it”.

While we here at New Idea Engineering, a Lucidworks partner, can help you evaluate and implement Fusion 3, I’d also point out that our friends at MC+A, also Lucidworks partners, are hosting a webinar on Thursday, January 26th. Use this link to register and attend the webinar: http://bit.ly/2joopQK.

 

Lucidworks CTO Grant Ingersoll will be hosting a webinar on Friday, February 1st. Read about it here.

 

/s/ Miles