The magic in early instances of what we now call 'enterprise search' was being able to find content by typing in a few keywords. It wasn't as cool as the HAL 9000 computer featured in "2001: A Space Odyssey", but it was good enough to draw a large number of people, myself included, into the business.
Along the way, Google perfected a search platform based on the theory that, at scale, just about any query you could think of had already been used by thousands, if not millions, of other humans. All Google needed to do was keep track of which pages other humans viewed following a query and promote those pages to the top. Essentially, they created 'crowd-sourced search'.
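To make the idea concrete, here is a minimal sketch of click-based re-ranking in Python. It is not how Google actually ranks pages; it simply assumes a log of which result a user opened after each query, and the ClickRanker class and its method names are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


class ClickRanker:
    """Hypothetical sketch: promote results that users opened after the same query."""

    def __init__(self) -> None:
        # normalized query -> Counter of {page: number of times it was opened}
        self.clicks: defaultdict[str, Counter] = defaultdict(Counter)

    @staticmethod
    def _normalize(query: str) -> str:
        return query.lower().strip()

    def record_click(self, query: str, page: str) -> None:
        """Log that a user opened `page` after searching for `query`."""
        self.clicks[self._normalize(query)][page] += 1

    def rerank(self, query: str, results: list[str]) -> list[str]:
        """Reorder the engine's results so the most-clicked pages come first."""
        counts = self.clicks.get(self._normalize(query), Counter())
        # sorted() is stable: pages with equal click counts keep the engine's original order
        return sorted(results, key=lambda page: -counts[page])


ranker = ClickRanker()
ranker.record_click("sales report", "/reports/q3.pdf")
ranker.record_click("Sales Report", "/reports/q3.pdf")
ranker.record_click("sales report", "/wiki/sales")
print(ranker.rerank("sales report", ["/wiki/sales", "/hr/policy", "/reports/q3.pdf"]))
# ['/reports/q3.pdf', '/wiki/sales', '/hr/policy']
```

The sketch only works when there are enough clicks per query to trust the counts; with a handful of users, a single stray click can reorder the results.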
The bad news for those of us who work on search within the enterprise is that there just isn't enough content, or enough query activity, to deliver results as accurate as those we experience on the public web. Consider: Google marketed the Google Search Appliance to the enterprise, but it didn't deliver the kinds of results public-facing Google does, and Google pulled the product from the market. For great search, size matters.
Nonetheless, some of the companies that market enterprise search products are now adding elements of machine learning to their products. While perhaps not as accurate as web-based Google, they deliver results that are reasonably good from the start and improve with age, as the platforms learn which documents people view following queries.
And if you haven't noticed, some leading vendors are now integrating, and encouraging, what is known as 'conversational search'. Think about it: when you need to find a document in your organization, you may ask a colleague. But you don't simply say "sales". Chances are you'll ask "Where is the new sales report?"
Several vendors already deliver these capabilities in their commercial products. The most recent to announce conversational search is Algolia, although I have to say I'm quite disappointed in the Wikipedia write-up on them. Should I ever find some spare time, I'll go make a few edits; but spare time is rare for me.
Nonetheless, I'm happy to see an increasing number of commercial search vendors beginning to integrate these advanced capabilities into their products. Enterprise search has its challenges, but hang in there: it's getting better!
Note: How has your experience been with machine learning and AI integrated with your enterprise search? I'd love to hear your experiences - even if under NDA!
VC firms seem attracted to the Enterprise Search space
Just today, it was announced that Canadian-based Coveo closed a ~US$170M round, following Lucidworks' recent US$100M round. Earlier this year we saw Algolia come in with $110M of funding, and of course there was Elastic's recent IPO. It sure looks like 2019 will have been a good year for the leading technologies.
Lucidworks, the commercial organization with the largest pool of Solr committers, today announced a new funding round of US$50M from venture firms Top Tier Capital Partners and Silver Lake Waterman, with additional participation from existing investors Shasta Ventures, Granite Ventures, and Allegis Capital.
While a big funding round for a privately held company isn't uncommon here in 'the valley', what really caught my attention is where and how Lucidworks will use the new capital. Will Hayes, Lucidworks' CEO, intends to focus the investment on what he calls "smart data experiences" that go beyond simple artificial intelligence and machine learning. The challenge is to provide useful and relevant results by addressing what he calls "the last mile" problem in current AI: enabling mere mortals to find useful insights in search without having to understand the black art of data science and big data analysis. The end goal is better customer experiences and improved employee productivity.
A number of well-known companies already use Lucidworks Fusion, many along with AI and ML tools and technologies. I've long thought that to take advantage of 'big data' as Google, Amazon, and others do, you needed huge numbers of users and queries to confidently provide meaningful suggestions in search results. While that helps, Hayes explained that smaller organizations will be able to benefit from the technology in Fusion because their data sets are both smaller and more focused, even with a smaller pool of queries. With the combination of these two characteristics, Lucidworks expects to bring many of the benefits of traditional machine learning, and AI-like results, to enterprise-sized content. It will be interesting to see what Lucidworks does in the next several releases of Fusion!
Some of you may know that New Idea Engineering spun off a company called SearchButton.com in the mid-90s, offering Verity-powered site search for thousands of clients. Sadly, our investors insisted we violate the cardinal rule of business I learned at HP, "be profitable", so when the dot-com bubble burst, we were back to being New Idea Engineering again!
I remain a fan of hosted search to this day, and I have been pleasantly surprised to see companies like Algolia, Swiftype (now part of Elastic), and a few other "search as a service" organizations reinventing the capabilities that we, along with a competitor, Atomz, offered more than 20 years ago! And I include with them the cloud-based search services offered by established enterprise search companies like Coveo, Microsoft, and, until recently, Google.
That said, we've always strived to be fully vendor-neutral when recommending products and services to our clients, and we go out of our way to understand and work with all of the major enterprise search vendors.
Over the last several months I've had the opportunity to use early releases of a product Lucidworks announced this morning: Lucidworks Site Search. As I said, I am a fan of hosted search (or 'search as a service'); and in full disclosure, I was a Lucidworks employee a few years back and, yes, a shareholder.
I had an opportunity to talk with Will Hayes, Lucidworks' CEO, about Lucidworks' entry into the hosted search market. Even in its initial release, it looks pretty impressive.
First, Lucidworks Site Search is powered by the newest release of their enterprise product, Fusion 4.0, announced just last week and available for download. One of the exciting new capabilities in Fusion 4 is the full integration with Spark to enhance search with machine learning. It's not quite Google's "people like you" out of the box, but it's a giant step towards AI in the enterprise.
Fusion 4 also provides the ability to create, test, and move custom 'portable' search applications into production. When I first looked at the product last week, I confess I didn't see just how powerful that capability is. Lucidworks Site Search, announced this morning, appears to be an example of a powerful custom search app written specifically for site search.
And Lucidworks has great plans for their Site Search product. It can be run in the cloud, initially on AWS but soon expanding to other cloud services including Azure and Google. For reliability, you can elect to have Lucidworks Site Search span multiple data centers and even multiple cloud services. As you'd expect in an enterprise product, it supports a wide variety of document formats, security, faceted navigation, and a full management console. Finally, I understand that plans are in the works for Lucidworks Site Search to be installed "on-prem" and even to federate results (respecting document security) from the cloud and from your local instance at the same time.
Over the coming weeks and months I'll be writing more about Fusion 4, Lucidworks Site Search, and search apps. Stay tuned!
In the early days of what we now call 'enterprise search', there was no distinction between the search product and the underlying technology. Verity Topic ran on the Verity kernel and Fulcrum ran on the Fulcrum kernel, and that's the way it was - until recently.
In reality, writing the core of an enterprise search product is tough. It has to efficiently create an index of all the words in virtually any kind of file; it has to scale to index millions of documents; and it has to respect document-level security using a variety of protocols. All of this has to deliver results in well under a second, and now machine learning is becoming an expected capability as well. And all of it is code that no user will ever see.
Hosted search vendor Swiftype provides a rich search experience for administrators and for users, but Elastic provides the technology under the covers. And yesterday, Coveo announced that their popular enterprise search product will also be available with the Elastic engine, not only with Coveo's existing proprietary kernel. This marks the start of a trend that I think may become ubiquitous.
Lucidworks, for example, is synonymous with Solr; but conceptually there is no reason their Fusion product couldn't run on a different search kernel - even on Elastic. However, with their investment in Solr, that does seem unlikely, especially with their ability to federate results from Elastic and other kernels with their App Studio, part of the recent Twigkit acquisition.
Nonetheless, enterprise search is not the kernel: it's the capabilities exposed for the operation, management, and search experience of the product.
Of course, there are differences between Elastic and Coveo, for example, as well as among other kernels. But in reality, as long as the administrative and user experiences get the work done, the technology doing the work under the covers matters only in a few fringe cases. And ironically, Elastic, like many other platforms, has its own potentially serious fringe conditions. Solving those cases at the UI level across multiple kernels is probably far less work than managing and maintaining a proprietary kernel.
And this may be an opportunity for Coveo: until now, it's been a cloud and Windows-only platform. This may mark their entry into multi-platform environments.
If you're involved in managing the enterprise search instance at your company, there's a good chance you've heard at least some users complain about the poor results they see.
The common lament search teams hear is "Why didn't we use Google?" Yet we've heard the same complaints at sites that implemented the GSA but didn't use the Google logo and look.
We're often asked to come in and recommend a solution. Sometimes the problem is simply the wrong search platform: not every platform handles every use case and requirement equally well. Occasionally, the problem is a poorly configured or misconfigured search, or simply an instance that hasn't been managed properly. Even the renowned Google public search engine doesn't run itself; and it is a poor comparison anyway, since in recent years Google search has become less a search platform and more a big data analytics engine.
Over the years, we've helped clients select, implement, and manage intranet search. In my opinion, the real problem with search lies elsewhere: poor data quality.
Enterprise data isn't created with search in mind. Content authors have little incentive to add quality metadata in the properties fields of Adobe PDF Maker, Microsoft Office, and other document publishing tools. To make matters worse, there may be several versions of a given document as it goes through creation, editing, reviews, and updates, and often the early drafts and the final version sit in the same directory or file share. Public-facing website content very rarely has such issues.
Some content management systems make it easy to implement what is really search engine optimization (SEO), but all too often the optimization is left to the enterprise search platform to work out.
We have an updated two-part series on data quality and search, starting here. We hope you find it helpful; let us know if you have any questions!
In the belief that sharing my quick first impressions will at least get a conversation going until I can write up a more complete analysis, here are my first thoughts.
First, I am not wild about the new buzzterms "cognitive search" and "insight engines". Yes, enterprise search can be intelligent, but it's not cognitive, which Webster defines as "of, relating to, or involving conscious mental activities (such as thinking, understanding, learning, and remembering)". HAL 9000 was cognitive software; "Did you mean" and "You might also like" are not cognition. And enterprise search has always provided insights into content, so why the new term 'insight engines'?
Moving on, I agree with Forrester that Attivio, Coveo, and Sinequa are among the leaders. Honestly, I wish Coveo were fully multi-platform, but they do have an outstanding cloud offering that, in my mind, addresses much of the issue.
However, unlike Forrester, I believe Lucidworks Fusion belongs right up there with the leaders. Fusion starts with a strong open source Solr-based core; an integrated administrative UI; a great search UI builder (with the recent acquisition of Twigkit); and multiple-platform support. (Yep, I worked there a few years ago, but well before the current product was created).
I count IDOL in with the 'Old Guard', along with Endeca, Vivisimo ('Watson'), and perhaps others: former leaders that are still available, but offered by non-search companies or removed from traditional enterprise search (Watson). And it will be interesting to see whether IDOL and its new parent, Micro Focus, survive the recent shotgun wedding.
Tier 2, great search but not quite "full" enterprise search, includes Elastic (which I believe is in the enviable position of being *the* platform for IoT), MarkLogic, and perhaps one or two more.
And there are several newer or perhaps less-well known search offerings like Algolia, Funnelback, Swiftype, Yippy and more. Don’t hold their size and/or youth against them; they’re quite good products.
No, I'd say the Forrester report is limited and, honestly, a bit out of touch with the real enterprise search market. I know, I know: how do I really feel? Stay tuned; I'll have more to say soon. What do you think? Leave a comment below!
Today Lucidworks announced the release of Fusion 3, packed with some very powerful capabilities that, in many ways, set a new standard in functionality and usability for enterprise search.
Fusion is tightly integrated with Solr 6, the newest version of the popular, powerful, and well-respected open source search platform. But what really sets Fusion 3 apart are the tools Lucidworks provides on top of Solr to reduce time-to-productivity.
It all starts at installation, which features a guided setup that allows staff who may not be familiar with enterprise search to get started quickly and to build quality, full-featured search applications.
Earlier versions of Fusion provided very powerful ‘pipelines’ that allowed users to define a series of custom steps or 'stages' during both indexing and searching. These pipelines allowed users to add custom capabilities, but they generally required some programming and a deep understanding of search.
That knowledge still helps, but Fusion 3 comes with what Lucidworks calls the "Index Workbench" and the "Query Workbench". These two GUI-driven applications let mere mortals set up capabilities that used to require a developer, and they enable developers to create powerful pipelines in much less time.
What can a pipeline do? Let's look at two cases.
On a recent project, our client had a deep, well-developed taxonomy, and they wanted to tag each document with the appropriate taxonomy terms. In the Fusion 2.x Index Pipeline, we wrote code to evaluate each document to determine the relevant taxonomy terms, and then to insert those terms into the document itself. This meant that at query time, no special effort was required to use the taxonomy terms in the query: they were part of the document.
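Here is a rough idea of what such an index-time step might do, written as plain Python rather than as a Fusion pipeline stage (Fusion's own stage API is not shown). The TAXONOMY dictionary, its terms, and the tag_document function are hypothetical; a production stage would likely use a richer matcher than simple phrase lookups.

```python
from __future__ import annotations

import re

# Hypothetical taxonomy: preferred term -> phrases that indicate it in the text
TAXONOMY: dict[str, list[str]] = {
    "Finance/Accounts Payable": ["accounts payable", "invoice", "vendor payment"],
    "Sales/Forecasting": ["sales forecast", "pipeline review", "quota"],
    "HR/Benefits": ["benefits enrollment", "401k", "health plan"],
}


def tag_document(doc: dict) -> dict:
    """Index-time step: return a copy of `doc` with matching taxonomy terms added."""
    text = f"{doc.get('title', '')} {doc.get('body', '')}".lower()
    tags = [
        term
        for term, phrases in TAXONOMY.items()
        # \b...\b keeps "quota" from matching inside a longer word like "quotations"
        if any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases)
    ]
    # The tags become an ordinary field, so queries and facets can use them directly
    return {**doc, "taxonomy": tags}


doc = {"id": "42", "title": "Q3 sales forecast",
       "body": "Notes from the pipeline review and the vendor payment schedule."}
print(tag_document(doc)["taxonomy"])
# ['Finance/Accounts Payable', 'Sales/Forecasting']
```

Because the tags are stored in the document, a query or facet on the taxonomy field needs no special handling at search time, which is the benefit described above.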
Another common index time task is to identify and extract key terms, perhaps names and account numbers, to be used as facets.
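A similarly simplified sketch of key-term extraction follows, again in plain Python and not tied to any Fusion API. The "ACCT-" account-number format and the KNOWN_CUSTOMERS list are assumptions made up for illustration.

```python
from __future__ import annotations

import re

# Hypothetical account-number format: "ACCT-" followed by exactly six digits
ACCOUNT_RE = re.compile(r"\bACCT-\d{6}\b")
# Hypothetical list of customer names, e.g. exported from a CRM
KNOWN_CUSTOMERS = ["Acme Corp", "Globex", "Initech"]


def extract_facets(text: str) -> dict[str, list[str]]:
    """Pull facet values out of a document's text at index time."""
    accounts = sorted(set(ACCOUNT_RE.findall(text)))  # de-duplicated, sorted
    lowered = text.lower()
    customers = [name for name in KNOWN_CUSTOMERS if name.lower() in lowered]
    return {"account_number": accounts, "customer": customers}


print(extract_facets("Invoice for Globex, ref ACCT-004211 and ACCT-004211; see also ACCT-900001."))
# {'account_number': ['ACCT-004211', 'ACCT-900001'], 'customer': ['Globex']}
```

Each returned list would be written to its own field (account_number and customer here, both hypothetical names) so the search UI can offer it as a facet. Plain substring matching of names will produce false positives, so a real stage would more likely rely on entity extraction.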
The Index Workbench in Fusion 3 provides a powerful front end to these capabilities, which have long been part of Fusion but are now much easier for mere mortals to use.
The Query Workbench is similar, except that it operates at query time, making it easy to do what we've long called "query tuning". Consider this: not every term a user enters in a search is of equal importance. The Query Workbench lets a non-programmer tweak relevance using a point-and-click interface. In previous versions of Fusion, and in most search platforms, a developer needed to write code to do the same task.
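As an illustration of what this kind of tuning can produce, the sketch below rewrites a user's query by dropping low-value words and attaching weights to important ones, using the classic Lucene query-parser boost syntax (term^2). The TERM_BOOSTS and NOISE_TERMS tables are hypothetical, and this is not the Query Workbench's actual output.

```python
from __future__ import annotations

# Hypothetical weights a search team might assign in a point-and-click tool
TERM_BOOSTS: dict[str, float] = {"warranty": 3.0, "return": 2.0}
# Hypothetical low-value words to drop from the query
NOISE_TERMS = {"how", "do", "i", "the", "a", "my"}


def tune_query(user_query: str) -> str:
    """Rewrite a free-text query into Lucene classic syntax with per-term boosts."""
    parts = []
    for term in user_query.lower().split():  # naive whitespace tokenizing; punctuation is not handled
        if term in NOISE_TERMS:
            continue
        boost = TERM_BOOSTS.get(term)
        # ":g" formats 2.0 as "2", giving the familiar term^2 form
        parts.append(f"{term}^{boost:g}" if boost is not None else term)
    return " ".join(parts)


print(tune_query("How do I return my laptop under warranty"))
# return^2 laptop under warranty^3
```

Whether a given boost actually helps depends on the content, which is why query tuning is an iterative job rather than a one-time setting.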
Another capability in Fusion 3 addresses a problem everyone who has ever installed a search technology has faced: how to ensure that the production environment exactly mirrors the dev and QA servers. Doing so used to be a detailed and tedious task, and any difference between QA and production could break something.
Fusion 3 has what Lucidworks calls Object Import/Export. This unique capability provides a way to export collection configurations, dashboards, and even pipeline stages and aggregations from a test or QA system, and reliably import those objects to a new production server. This makes it much easier to clone test systems and, more importantly, to move search from dev to QA and into production with high confidence that production exactly matches the test environment.
Fusion 3 also extends the Graphical Administrative User Interface to manage pretty much everything your operations department will need to do with Fusion. Admin UIs are not new; but the Fusion 3 tool sets a new high bar in functionality.
There is one other capability in Fusion 3 enabled by a relatively new capability in Solr: SQL.
I know what you’re thinking: “Why do I want SQL in a full-text application?”
Shift your focus to the other end.
Have you ever wanted to generate a report that shows information about inventory or other content in the search index? Let's say your business team needs inventory and product reports on the content in your search-driven eCommerce data. The business team has tools they know and love for creating their own reports, but those tools operate on SQL databases.
This kind of reporting has always been tough in search, and typically required some custom programming to create the reports. With the SQL querying capabilities in Solr 6 and the security provided by Fusion 3, you may simply need to point your business team at the search index, verify their credentials, and have them connect via ODBC/JDBC; their existing tools should then work.
What Else?
Fusion 3 builds on earlier versions, so it still includes Spark, an Apache tool with built-in modules for streaming, SQL, machine learning, and graph processing. It works fine on Solr Cloud, which enables massive indices and query loads, not to mention failover in the event of hardware problems.
I expect that Fusion 3 documentation, and the ability to download and evaluate the product, will be on the Lucidworks site today at www.lucidworks.com. “Try it, you’ll like it”.
While we here at New Idea Engineering, a Lucidworks partner, can help you evaluate and implement Fusion 3, I'd also point out that our friends at MC+A, also Lucidworks partners, are hosting a webinar on Thursday, January 26th. Use this link to register and attend the webinar: http://bit.ly/2joopQK.
Lucidworks CTO Grant Ingersoll will be hosting a webinar on Friday, February 1st. Read about it here.
Lucene was 'born' in 1999, created by Doug Cutting, and in 2005 it became a top-level Apache project. That year, the 'Leaders' on Gartner Group's Enterprise Search Magic Quadrant included Autonomy, FAST, Endeca, IBM OmniFind, and Verity. The Google Search Appliance was right on the cusp between 'Challengers' and 'Leaders'. Not many people knew about Lucene, and few who did saw it as much more than a quirky little project.
Just a year later, Yonik Seeley and his employer, CNET Networks, published and donated the Solr search server to the Apache Software Foundation, where it became an incubator project in 2006; the two projects soon merged into a single top-level Apache project. That same year, Gartner narrowed the ‘Leaders’ in their 2006 Magic Quadrant for Search to Autonomy (which acquired Verity the previous year), FAST, and Endeca.
Jump forward to the present. FAST is gone, acquired by Microsoft in 2008 and morphed into SharePoint Search. Hewlett-Packard acquired Autonomy in October of 2011, followed a few weeks later by Oracle’s acquisition of Endeca. Endeca is no longer available as a search platform; and Autonomy is mostly seen as a strategy to keep a large number of HP consultants fully employed, often on compliance applications.
Only a smattering of the commercial enterprise search platforms that flooded the market just a few years back still exist. While Gartner continues to list 14 or 15 products in their Magic Quadrant for Enterprise Search, about the only pure commercial products we see anymore are the Google Search Appliance and Recommind. And Google recently announced that the appliance is scheduled to go 'end of life' over the next few years: all of those bright yellow boxes will become really nice Dell servers by the end of 2018.
A new crop of search platforms has grown to fill the void.
As an open source product, Solr has grown in its capabilities and is now widely used for enterprise search and data applications in corporations and government projects. Solr Cloud extends Solr into a scalable, high-availability platform for demanding enterprise and data search applications.
Cloudera also bundles Solr, along with some interesting extra tools, in their HUE bundle, free to download and free to use as long as you like. Cloudera runs a slightly older but stable release, 4.10; but with committers Yonik Seeley and Mark Miller on staff, I suspect they're in a good position.
Hortonworks, a Cloudera competitor, also offers Solr/Solr Cloud in their releases, in partnership with Lucidworks - a company with a large number of committers on staff.
There are also three companies that have proprietary offerings based on open source technology.
Attivio, founded in 2007, is a “Leader” in the most recent Gartner Magic Quadrant for Enterprise Search. Their product, while not open source, nonetheless thrives by combining search, BI, data automation, analytics and more.
Elasticsearch has evolved into a strong platform for search and data analytics, and a number of organizations are finding it useful in some traditional enterprise search applications as well. Elastic has also integrated Kibana, a powerful graphical presentation tool that adds value for content analytics, not just search activity reporting.
Lucidworks Fusion is a relative newcomer to enterprise search. It includes many of the rich architectural features that enterprises expect, including its 'Anda' crawler, connectors, an admin UI, and reporting; with those features, some people see it as a contender to replace the Google Search Appliance.
The one thing all of these 'proprietary' products have in common? They rely on Apache Lucene for critical functionality. And when you consider all of the websites that use some form of Lucene for their site search, I think you'd agree that it really is a powerful little package. It's available for virtually any operating system, and can be integrated using just about any programming language, from C/C++ to Java to Perl to Python to .NET.
Even more amazing is that these companies with commercial products based on Lucene, who compete in the marketplace, actually cooperate when it comes time to fix bugs or add new capabilities to Lucene. Given all of the commercial players that have closed their doors, leaving their customers to find replacement platforms, we've reached the point where open-source-based software really is the safe choice. And universally, Lucene is the common element.
The quirky little search API Doug Cutting put together in 1999 has evolved to be the platform that drives the leading search platforms used in big data, NoSQL, enterprise search, and search analytics. And it doesn't seem likely to be phased out any time soon.
Would you find it helpful to benchmark your enterprise search operations against hundreds of corporations, organizations, and government agencies worldwide? Before you answer: would you find that information useful enough that you'd spend a few minutes answering a survey about your enterprise search practices? Real-world data from people just like yourself worldwide seems like a pretty good deal to me.
This survey, the results of which are useful, insightful, and actionable for search managers everywhere, provides insight into many of the critical areas of search.
Findwise, the Swedish company with offices there and in Denmark, Norway, Poland, and London, is gathering data now for the 2016 version of their annual Enterprise Search and Findability Survey at http://bit.ly/1sY9qiE.
What sorts of things will you learn?
Past surveys give insight into the differences between companies with happy search users and those whose employees prefer to avoid internal search. One particularly interesting finding last year was that there are three levels of 'search maturity', identifiable by how search is implemented across content.
The least mature search organizations, roughly 25% of respondents, have search for specific repositories (silos), but they generally treat search as 'fire and forget': once it's installed, there is no ongoing oversight.
More mature search organizations, representing about 60% of respondents, have one search for all silos, but maintaining and improving the search technology gets very little staff attention.
The remaining 15% of organizations answering the survey invest in search technology and staff, and continuously attempt to improve search and findability. These organizations often have multiple search instances tailored for specific users and repositories.
One of my favorite findings a few years back was that a majority of enterprises have "one or less" full-time staff responsible for search, and yet a similar majority of employees reported that search just didn't work. The good news? Subsequent surveys have shown that staffing search with as few as 2 FTEs improves overall search satisfaction, and 3 FTEs seem to improve it strongly. And even more good news: over the years, the trend in enterprise search shows that more and more organizations are taking search and findability seriously.
You can participate in the 2016 Findwise Enterprise Search and Findability Survey in just 10 or 15 minutes and you’ll be among the first to know what this year brings. Again, you’ll find the 2016 survey at http://bit.ly/1sY9qiE.