183 posts categorized "Enterprise search"

November 14, 2018

Do you have big data? Or just a lot of it?

Of course, I’m a search nerd. I've been involved in enterprise search for over 20 years. I see search and big data as related technologies, but in most cases, I do not see them as synonymous.

And I'd also say that, while most enterprises have a lot of data, the term ‘big data’ is not applicable to most organizations.

Consider that Google (and others) define ‘big data’ as “extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions”.

Yes, the data that Amazon, Google, Facebook, and others collect qualifies as big data. These companies mine everything you do when you're using their sites. Amazon wants to be able to report “people like you bought …” to sell more product; Google wants to know what ‘people like you’ look at after a query so they can suggest it to the next person like you; and Facebook? Well, they want to know what to try to sell you as you chat with and about your friends. Is search involved? Maybe; but more often some strong machine learning and internal analytics are key.

Do consulting firms like Ernst & Young or PwC have big data? Well, my bet is they have a lot of information about their clients, business practices, accounting, and so on; but is it ‘big data’? Probably not.

Solr, Elastic, and other search technologies can search-enable huge sets of data, so big data is often indexed to be searchable by humans. And both Solr and Elastic come with some great analytical tools: Kibana on Elastic, and Banana, the port of Kibana for Solr-based engines.
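If you've never seen what 'search-enabling' data actually involves, here is a minimal sketch using the official Elasticsearch Python client. The cluster URL, index name, and document fields are all made-up assumptions for illustration, not anything from a real deployment:

```python
from elasticsearch import Elasticsearch  # assumes the official Python client is installed

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

# Index a record so humans can search it later; refresh so the example query sees it.
es.index(
    index="clickstream",  # hypothetical index name
    document={
        "user": "anon-42",
        "action": "viewed product page",
        "timestamp": "2018-11-14T09:30:00Z",
    },
    refresh="wait_for",
)

# A simple full-text query over whatever has been indexed.
results = es.search(index="clickstream", query={"match": {"action": "product"}})
for hit in results["hits"]["hits"]:
    print(hit["_source"])
```

Whether the index holds a few thousand documents or a few billion, the calls look the same; the 'big' part is all in the cluster sizing underneath.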

But again, is that big data? Or just lots of it?

I’d vote the latter.

 

May 03, 2018

Lucidworks expands focus in new funding round

Lucidworks, the commercial organization with the largest pool of Solr committers, announced today a new funding round of $50M US from venture firms Top Tier Capital Partners and Silver Lake Waterman, as well as additional participation from existing investors Shasta Ventures, Granite Ventures, and Allegis Capital.

While a big funding round for a privately held company isn't uncommon here in 'the valley', what really caught my attention is where and how Lucidworks will use the new capital. Will Hayes, Lucidworks' CEO, intends to focus the investment on what he calls "smart data experiences" that go beyond artificial intelligence and machine learning alone. The challenge is to provide useful and relevant results by addressing what he calls "the last mile" problem in current AI: enabling mere mortals to find useful insights in search without having to understand the black art of data science and big data analysis. The end target is to drive better customer experiences and improved employee productivity.

A number of well-known companies already use Lucidworks Fusion, many alongside AI and ML tools and technologies. I've long thought that to take advantage of 'big data' the way Google, Amazon, and others do, you needed huge numbers of users and queries to confidently provide meaningful suggestions in search results. While that helps, Hayes explained that smaller organizations will be able to benefit from the technology in Fusion because their data sets are smaller and more focused, even with a smaller pool of queries. With the combination of these two characteristics, Lucidworks expects to deliver many of the benefits of traditional machine learning and AI-like results to enterprise-sized content. It will be interesting to see what Lucidworks does in the next several releases of Fusion!

April 23, 2018

Poor Data Quality gives Enterprise Search a Bad Rap

If you’re involved in managing the enterprise search instance at your company, there’s a good chance that you’ve experienced at least some users complaining about the poor results they see. A common lament search teams hear is “Why didn’t we use Google?” Even more telling is that many organizations that used the Google Search Appliance on their sites heard the same lament.

We're often asked to help a client improve results on an internal search platform, and sometimes the problem is the platform: not every platform handles every use case equally well, and sometimes that shows up. Occasionally, the problem is a poorly configured search, or simply an instance that hasn't been managed properly. And the renowned Google public search engine does well not because it is a great search platform; in fact, Google has become less of a search platform and more of a big data analytics engine.

Our business is helping clients select, implement, and manage Intranet search. Frequently, the problem is not the search platform. Rather, the culprit is poor data quality. 

Enterprise data isn’t created with search in mind. There is little incentive for authors to attach quality metadata in the properties fields that Adobe PDF Maker, Microsoft Office, and other document publishing tools support. To make matters worse, there may be several versions of a given document as it goes through creation, editing, and updating; and often the early drafts, as well as the final version, sit in the same directory or file share. Very rarely will a public-facing website have such issues.
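One cheap way to see how bad the duplicate-draft problem is in your own shares: hash the files and report exact copies before they ever reach the index. Here is a minimal sketch in Python; the share path is hypothetical:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_exact_duplicates(share: str) -> dict:
    """Group files by content hash; any group larger than one is a duplicate set."""
    groups = defaultdict(list)
    for path in Path(share).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}

for digest, paths in find_exact_duplicates("//fileshare/proposals").items():  # hypothetical share
    print(f"{len(paths)} identical copies:")
    for p in paths:
        print(f"  {p}")
```

This only catches byte-identical copies; finding near-duplicate drafts takes fuzzier techniques, but even this simple pass usually turns up surprises.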

We have an updated two-part series on data quality and search, starting here. We hope you find it helpful; let us know if you have any questions!

February 22, 2018

Search Is the User Experience, not the kernel

In the early days of what we now call 'enterprise search', there was no distinction between the search product and the underlying technology. Verity Topic ran on the Verity kernel and Fulcrum ran on the Fulcrum kernel, and that's the way it was - until recently.

In reality, writing the core of an enterprise search product is tough. It has to efficiently create an index of all the words in virtually any kind of file; it has to scale to index millions of documents; and it has to respect document-level security using a variety of protocols. And all of this has to deliver results in well under a second. And now, machine learning is becoming an expected capability as well. All for code that no user will ever see.

Hosted search vendor Swiftype provides a rich search experience for administrators and for users, but Elastic is the technology under the covers. And yesterday, Coveo announced that their popular enterprise search product will also be available with the Elastic engine rather than only with the existing Coveo proprietary kernel. This marks the start of a trend that I think may become ubiquitous.

Lucidworks, for example, is synonymous with Solr; but conceptually there is no reason their Fusion product couldn't run on a different search kernel - even on Elastic. However, with their investment in Solr, that does seem unlikely, especially with their ability to federate results from Elastic and other kernels with their App Studio, part of the recent Twigkit acquisition.

Nonetheless, enterprise search is not the kernel: it's the capabilities exposed for the operation, management, and search experience of the product.

Of course, there are differences between Elastic and Coveo, for example, as well as with other kernels. But in reality, as long as the administrative and user experiences get the work done, what technology is doing the work under the covers matters only in a few fringe cases. And ironically, Elastic, like many other platforms, has its own potentially serious fringe conditions. At the UI level, solving those cases on multiple kernels is probably a lot less intense than managing and maintaining a proprietary kernel.

And this may be an opportunity for Coveo: until now, it's been a Cloud and Windows-only platform. This may mark their entry into multiple-platform environments.

February 20, 2018

Search, the Enterprise Orphan

It seems that everywhere I go, I hear how bad enterprise search is. Users, IT staff, and management complain, and eventually organizations decide that replacing their existing vendor is the best solution. I’d wager that companies switch their search platforms more frequently than any other mission-critical application.

While the situation is frustrating for organizations that use search, the current state isn’t as bad for the actual search vendors: if prospects are universally unhappy with a competing product, it’s easier to sell a replacement technology that promises to be everything the current platform is not. It may seem that the only loser is the current vendor; and they are often too busy converting new customers to the platform to worry much.

But in fact, switching search vendors every few years is a real problem for the organization that simply wants its employees and users to find the right content accurately, quickly and without any significant user training. After all, employees are born with the ability to use Google!

 

Higher level story

Why is enterprise search so bad? In my experience, search implemented and managed properly is pretty darned good. As I see it, the problem is that at most organizations, search doesn’t have an owner.  On LinkedIn, a recent search for “vice president database” jobs shows over 1500 results. Searching for “vice president enterprise search”? Zero hits.

This means that search, recognized as mission-critical by senior management, often doesn’t have an owner outside of IT, whose objective is to keep enterprise applications up and running. Search may be one of the few enterprise applications where “up and running” is just not good enough.

Sadly, there is often no “search owner”; no “search quality team”; and likely no budget for measuring and maintaining result quality.

Search Data Quality

We’ve all heard the expression “Garbage In, Garbage Out”. What is data quality when it comes to search? And how can you measure it?

Ironically, enterprise content authors have an easy way to impact search data quality; but few use it. The trick? Document Properties – also known as ‘metadata’.

When you create any document, there is always data about the document: metadata. Some of the metadata ‘just happens’: the file date, its size, and the file name and path. Other metadata comes from author-provided properties like the title, subject, and other fielded data such as that maintained in the Office ‘Properties’ tab. And there are tools like the Stanford Named Entity Recognition tool (licensed under the GNU General Public License) that can extract richer metadata from the full text of a document.

Some document properties happen automatically. In Microsoft Office, for example, the Properties form provides a way to define field values including the author name, company, and other fields. The problem is, few people go to the effort of filling in the property fields correctly, so you end up with bad metadata. And bad metadata is arguably worse than no metadata.
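To get a feel for how thin that metadata usually is, a short audit script helps. This is just a sketch: it assumes the python-docx package and a made-up file share path, and it only looks at Word documents:

```python
from pathlib import Path
from docx import Document  # assumes the python-docx package is installed

def audit_metadata(folder: str) -> None:
    """Flag Word documents whose core properties are unlikely to help search."""
    for path in Path(folder).rglob("*.docx"):
        props = Document(str(path)).core_properties
        problems = []
        if not (props.title or "").strip():
            problems.append("missing title")
        if not (props.author or "").strip():
            problems.append("missing author")
        if not (props.keywords or "").strip():
            problems.append("no keywords")
        if problems:
            print(f"{path}: {', '.join(problems)}")

audit_metadata("//fileshare/intranet/docs")  # hypothetical file share
```

Run something like this against a real repository and the scale of the bad-metadata problem usually becomes obvious very quickly.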

On the enterprise side, I heard about an organization that wanted to reward employees who authored popular content for the intranet. The theory was that recognizing and rewarding useful content creation would help improve the overall quality and utility of the corporate intranet.

An organization we did a project for a few years ago was curious about the poor metadata in their intranet document repository, so they ran a test. After examining their Microsoft Office documents, they discovered that one employee had apparently authored nearly half of all their intranet content! It turned out that this employee, an office assistant, had authored the document that everyone in the organization used as the starting point for their common standard reports.

Solving the Problem

Enterprise search technology has advanced to an amazing level. A number of search vendors have even integrated machine learning tools like Spark to surface popular content for frequent queries. And search-related reporting has become a standard part of nearly all search product offerings, so metrics such as top queries and zero hits are available and increasingly actionable.
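Even without a commercial reporting module, the two most useful reports are easy to approximate from a raw query log. A minimal sketch, assuming a hypothetical CSV export with 'query' and 'hits' columns:

```python
import csv
from collections import Counter

def summarize_queries(log_path: str, top_n: int = 10):
    """Tally the most frequent queries and the queries that returned no results."""
    all_queries = Counter()
    zero_hit = Counter()
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):  # expects columns: query, hits
            q = row["query"].strip().lower()
            all_queries[q] += 1
            if int(row["hits"]) == 0:
                zero_hit[q] += 1
    return all_queries.most_common(top_n), zero_hit.most_common(top_n)

top, misses = summarize_queries("search_log.csv")  # hypothetical log export
print("Top queries:", top)
print("Zero-hit queries:", misses)
```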

To really take advantage of these new technological solutions, you need a team of folks actively participating in making your enterprise search a success, so you can break the loop of “buy-replace”.

Start by identifying an executive owner, and then pull together a team of co-conspirators who can help. Sometimes just looking at the reports you already have and taking action can go a long way.

Review the queries with no results and see if there are synonyms that can surface the right content without changing the content at all. Identify the right page for your most popular queries and define one or two “best bets”. And if some frequent queries really don’t have relevant content, work with your web team to create it.
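For the synonym piece, most engines let you apply synonyms at query time without touching the content. Here is a sketch of what that looks like on an Elastic-based engine using the Python client; the index name, field, and synonym list are illustrative assumptions, not a recommendation:

```python
from elasticsearch import Elasticsearch  # assumes the official Python client is installed

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

# Map the words people actually type onto the vocabulary used in the content.
synonyms = [
    "pto, paid time off, vacation",
    "expense report, expenses, reimbursement",
]

es.indices.create(
    index="intranet",  # hypothetical index
    settings={
        "analysis": {
            "filter": {"query_synonyms": {"type": "synonym", "synonyms": synonyms}},
            "analyzer": {
                "search_with_synonyms": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "query_synonyms"],
                }
            },
        }
    },
    mappings={
        "properties": {
            "body": {
                "type": "text",
                "analyzer": "standard",
                "search_analyzer": "search_with_synonyms",
            }
        }
    },
)
```

Because the synonyms are applied only at search time, the list can grow as the zero-hit report changes, without anyone rewriting the source documents, which is exactly the point.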

Funding? Find the right person in your organization to convince that spending a little money on fixing the problems now will break the “buy-replace” cycle and save some significant but needlessly recurring expenses.

Like so many things, a little ongoing effort can solve the problem.

June 28, 2017

Poor data quality gives search a bad rap

If you’re involved in managing the enterprise search instance at your company, there’s a good chance that you’ve experienced at least some users complaining about the poor results they see.

The common lament search teams hear is “Why didn’t we use Google?” In fact, we’ve seen the same complaints on sites that implemented the Google Search Appliance but don’t use the Google logo and look.

We're often asked to come in and recommend a solution. Sometimes the problem is simply the wrong search platform: not every platform handles every use case and requirement equally well. Occasionally, the problem is a poorly implemented or misconfigured search, or simply an instance that hasn’t been managed properly. Even the renowned Google public search engine doesn’t happen by itself, and it’s a poor comparison anyway: in recent years, Google search has become less of a search platform and more of a big data analytics engine.

Over the years, we’ve been helping clients select, implement, and manage Intranet search. In my opinion, the problem with search is elsewhere: Poor data quality. 

Enterprise data isn’t created with search in mind. There is little incentive for content authors to attach quality metadata in the properties fields of Adobe PDF Maker, Microsoft Office, and other document publishing tools. To make matters worse, there may be several versions of a given document as it goes through creation, editing, reviews, and updates; and often the early drafts, as well as the final version, sit in the same directory or file share. Very rarely does public-facing website content have such issues.

Sometimes content management systems make it easy to implement what is really ‘search engine optimization’ or SEO; but it seems all too often that the optimization is left to the enterprise search platform to work out.

We have an updated two-part series on data quality and search, starting here. We hope you find it helpful; let us know if you have any questions!

June 22, 2017

First Impressions on the new Forrester Wave

The new Forrester Wave™: Cognitive Search And Knowledge Discovery Solutions is out, and once again I think Forrester, along with Gartner and others, misses the mark on the real enterprise search market.

In the belief that a quick first impression will at least get a conversation going until I can write up a more complete analysis, I'm sharing these first thoughts now.

First, I am not wild about the new buzzterms 'cognitive search' and 'insight engines'. Yes, enterprise search can be intelligent, but it's not cognitive, which Webster defines as "of, relating to, or involving conscious mental activities (such as thinking, understanding, learning, and remembering)". HAL 9000 was cognitive software; "Did you mean" and "You might also like" are not cognition. And enterprise search has always provided insights into content, so why the new 'insight engines'?

Moving on, I agree with Forrester that Attivio, Coveo and Sinequa are among the leaders. Honestly, I wish Coveo was fully multi-platform, but they do have an outstanding cloud offering that in my mind addresses much of the issue.

However, unlike Forrester, I believe Lucidworks Fusion belongs right up there with the leaders. Fusion starts with a strong open source Solr-based core; an integrated administrative UI; a great search UI builder (with the recent acquisition of Twigkit); and multiple-platform support. (Yep, I worked there a few years ago, but well before the current product was created).

I count IDOL in with the 'Old Guard', along with Endeca, Vivisimo (‘Watson’), and perhaps others: former leaders still available, but offered by non-search companies or removed from traditional enterprise search (Watson). And it will be interesting to see whether IDOL and its new parent, Micro Focus, survive the recent shotgun wedding.

Tier 2, great search but not quite “full” enterprise search, includes Elastic (which I believe is in the enviable position of being *the* platform for IoT), MarkLogic, and perhaps one or two more.

And there are several newer or perhaps less-well known search offerings like Algolia, Funnelback, Swiftype, Yippy and more. Don’t hold their size and/or youth against them; they’re quite good products.

No, I’d say the Forrester report is limited, and honestly a bit out of touch with the real enterprise search market. I know, I know: how do I really feel? Stay tuned, I've got more to say soon. What do you think? Leave a comment below!

November 16, 2016

What features do your search users really want?

What features and capabilities do corporate end-users need from their search platform? Here's a radical concept: ask stakeholders what they want - and what they need - and make a list. No surprise: you'll have too much to do.

Try this: meet with stakeholders from each functional area of the organization. During each interview, ask people to tell you what internet search sites they use for personal browsing, and what capabilities of those sites they like best. As they name the desired features, write them on a white board.

Repeat this with representatives from every department, whether marketing, IT, support, documentation, sales, finance, shipping or others - really every group that will use the platform for a substantial part of their days. 

Once you have the list, ask for a little more help. Tell your users they each have $100 "Dev Dollars" to invest in new features, and ask them to spend whatever portion they want to pay for each feature - but all they have is $100 DD.

Now the dynamics get interesting. The really important features get the big bucks; the outliers get a pittance -  if anything. Typically, the top two or three features requested get between 40DD and 50DD; and that quickly trails off. 

I know - it sounds odd. These Dev Dollars have no true value - but people give a great deal of thought to assigning relative value to a list of capabilities - and it gives you a feature list with real priorities.
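If you run the exercise with more than a handful of stakeholders, tallying the ballots is trivial to script. A tiny sketch with made-up allocations:

```python
from collections import Counter

# Each stakeholder's allocation of their 100 Dev Dollars (made-up example data).
ballots = [
    {"type-ahead suggestions": 50, "better relevance": 40, "saved searches": 10},
    {"better relevance": 60, "people search": 30, "type-ahead suggestions": 10},
    {"better relevance": 45, "type-ahead suggestions": 35, "document preview": 20},
]

totals = Counter()
for ballot in ballots:
    totals.update(ballot)

# Features ranked by total Dev Dollars: a priority list with real relative weights.
for feature, dollars in totals.most_common():
    print(f"{feature}: {dollars} DD")
```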

How do you discover what users really want? 

 

 

May 31, 2016

The Findwise Enterprise Search and Findability Survey 2016 is open for business

Would you find it helpful to benchmark your enterprise search operations against hundreds of corporations, organizations, and government agencies worldwide? Before you answer: would you find that information useful enough that you’d spend a few minutes answering a survey about your enterprise search practices? It seems like a pretty good deal to me to get real-world data from people just like yourself worldwide.

This survey, the results of which are useful, insightful, and actionable for search managers everywhere, provides insight into many of the critical areas of search.

Findwise, the Swedish company with offices there and in Denmark, Norway, Poland, and London, is gathering data now for the 2016 version of their annual Enterprise Search and Findability Survey at http://bit.ly/1sY9qiE.

What sorts of things will you learn?

Past surveys give insight into the difference between companies with happy search users and those whose employees prefer to avoid using internal search. One particularly interesting finding last year was that there are three levels of ‘search maturity’, identifiable by how search is implemented across content.

The least mature search organizations, roughly 25% of respondents, have search for specific repositories (silos), but they generally treat search as ‘fire and forget’: once it is installed, there is no ongoing oversight.

More mature search organizations, representing about 60% of respondents, have one search across all silos, but maintaining and improving the search technology gets very little staff attention.

The remaining 15% of organizations answering the survey invest in search technology and staff, and continuously attempt to improve search and findability. These organizations often have multiple search instances tailored for specific users and repositories.

One of my favorite findings a few years back was that a majority of enterprises have “one or less” full-time staff responsible for search, and yet a similar majority of employees reported that search just didn’t work. The good news? Subsequent surveys have shown that staffing search with as few as 2 FTEs improves overall search satisfaction, and 3 FTEs seem to strongly improve it. And even more good news: over the years, the trend shows that more and more organizations are taking search and findability seriously.

You can participate in the 2016 Findwise Enterprise Search and Findability Survey in just 10 or 15 minutes and you’ll be among the first to know what this year brings. Again, you’ll find the 2016 survey at http://bit.ly/1sY9qiE.

November 05, 2014

Search Owner's Dilemma

In my session today at Enterprise Search & Discovery, I finished up with a rendition of the old rhyme called "The Engineer's Dilemma", updated for the folks who manage enterprise search in large organizations. Folks seemed to like it, so I'll share it here for those who were unable to be at the conference. I call it "The Search Manager's Dilemma".

It's not my job to pick our search
The call's not up to me.
It's not my place to say how much
The cost of search should be.
It's not my place to tune the thing, not even do it well,
But let the damn thing miss a page
And see who catches hell!

Enjoy!