February 22, 2018

Search Is the User Experience, Not the Kernel

In the early days of what we now call 'enterprise search', there was no distinction between the search product and the underlying technology. Verity Topic ran on the Verity kernel and Fulcrum ran on the Fulcrum kernel, and that's the way it was - until recently.

In reality, writing the core of an enterprise search product is tough. It has to efficiently create an index of all the words in virtually any kind of file; it has to scale to millions of documents; and it has to respect document-level security using a variety of protocols. All of this has to deliver results in well under a second, and machine learning is now becoming an expected capability as well. All for code that no user will ever see.

Hosted search vendor Swiftype provides a rich search experience for administrators and for users, but Elastic is the technology under the covers. And yesterday, Coveo announced that their popular enterprise search product will also be available with the Elastic engine rather than only with the existing Coveo proprietary kernel. This marks the start of a trend that I think may become ubiquitous.

Lucidworks, for example, is synonymous with Solr; but conceptually there is no reason their Fusion product couldn't run on a different search kernel - even on Elastic. However, with their investment in Solr, that does seem unlikely, especially with their ability to federate results from Elastic and other kernels with their App Studio, part of the recent Twigkit acquisition.

Nonetheless, enterprise search is not the kernel: it's the capabilities exposed for the operation, management, and search experience of the product.

Of course, there are differences between Elastic and Coveo, for example, as well as with other kernels. But in reality, as long as the administrative and user experiences get the work done, what technology is doing the work under the covers matters only in a few fringe cases. And ironically, Elastic, like many other platforms, has its own potentially serious fringe conditions. At the UI level, solving those cases on multiple kernels is probably a lot less intense than managing and maintaining a proprietary kernel.

And this may be an opportunity for Coveo: until now, it's been a cloud- and Windows-only platform. This may mark their entry into multi-platform environments.

February 20, 2018

Search, the Enterprise Orphan

It seems that everywhere I go, I hear how bad enterprise search is. Users, IT staff, and management complain, and eventually organizations decide that replacing their existing vendor is the best solution. I’d wager that companies switch their search platforms more frequently than any other mission-critical application.

While the situation is frustrating for organizations that use search, the current state isn’t as bad for the actual search vendors: if prospects are universally unhappy with a competing product, it’s easier to sell a replacement technology that promises to be everything the current platform is not. It may seem that the only loser is the current vendor; and they are often too busy converting new customers to the platform to worry much.

But in fact, switching search vendors every few years is a real problem for the organization that simply wants its employees and users to find the right content accurately, quickly and without any significant user training. After all, employees are born with the ability to use Google!

 

The Higher-Level Story

Why is enterprise search so bad? In my experience, search implemented and managed properly is pretty darned good. As I see it, the problem is that at most organizations, search doesn’t have an owner.  On LinkedIn, a recent search for “vice president database” jobs shows over 1500 results. Searching for “vice president enterprise search”? Zero hits.

This means that search, recognized as mission-critical by senior management, often doesn’t have an owner outside of IT, whose objective is to keep enterprise applications up and running. Search may be one of the few enterprise applications where “up and running” is just not good enough.

Sadly, there is often no “search owner”; no “search quality team”; and likely no budget for measuring and maintaining result quality.

Search Data Quality

We’ve all heard the expression “Garbage In, Garbage Out”. What is data quality when it comes to search? And how can you measure it?

Ironically, enterprise content authors have an easy way to impact search data quality; but few use it. The trick? Document Properties – also known as ‘metadata’.

When you create any document, there is always data about the document – metadata. Some of the metadata ‘just happens’: the file date, its size, and the file name and path. Other metadata depends on author-provided properties – a title, subject, and other fielded data such as that maintained in the Office ‘Properties’ tab. And there are tools like the Stanford Named Entity Recognition tool (licensed under the GNU General Public License) that can extract advanced metadata from the full text of a document.

Other document properties require deliberate effort. In Microsoft Office, for example, the Properties form provides a way to define field values including the author name, company and other fields. The problem is, few people go to the effort of filling in the property fields correctly, so you end up with bad metadata. And bad metadata is arguably worse than none.
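As an aside, a .docx file is just a zip archive, and the author-supplied properties live in an XML part named docProps/core.xml. Here is a minimal Python sketch (building a stand-in file in memory rather than reading a real document) showing how little it takes to pull that metadata out:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used in the docProps/core.xml part of a .docx file
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def core_properties(docx_bytes):
    """Pull author-supplied metadata out of a .docx (which is just a zip)."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    props = {}
    for tag in ("dc:title", "dc:subject", "dc:creator", "cp:keywords"):
        el = root.find(tag, NS)
        if el is not None and el.text:
            props[tag.split(":")[1]] = el.text
    return props

# Build a minimal stand-in .docx in memory (a real file has many more parts)
core_xml = (
    f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
    "<dc:title>Quarterly Report</dc:title>"
    "<dc:creator>Office Assistant</dc:creator>"
    "</cp:coreProperties>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/core.xml", core_xml)

print(core_properties(buf.getvalue()))
# {'title': 'Quarterly Report', 'creator': 'Office Assistant'}
```

If the author never filled in the Properties form, those fields come back empty – and that emptiness propagates straight into your search index.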

On the enterprise side, I heard about an organization that wanted to reward employees who authored popular content for the intranet. The theory was that recognizing and rewarding useful content creation would help improve the overall quality and utility of the corporate intranet.

An organization we did a project for a few years ago was curious about the poor metadata in their intranet document repository, so they ran a test. After examining their Microsoft Office documents, they discovered that one employee had apparently authored nearly half of all their intranet content! It turned out that this employee, an office assistant, had authored the document that everyone in the organization used as the starting point for their common standard reports.

Solving the Problem

Enterprise search technology has advanced to an amazing level. A number of search vendors have even integrated machine learning tools like Spark to surface popular content for frequent queries. And search-related reporting has become a standard part of nearly all search product offerings, so metrics such as top queries and zero hits are available and increasingly actionable.

To really take advantage of these new technological solutions, you need a team of folks who actively participate in making your enterprise search a success, so you can break the “buy-replace” loop.

Start by identifying an executive owner, and then pull together a team of co-conspirators who can help. Sometimes just looking at the reports you already have and taking action can go a long way.

Review the queries with no results and see if there are synonyms that can find the right content without even changing the content. Identify the right page for your most popular queries and define one or two “best bets”. And if some frequent queries don’t really have relevant content, work with your web team to create it.
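To show how little code the first step takes – the query log and synonym list here are hypothetical – you can mine your search reports for the most frequent zero-result queries and patch them with synonyms:

```python
from collections import Counter

# Hypothetical query log: (query, result_count) pairs pulled from search reports
query_log = [
    ("vacation policy", 12), ("pto", 0), ("pto", 0),
    ("holiday schedule", 7), ("wfh", 0), ("pto", 0),
]

# Surface the most frequent zero-result queries first
zero_hits = Counter(q for q, hits in query_log if hits == 0)
print(zero_hits.most_common(2))  # [('pto', 3), ('wfh', 1)]

# A synonym map can fix them without changing any content
synonyms = {"pto": "paid time off", "wfh": "work from home"}

def expand(query):
    """Rewrite a query using the synonym map; pass others through unchanged."""
    return synonyms.get(query.lower(), query)

print(expand("PTO"))  # paid time off
```

Most search platforms have a built-in synonym facility that does the rewriting for you; the point is that the report data to drive it is already sitting in your admin console.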

Funding? Find the right person in your organization to convince that spending a little money fixing the problems now will break the “buy-replace” cycle and save some significant but needlessly recurring expenses.

Like so many things, a little ongoing effort can solve the problem.

December 04, 2017

Search Indices are Not Content Repositories

Recently on Quora, someone asked for help with a corrupt Elasticsearch index. A number of folks responded, all recommending that he simply rebuild the search index and move on.

The bad news: this person didn't have any source documents. He was so impressed with what Elasticsearch did that he had been using it as his primary storage for content. When it crashed, his content was gone. This is not an indictment of Elasticsearch: it can happen with any complex software product, whether Elastic, Solr or SharePoint.

In my reply, I told him how sorry I was for his loss, and suggested he get to work restoring or recreating his content. I even offered to call and tell him how sorry I was for his loss. 

Then I launched into what I really felt I needed to say - there on his behalf, and here for yours. I suggested - no, actually I insisted - that you NEVER use ANY search index as your primary store for content. Let me be more specific: NEVER. EVER.

Some platforms such as Solr, and commercial software based on Solr (i.e., Lucidworks), have a reasonably robust ability to replicate the index over multiple servers or nodes, which provides some safety (I’m thinking SolrCloud here); others do not. But the replication is a copy of the INDEX, which is NOT your documents.

The search index is optimized for retrieval. Databases, CMS, file systems and other tech are for storage.

For one thing, I’m not sure any search engine stores the entire original document. Conceptually, most search indices have two ‘logical’ (if not physical) files.

One of these files you can think of as a database table with one row per document, holding field values (Title, Author, etc.). This file generally stores the URL, file name, or database row as well - basically ‘where do I go to find this full document?’ - and maybe a few other field values.

The second file is a list of all the words (minus stop words) in all of your documents. Each word is stored once, along with a pointer to every document containing it and a list of byte offsets where the word appears - one offset for each instance. Note that stop words are generally not indexed at all, so they are usually absent from the index entirely.
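As a toy illustration of those two logical files - this is a hypothetical sketch, not how any real engine physically lays out its index - a few lines of Python capture the idea:

```python
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "and", "of"}

# "File" 1: one row per document - where to find it, plus a few field values
doc_table = {}
# "File" 2: word -> {doc_id: [byte offsets where the word occurs]}
postings = defaultdict(lambda: defaultdict(list))

def index(doc_id, url, text):
    doc_table[doc_id] = {"url": url}
    offset = 0
    for token in text.split(" "):
        word = token.lower()
        if word not in STOP_WORDS:  # stop words never reach the index
            postings[word][doc_id].append(offset)
        offset += len(token) + 1  # +1 for the separating space

index(1, "http://example.com/a", "the cat and the hat")
index(2, "http://example.com/b", "a cat sat")

print(dict(postings["cat"]))  # {1: [4], 2: [2]}
print(doc_table[1])           # {'url': 'http://example.com/a'}
```

Notice what is missing: the word order, the stop words, the formatting - everything you would need to put the original document back together.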

(There is more detail in an older article on my website Relational Databases vs. Full-Text Search Engines - New Idea Engineering)

COULD you rebuild the full document? Well, it depends on the search platform. In most platforms I've seen, it would be difficult because stop words are not even stored. Recreating a document that omits ‘the’, ’a’, ‘an’, ‘and’ etc. MIGHT be human readable, but it is NOT the original document.

Secondly, not all search engine indices are replicated for redundancy. The assumption is that if you lose the file system where the content lives, you can still search; you just can't retrieve any documents until you restore the original content.

And some platforms do not give you a way to access the index, short of searching. And a search index is an index, not a repository.

Finally, some platforms are better at redundant failover of indices than others. If the platform you use is one that does not have redundancy BY DEFAULT - like some very popular platforms - and you use that index as the primary data store for your documents and the index dies, you’re what we used to call SOL - ‘sure outta luck’.

The moral of the story? DO NOT USE A SEARCH INDEX AS THE PRIMARY DATASTORE. Specific enough?

October 11, 2017

A Search Center of Excellence is still relevant for managing enterprise search

I was having a conversation with an old friend who manages enterprise search at her organization, a biotech company back east. We've worked together on search projects going back to my days at Verity - for you young'uns, that's what we call BG: 'before Google'.

"Centers of Excellence" or "COEs" had become very popular around then, and based on an engagement we did sometime after Google but before Solr, we decided we could define the roles and responsibilities of a Search Center of Excellence, or SCOE: the team that manages the full breadth of operation and management for enterprise search. We began preaching the gospel of the SCOE at trade show events and on our blog, where you can find that original article.

My friend and I had a great conversation about how successful they have been managing three generations of search platforms with the SCOE, and how they still maintain the responsibilities the SCOE assumed years back, with only a few meetings a year to review how search is doing, address any concerns, and map out enhancements as they become available.

It worked then, and it works now. The SCOE is a great idea! Let me know if you'd like to talk about it.

September 28, 2017

Enterprise Search Newsletter: September 2017


Welcome to Volume 7, Issue 2 of the Enterprise Search Newsletter

from New Idea Engineering, Inc.

This month we start with What's New which includes an update of the Google Search Appliance saga; the current renaissance in enterprise search; and an extended product line for what I'd argue is an industry leader. We also cover:

Winning Methodologies for Enterprise Search

Like many organizations, you probably have an existing enterprise search solution serving intranet or customer-facing content, perhaps even e-commerce. You had great expectations for the solution, but it hasn't worked out the way you had hoped. Your users are unhappy and complain about not being able to find the information they need. Maybe customers continue to call your support group for answers since they cannot find help on your website. Or your sales remained flat or even dropped after the roll-out of the new search. What can you do? more

Why Is Enterprise Search Difficult?

Companies, government agencies, and other organizations maintain huge amounts of information in electronic form including spreadsheets, policy manuals, and web pages just to mention a few. The content may be stored in file shares, websites, content management systems or databases, but without the ability to find this corporate knowledge, managing even a small company would be difficult. more

The Search Whisperers

Several years ago, Toyota ran an ad in the San Francisco Bay Area featuring the then recently retired Steve Young of San Francisco 49ers fame. In the advert, he is chatting with a woman at a party and the woman asks, "What do you do for a living, Steve?" Rather than answer directly, Young replies with a question: "Do you follow sports? Football?" When the woman answers that she doesn't, Young's (truthful) reply? "I'm a lawyer." more

 

Finally, I'd be remiss if I didn't mention my September 2017 column at CMS Wire or the upcoming Enterprise Search and Discovery conference in DC in November. I hope to see you there!

Feel free to contact me with your questions or suggestions!

 

www.ideaeng.com Copyright 2017: New Idea Engineering

June 28, 2017

Poor data quality gives search a bad rap

If you’re involved in managing the enterprise search instance at your company, there’s a good chance that you’ve experienced at least some users complaining about the poor results they see.

The common lament search teams hear is “Why didn’t we use Google?” In fact, we’ve seen the same complaints at sites that implemented the Google Search Appliance but don’t use the Google logo and look.

We're often asked to come in and recommend a solution. Sometimes the problem is simply the wrong search platform: not every platform handles every use case and requirement equally well. Occasionally the problem is a poorly configured search, or simply an instance that hasn’t been managed properly. Even the renowned Google public search engine doesn’t happen by itself - though it is a poor comparison anyway: in recent years, Google search has become less of a search platform and more of a big data analytics engine.

Over the years, we’ve been helping clients select, implement, and manage Intranet search. In my opinion, the problem with search is elsewhere: Poor data quality. 

Enterprise data isn’t created with search in mind. There is little incentive for content authors to attach quality metadata in the properties fields of Adobe PDF Maker, Microsoft Office, and other document publishing tools. To make matters worse, there may be several versions of a given document as it goes through creation, editing, reviews, and updates. And often the early drafts, as well as the final version, sit in the same directory or file share. Public-facing website content rarely has such issues.

Sometimes content management systems make it easy to implement what is really ‘search engine optimization’ or SEO; but it seems all too often that the optimization is left to the enterprise search platform to work out.

We have an updated two-part series on data quality and search, starting here. We hope you find it helpful; let us know if you have any questions!

June 22, 2017

First Impressions on the new Forrester Wave

The new Forrester Wave™: Cognitive Search And Knowledge Discovery Solutions is out, and once again I think Forrester, along with Gartner and others, misses the mark on the real enterprise search market.

In the belief that sharing my quick first impressions will at least start a conversation until I can write up a more complete analysis, here are my first thoughts.

First, I am not wild about the new buzzterms 'cognitive search' and 'insight engines'. Yes, enterprise search can be intelligent, but it's not cognitive, which Webster defines as "of, relating to, or involving conscious mental activities (such as thinking, understanding, learning, and remembering)". HAL 9000 was cognitive software; "Did you mean" and "You might also like" are not cognition. And enterprise search has always provided insights into content, so why the new 'insight engines'?

Moving on, I agree with Forrester that Attivio, Coveo and Sinequa are among the leaders. Honestly, I wish Coveo was fully multi-platform, but they do have an outstanding cloud offering that in my mind addresses much of the issue.

However, unlike Forrester, I believe Lucidworks Fusion belongs right up there with the leaders. Fusion starts with a strong open source Solr-based core; an integrated administrative UI; a great search UI builder (with the recent acquisition of Twigkit); and multiple-platform support. (Yep, I worked there a few years ago, but well before the current product was created).

I count IDOL in with the 'Old Guard' along with Endeca, Vivisimo (‘Watson’) and perhaps others - former leaders still available, but offered by non-search companies or removed from traditional enterprise search (Watson). And it will be interesting to see whether IDOL and its new parent, Micro Focus, survive the recent shotgun wedding.

Tier 2 - great search, but not quite “full” enterprise search - includes Elastic (which I believe is in the enviable position of being *the* platform for IoT), MarkLogic, and perhaps one or two more.

And there are several newer or perhaps less-well known search offerings like Algolia, Funnelback, Swiftype, Yippy and more. Don’t hold their size and/or youth against them; they’re quite good products.

No, I’d say the Forrester report is limited, and honestly a bit out of touch with the real enterprise search market. I know, I know; How do I really feel? Stay tuned, I've got more to say coming soon. What do you think? Leave a comment below!

January 25, 2017

Lucidworks 3 Released!

Today Lucidworks announced the release of Fusion 3, packed with some very powerful capabilities that, in many ways, set a new standard in functionality and usability for enterprise search.

Fusion is tightly integrated with Solr 6, the newest version of the popular, powerful and well-respected open source search platform. But the capabilities that really set Fusion 3 apart are the tools provided by Lucidworks on top of Solr to reduce the time-to-productivity.

It all starts at installation, which features a guided setup that allows staff who may not be familiar with enterprise search to get started quickly and to build quality, full-featured search applications.

Earlier versions of Fusion provided very powerful ‘pipelines’ that allowed users to define a series of custom steps or 'stages' during both indexing and searching. These pipelines allowed users to add custom capabilities, but they generally required some programming and a deep understanding of search.

That knowledge still helps, but Fusion 3 comes with what Lucidworks calls the “Index Workbench” and the “Query Workbench”. These two GUI-driven applications let mere mortals set up capabilities that used to require a developer, and enable developers to create powerful pipelines in much less time.

What can a pipeline do? Let's look at two cases.

On a recent project, our client had a deep, well developed taxonomy, and they wanted to tag each document with the appropriate taxonomy terms. In the Fusion 2.x Index Pipeline, we wrote code to evaluate each document to determine relevant taxonomy terms; and then to insert the appropriate taxonomy terms into the actual document. This meant that at query time, no special effort was required to use the taxonomy terms in the query: they were part of the document.
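The idea behind such an index-pipeline stage can be sketched in a few lines of Python - the taxonomy, field names, and function shape here are all hypothetical, and this is not Fusion's actual API:

```python
# Hypothetical taxonomy: term -> phrases that signal it in document text
TAXONOMY = {
    "oncology": ["tumor", "chemotherapy"],
    "cardiology": ["heart", "arrhythmia"],
}

def taxonomy_stage(doc):
    """Index-pipeline stage: insert matching taxonomy terms into the document."""
    text = doc["body"].lower()
    doc["taxonomy_terms"] = sorted(
        term
        for term, phrases in TAXONOMY.items()
        if any(phrase in text for phrase in phrases)
    )
    return doc

doc = taxonomy_stage({"id": "42", "body": "New chemotherapy protocols reduce tumor size."})
print(doc["taxonomy_terms"])  # ['oncology']
```

Because the terms are written into the document before indexing, query-time code never needs to know the taxonomy exists; the terms are just another searchable field.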

Another common index time task is to identify and extract key terms, perhaps names and account numbers, to be used as facets.

The Index Workbench in Fusion 3 provides a powerful front-end to these capabilities that have long been part of Fusion; but which are now much easier for mere mortals to use.

The Query Workbench is similar, except that it operates at query time, making it easy to do what we’ve long called “query tuning”. Consider this: not every term a user enters is of equal importance. The Query Workbench lets a non-programmer tweak relevance using a point-and-click interface. In previous versions of Fusion, and in most search platforms, a developer needed to write code to do the same task.

Another capability in Fusion 3 addresses a problem everyone who has ever installed a search technology has faced: how to ensure that the production environment exactly mirrors the dev and QA servers. Doing so has been a very detailed and tedious task, and any differences between QA and production could break something.

Fusion 3 has what Lucidworks calls Object Import/Export. This unique capability provides a way to export collection configurations, dashboards, and even pipeline stages and aggregations from a test or QA system; and reliably import those objects to a new production server. This makes it much easier to clone test systems; and more importantly, move search from Dev to QA and into production with high confidence that production exactly matches the test environment.

Fusion 3 also extends the Graphical Administrative User Interface to manage pretty much everything your operations department will need to do with Fusion. Admin UIs are not new; but the Fusion 3 tool sets a new high bar in functionality.

There is one other capability in Fusion 3 enabled by a relatively new capability in Solr: SQL.

I know what you’re thinking: “Why do I want SQL in a full-text application?”

Shift your focus to the other end.

Have you ever wanted to generate a report about inventory or other content in the search index? Let’s say your business team needs inventory and product reports on content in your search-driven eCommerce data. The business team has tools they know and love for creating their own reports, but those tools operate on SQL databases.

This kind of reporting has always been tough in search, and typically required some custom programming. With the SQL querying capabilities in Solr 6, and the security provided by Fusion 3, you may simply need to point your business team at the search index, verify their credentials, and connect via ODBC/JDBC - and their existing tools will work.
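To make the idea concrete: Solr's Parallel SQL endpoint accepts a SQL statement via its stmt parameter. This hypothetical helper just builds the request URL for a report-style query (the server, collection, and field names are made up for illustration):

```python
from urllib.parse import urlencode

def solr_sql_url(base_url, collection, stmt):
    """Build the request URL for Solr's /sql endpoint (Parallel SQL, Solr 6+)."""
    return "%s/solr/%s/sql?%s" % (base_url, collection, urlencode({"stmt": stmt}))

# A report-style query a business team might run over a product collection
stmt = "SELECT sku, COUNT(*) AS n FROM products GROUP BY sku ORDER BY n DESC LIMIT 10"
url = solr_sql_url("http://localhost:8983", "products", stmt)
print(url)
```

In practice the business team would never see this plumbing; their ODBC/JDBC tooling issues equivalent statements against the collection as if it were a database table.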

What Else?

Fusion 3 is an upgrade from earlier versions, so it includes Spark, an Apache tool with built-in modules for streaming, SQL, machine learning and graph processing. It works fine on SolrCloud, which enables massive indices and query loads - not to mention failover in the event of hardware problems.

I expect that Fusion 3 documentation, and the ability to download and evaluate the product, will be on the Lucidworks site today at www.lucidworks.com. “Try it, you’ll like it”.

While we here at New Idea Engineering, a Lucidworks partner, can help you evaluate and implement Fusion 3, I’d also point out that our friends at MC+A, also Lucidworks partners, are hosting a webinar Thursday, January 26th. Use this link to register and attend the webinar: http://bit.ly/2joopQK.

 

Lucidworks CTO Grant Ingersoll will be hosting a webinar on Friday, February 1st. Read about it here.

 

/s/ Miles

November 16, 2016

What features do your search users really want?

What features and capabilities do corporate end-users need from their search platform? Here's a radical concept: ask stakeholders what they want - and what they need - and make a list. No surprise: you'll have too much to do.

Try this: meet with stakeholders from each functional area of the organization. During each interview, ask people to tell you what internet search sites they use for personal browsing, and what capabilities of those sites they like best. As they name the desired features, write them on a white board.

Repeat this with representatives from every department, whether marketing, IT, support, documentation, sales, finance, shipping or others - really every group that will use the platform for a substantial part of their days. 

Once you have the list, ask for a little more help. Tell your users they each have $100 in "Dev Dollars" to invest in new features, and ask them to spend whatever portion they want on each feature - but all they have is $100 DD.

Now the dynamics get interesting. The really important features get the big bucks; the outliers get a pittance -  if anything. Typically, the top two or three features requested get between 40DD and 50DD; and that quickly trails off. 

I know - it sounds odd. These Dev Dollars have no true value - but people give a great deal of thought to assigning relative value to a list of capabilities - and it gives you a feature list with real priorities.
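A quick sketch of the tally - the features and allocations here are invented - shows how the priorities fall out:

```python
from collections import Counter

# Each stakeholder spreads exactly $100 in "Dev Dollars" across features
votes = [
    {"spell check": 50, "facets": 30, "preview": 20},
    {"facets": 60, "spell check": 40},
    {"facets": 45, "spell check": 35, "saved searches": 20},
]

totals = Counter()
for allocation in votes:
    assert sum(allocation.values()) == 100  # everyone spends their full budget
    totals.update(allocation)

# The features with the big totals are your real priorities
print(totals.most_common())
```

The fixed budget is what forces the trade-offs: a stakeholder who funds one feature heavily has, by construction, demoted everything else.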

How do you discover what users really want? 


November 14, 2016

Which search is best?

Ask that question to a number of knowledgeable enterprise search consultants, and you’ll no doubt hear a number of answers including Attivio, Elasticsearch, Google, Lucene, Lucidworks, SharePoint, Solr and many others. All are well known, and include rich capabilities and strong technology underpinnings. And the experts you spoke with will have answered honestly.

What you would experience is not unlike the parable about six blind men describing an elephant. In the John Godfrey Saxe telling of the tale, he says:

And so these men of Hindustan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong
Though each was partly in the right
And all were in the wrong...

So, now you may be wondering which search engine really is the right one for you.

The answer is really very easy: the one that meets your needs in your environment. But knowing that may be hard, because there’s a good chance it’s been a while since you really looked at your current environment.

I’d suggest you break down the process into a few distinct tasks in a process we call a Search Audit.

The Audit

A search audit is very similar to the process we recommend our customers use when selecting a new search platform: you go through the same process and look at the same issues of environment, requirements and more.

Not unlike a financial audit, a search audit is an objective examination and evaluation of your enterprise search implementation. The objective is to review the important metrics to determine how well your search is performing; to identify potential weaknesses; and to come out of the audit with a plan to fix any issues found.

At a high level, the audit is a review of the current environment; repositories; access security; and user requirements. Let’s look at what each of these includes.

Operating System

The operating system you use in an organization is often determined from ‘on high’. When considering a new search platform, it’s critical to verify support for the operating systems you use; in an audit, you also need to confirm that your search platform is supported on the operating system version as well as on any anticipated updates.

Systems

Whether you use physical or virtual systems is a big item to review, as well as whether your search platform is software or an appliance. It also may matter whether your servers are on-prem, in the cloud, or a hybrid of both. For example, your security review may need more attention if you use a cloud or hybrid solution; and performance should be reviewed for virtual and remote servers.

Development Tools

Very few search platforms include every feature or capability you need. The solution may range from scripting some common functions to customizing or modifying front-ends.

In the audit, pay attention to the platform and scripting languages, and make sure you have those skills in-house.

Repositories

In your audit, you clearly are looking at indexing content; but it never hurts to review the repositories where your critical content lives. Confirm that any version updates have not impacted the search platform or its performance. And verify that any anticipated platform changes are supported by your search solution, and plan accordingly.

Security

As with your repositories, an audit should confirm that any changes in the security infrastructure are mapped into, and supported by, the search platform. Are there new security levels or groups? Do queries against the repositories include content with the new security mappings?

Content

Servers occasionally get major updates. Use the search audit as an opportunity to anticipate upcoming operating system changes in order to properly confirm compatibility with your existing system. A while back I spoke with a large company using Verity K2 – which has been obsolete for years. They were about to update their Windows NT servers to Server 2012 and wanted to know how they could port K2. Good thing they asked.

Users

Your search platform exists to serve your customers, whether they are internal or external. Google and various eCommerce sites on the web have defined what users expect from search. Most enterprise search software ‘out of the box’ doesn’t look, feel, or work like Google; and you’ll have a problem if you don’t address those expectations. Ironically, even the popular Google Search Appliance doesn’t generally work like Google.

If you do not already have one, create a search center of excellence, and recruit representative users to help define how your search works.

When it boils down to it, just about any search engine can work ‘like Google', but that takes time and effort. If you haven’t already done so, use the audit as the driving force to improve the search experience.

Next Steps

Once you’ve completed your audit, you may find no major problems; and decide that your current search is doing pretty well. If that’s the case, you are in good shape. Other than ongoing maintenance, your task is complete for another year or two.

More often than not, issues come up in search audits. Sometimes it involves content not being indexed or poor search result quality. It may also be that the user experience is not “just like Google”.

The good news is that a majority of these issues can usually be fixed without replacing the platform.

What is the bad news? More often than not, people are so frustrated with search that a decree has come from on high calling to replace the search platform. This usually results in great effort, significant disruption and expense, and a new platform rollout with great fanfare and unrealistic expectations. But unless the issues you discover in an audit are addressed, there’s a good chance that you’ll be replacing the ‘new’ platform within a few years anyway.