June 07, 2016

Which search is best?

Ask that question to a number of knowledgeable enterprise search consultants, and you’ll no doubt hear a number of answers including Attivio, Elasticsearch, Google, Lucene, Lucidworks, SharePoint, Solr and many others. All are well known, and include rich capabilities and strong technology underpinnings. And the experts you spoke with will have answered honestly.

What you would experience is not unlike the parable about six blind men describing an elephant. In the John Godfrey Saxe telling of the tale, he says:

And so these men of Hindustan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong
Though each was partly in the right
And all were in the wrong...

So, now you may be wondering which search engine really is the right one.

The answer is really very easy: the one that meets your needs in your environment. But knowing that may be hard, because there is a good chance it’s been a while since you really looked at your current environment.

I’d suggest you break down the process into a few distinct tasks in a process we call a Search Audit.

The Audit

A search audit is very similar to the process we recommend our customers use when selecting a new search platform: you go through the same process and look at the same issues of environment, requirements and more.

Not unlike a financial audit, a search audit is an objective examination and evaluation of your enterprise search implementation. The objective is to review the important metrics to determine how well your search is performing; to identify potential weaknesses; and to come out of the audit with a plan to fix any issues found.

At a high level, the audit is a review of the current environment; repositories; access security; and user requirements. Let’s look at what each of these includes.

Operating System

The operating system you use in an organization is often determined from ‘on high’. When considering a new search platform, it’s critical to verify support for operating systems you use; in an audit, you also need to confirm that your search platform is supported on the operating system version as well as on any anticipated updates.

Systems

Whether you use physical or virtual systems is a big item to review, as well as whether your search platform is software or an appliance. It also may matter whether your servers are on-prem, in the cloud, or a hybrid of both. For example, your security review may need more attention if you use a cloud or hybrid solution; and performance should be reviewed for virtual and remote servers.

Development Tools

Very few search platforms include every feature or capability you need. The solution may range from scripting some common functions to customizing or modifying front-ends.

In the audit, pay attention to the platform and scripting languages, and make sure you have those skills in-house.

Repositories

In your audit, you clearly are looking at indexing content; but it never hurts to review the repositories where your critical content lives. Confirm that any version updates have not impacted the search platform or its performance. And verify that any anticipated platform changes are supported by your search solution, and plan accordingly.

Security

As with your repositories, an audit should confirm that any changes in the security infrastructure are mapped into, and supported by, the search platform. Are there new security levels or groups? Do queries against the repositories include content with the new security mappings?
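One way to make that check concrete is a simple set comparison. The sketch below is illustrative only: it assumes you can export a list of group names from your directory and a list of the groups your search platform's security mapping knows about. The function name and the sample group names are hypothetical, not part of any product's API.

```python
def find_unmapped_groups(directory_groups, search_mapped_groups):
    """Return directory groups that the search platform does not know about.

    Both arguments are hypothetical exports: any iterable of group names.
    The comparison ignores case and surrounding whitespace.
    """
    mapped = {g.strip().lower() for g in search_mapped_groups}
    candidates = {g.strip() for g in directory_groups}
    return sorted(g for g in candidates if g.lower() not in mapped)


if __name__ == "__main__":
    # Made-up sample data for illustration.
    directory = ["Finance", "HR", "Merger-Team", "Engineering"]
    search_mapping = ["finance", "hr", "engineering"]
    print(find_unmapped_groups(directory, search_mapping))
```

Any group this reports is worth a closer look: content restricted to that group may be invisible to the people who should see it, or handled in a way you did not intend.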

Content

Servers occasionally get major updates. Use the search audit as an opportunity to anticipate upcoming operating system changes in order to properly confirm compatibility with your existing system. A while back I spoke with a large company using Verity K2 – which has been obsolete for years. They were about to update their Windows NT servers to Server 2012 and wanted to know how they could port K2. Good thing they asked.

Users

Your search platform exists to serve your customers, whether they are internal or external. Google and various eCommerce sites on the web have defined what users expect from search. Most enterprise search software ‘out of the box’ doesn’t look, feel, or work like Google; and you’ll have a problem if you don’t meet those expectations. Ironically, even the popular Google Search Appliance doesn’t generally work like Google.

If you do not already have one, create a search center of excellence, and recruit representative users to help define how your search works.

When it comes down to it, just about any search engine can work ‘like Google’, but that takes time and effort. If you haven’t already done so, use the audit as the driving force to improve the search experience.

Next Steps

Once you’ve completed your audit, you may find no major problems; and decide that your current search is doing pretty well. If that’s the case, you are in good shape. Other than ongoing maintenance, your task is complete for another year or two.

More often than not, issues come up in search audits. Sometimes they involve content not being indexed or poor search result quality. It may also be that the user experience is not “just like Google”.

The good news is that a majority of these issues can usually be fixed without replacing the platform.

What is the bad news? More often than not, people are so frustrated with search that a decree has come from on high calling to replace the search platform. This usually results in great effort, significant disruption and expense, and a new platform rollout with great flair and unrealistic expectations. But unless the issues you discover in an audit are addressed, there’s a good chance that you’ll be replacing the ‘new’ platform within a few years anyway.

May 31, 2016

The Findwise Enterprise Search and Findability Survey 2016 is open for business

Would you find it helpful to benchmark your Enterprise Search operations against hundreds of corporations, organizations and government agencies worldwide? Before you answer, would you find that information useful enough that you’d spend a few minutes answering a survey about your enterprise search practices? It seems like a pretty good deal to me to have real-world data from people just like yourself worldwide.

This survey, whose results are useful and actionable for search managers everywhere, provides insight into many of the critical areas of search.

Findwise, the Swedish company with offices there and in Denmark, Norway, Poland and London, is gathering data now for the 2016 version of their annual Enterprise Search and Findability Survey at http://bit.ly/1sY9qiE.

What sorts of things will you learn?

Past surveys give insight into the difference between companies with happy search users versus those whose employees prefer to avoid using internal search. One particularly interesting finding last year was that there are three levels of ‘search maturity’, identifiable by how search is implemented across content.

The least mature search organizations, roughly 25% of respondents, have search for specific repositories (silos), but they generally treat search as ‘fire and forget’: once it is installed, there is no ongoing oversight.

More mature search organizations, which represent about 60% of respondents, have one search for all silos; but maintaining and improving search technology gets very little staff attention.

The remaining 15% of organizations answering the survey invest in search technology and staff, and continuously attempt to improve search and findability. These organizations often have multiple search instances tailored for specific users and repositories.

One of my favorite findings a few years back was that a majority of enterprises have “one or less” full time staff responsible for search; and yet a similar majority of employees reported that search just didn’t work. The good news? Subsequent surveys have shown that staffing search with as few as 2 FTEs improves overall search satisfaction; and 3 FTEs seem to strongly improve overall satisfaction. And even more good news: Over the years, the trend in enterprise search shows that more and more organizations are taking search and findability seriously.

You can participate in the 2016 Findwise Enterprise Search and Findability Survey in just 10 or 15 minutes and you’ll be among the first to know what this year brings. Again, you’ll find the 2016 survey at http://bit.ly/1sY9qiE.

April 07, 2016

One search to rule them all

(Originally published on LinkedIn)

Lucene was ‘born’ in 1999, created by Doug Cutting; and in 2005, it became a top-level Apache project. That year, Gartner Group announced that the ‘Leaders’ in their Enterprise Search Magic Quadrant included Autonomy, FAST, Endeca, IBM Omnifind, and Verity. The Google Search Appliance was right on the cusp between ‘Challengers’ and ‘Leaders’. Not many people knew about Lucene; and few who did saw it as much more than a quirky little project.

Just a year later, Yonik Seeley and his employer, CNET Networks, published and donated the Solr search server to the Apache Software Foundation, where it became an incubator project in 2006; the two projects soon merged into a single top-level Apache project. That same year, Gartner narrowed the ‘Leaders’ in their 2006 Magic Quadrant for Search to Autonomy (which acquired Verity the previous year), FAST, and Endeca.

Jump forward to the present. FAST is gone, acquired by Microsoft in 2008 and morphed into SharePoint Search. Hewlett-Packard acquired Autonomy in October of 2011, followed a few weeks later by Oracle’s acquisition of Endeca. Endeca is no longer available as a search platform; and Autonomy is mostly seen as a strategy to keep a large number of HP consultants fully employed, often on compliance applications.

Only a smattering of the commercial enterprise search platforms that flooded the market just a few years back still exist. While Gartner continues to list 14 or 15 products in their Magic Quadrant Enterprise Search grid, about the only pure commercial products we see any more are the Google Search Appliance and Recommind. And Google recently announced that the appliance is scheduled to go ‘end of life’ over the next few years. All of those bright yellow boxes become really nice Dell servers by the end of 2018.

A new crop of search platforms has grown to fill the void.

As an open source product, Solr has grown in its capabilities, and is now widely used for enterprise search and data applications in corporations and government projects. Solr Cloud extends the platform to a scalable high-availability platform for demanding enterprise and data search applications.

Cloudera also bundles some interesting extra tools including Solr in their HUE bundle; free to download and free to use as long as you like. Cloudera runs a slightly older but stable release, 4.10; but with committers Yonik Seeley and Mark Miller on board, I suspect they’re in a good position.

Hortonworks, a Cloudera competitor, also offers Solr/Solr Cloud in their releases, in partnership with Lucidworks - a company with a large number of committers on staff.

There are also three companies that have proprietary offerings based on open source technology.

Attivio, founded in 2007, is a “Leader” in the most recent Gartner Magic Quadrant for Enterprise Search. Their product, while not open source, nonetheless thrives by combining search, BI, data automation, analytics and more.

Elasticsearch has evolved into a strong platform for search and data analytics, and a number of organizations are finding it useful in some traditional enterprise search applications as well. Elastic has also integrated Kibana, a powerful graphical presentation tool that adds value for content analytics, not just search activity reporting.

Lucidworks Fusion is a relative newcomer to enterprise search. It includes many of the rich architectural features that enterprises expect, including a powerful crawler, connectors, and reporting. With its ‘Anda’ crawler and connectors, admin UI, and reporting, some people see it as a contender to replace the Google Search Appliance.

The one thing that all of these ‘proprietary’ products have in common? They are based on Apache Lucene to deliver critical functionality. And when you consider all of the web sites that use some form of Lucene for their site search, I think you'd agree that it really is a powerful little package. It’s available for virtually any operating system, and can be integrated using just about any programming language from C/C++ to Java to Perl to Python to .NET.

Even more amazing is that these companies with commercial products based on Lucene – and who compete in the marketplace - actually cooperate when it comes time to fix bugs or add new capabilities to Lucene. Given all of the commercial players that have closed their doors - leaving their customers to find replacement platforms – we’ve reached the point where open-source-based software really is the safe choice now. And universally, Lucene is the common element.

The quirky little search API Doug Cutting put together in 1999 has evolved to be the platform that drives the leading search platforms used in big data, NoSQL, enterprise search, and search analytics. And it doesn’t seem like it’s going to be phasing out any time soon.

January 20, 2015

Your enterprise search is like your teenager

During a seminar a while back, I made this spontaneous claim. Recently, I made the comment again, and decided to back up my claim - which I’ll do here.

No, really – it’s true. Consider:

You can give your search platform detailed instructions, but it may or may not do things the way you meant:

Modern search platforms provide a console where you, as the one responsible for search, can enter all of the information needed to index content and serve up results. You tell it what repositories to index; what security applies to the various repositories; and how you want the results to look.  But did it? Does it give you a full report of what it did, what it was unable to do, and why?

You really have no idea what it’s doing – especially on weekends:

Search platforms are notorious for the lack of operational information they provide.

Does your platform give you a useful report of which content was indexed successfully, and which was not – and why? And some platforms stop indexing files when they reach a certain size: do you know what content was not completely indexed?
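If your platform won’t produce that report, you can often approximate one yourself. The sketch below is only an illustration under stated assumptions: it supposes you can export an inventory of the repository (document ID and size in bytes) and a matching list from the indexer (document ID and the number of bytes it says it processed). Both exports, and the function name, are hypothetical; your platform may expose something quite different, or nothing at all.

```python
def coverage_report(repository_docs, indexed_docs):
    """Compare a repository inventory against what the indexer reports.

    repository_docs: dict of doc_id -> size in bytes in the repository
    indexed_docs:    dict of doc_id -> bytes the indexer says it processed
    Both inputs are hypothetical exports, not a real platform API.
    """
    # Documents that exist in the repository but never reached the index.
    missing = sorted(set(repository_docs) - set(indexed_docs))
    # Documents that were indexed, but only partially (e.g. a size cutoff).
    truncated = sorted(
        doc_id
        for doc_id, size in repository_docs.items()
        if doc_id in indexed_docs and indexed_docs[doc_id] < size
    )
    return {"missing": missing, "truncated": truncated}


if __name__ == "__main__":
    # Made-up sample data for illustration.
    repo = {"a.pdf": 100, "b.doc": 5000, "c.txt": 10}
    index = {"a.pdf": 100, "b.doc": 2048}
    print(coverage_report(repo, index))
```

It won’t tell you why a document was skipped, but it does tell you where to start asking.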

When it does tell you, sometimes the information is incomplete: 

Your crawler tells you there were a bunch of ‘404’ errors because of a bad or missing URL; but will it tell you which page(s) had the bad link? Chances are it does not. 
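If your crawler can at least log each link it followed, the referring pages are recoverable. Here is a minimal sketch, assuming a hypothetical crawl log of (source page, target URL, HTTP status) records; the record layout and function name are my own invention for illustration, not any crawler’s actual output format.

```python
from collections import defaultdict


def broken_links_by_page(link_records):
    """Group broken target URLs by the page that links to them.

    link_records is a hypothetical crawl log: an iterable of
    (source_page, target_url, http_status) tuples.
    """
    report = defaultdict(set)
    for source_page, target_url, status in link_records:
        if status == 404:
            report[source_page].add(target_url)
    # Plain dict with sorted keys and sorted targets for stable output.
    return {page: sorted(targets) for page, targets in sorted(report.items())}


if __name__ == "__main__":
    # Made-up sample data for illustration.
    records = [
        ("/a", "/x", 404),
        ("/a", "/y", 200),
        ("/b", "/x", 404),
        ("/a", "/z", 404),
    ]
    for page, targets in broken_links_by_page(records).items():
        print(page, "links to missing:", ", ".join(targets))
```

With a list like that, whoever owns each page knows exactly which links to fix.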

They can be moody, and malfunction without any notice:

You schedule a full update of your index every weekend, and it has always worked flawlessly – as far as you know. Then, usually on a 3-day weekend, it fails. Why? See above.

When you talk to others who have search, theirs always sounds much better than yours:

As a conscientious search manager, you read about search, you attend webinars and conferences, and you always want to learn more. But you wonder why other search managers seem to describe their platform in glowing terms, and never seem to have any of the behavioral issues you live with every day. It kind of makes you wonder what you’re doing wrong with yours.

It costs more to maintain than you thought and it always needs updates:

When you first got the platform you knew there were ongoing expenses you’d have to budget – support, training, updates, consulting. But just like your kid who needs books, a computer, soccer coaching, and tuition, it’s always more than you budgeted. Sometimes way more!

You can buy insurance, but it never seems to cover what you really need:

Bear with me here: you get insurance for your kids in case they get sick or cause an accident, and you buy support and maintenance for your search platform.  But in the same way that you end up surprised that orthodontics are not fully covered, you may find out that help tuning the search platform, or making it work better, isn’t covered by the plan you purchased – in fact, it wasn’t even offered. QED.

It speaks a different vocabulary:

You want to talk with your kid and understand what’s going on; you certainly don’t want to look uncool. But like your kid, your search platform has a vocabulary that only barely makes sense to you. You know rows and columns, and thought you understood ‘fields’; but the search platform uses words you know in ways that don’t seem to match the definitions you’ve known from databases or CMS systems.

It's hard for one person to manage, especially when it's new:

Many surveys show that most companies have one (or less) full-time staff responsible for running the search engine – while the same companies claim search is ‘critical’ to their mission.  Search is hard to run, especially in the first few years when everything needs attention. You can always get outside help – not unlike day care and babysitters – but it just seems so much better if you could have a team to help manage and maintain search to make it behave better.

How it behaves reflects on you:

You’re the search manager and you’ve got the job to make search work “just like Google”.  You spent more than $250K to get this search engine, and the fact that it just doesn’t work well reflects badly on you and your career. You may be worried about a divorce.

It doesn’t behave like the last one:

People tend to be nostalgic, as are many search managers I know. They learned how to take care of the previous one, but this new one – well, it’s NOTHING like the earlier one. You need to learn its habits and behaviors, and often adjust your behavior to ensure peace at work.

You know if it messes up badly late at night, even on a weekend or a holiday, you’ll hear about it:

If customers or employees around the world use your search platform, there is no ‘down time’: when it’s having an issue, you’ll hear about it, and will be expected to solve the issue – NOW. You may even have IT staff monitoring the platform; but when it breaks in some odd and unanticipated way, you get the call. (And when does search ever fail in an expected way?)

You may be legally responsible if it messes up:

Depending on what your search application is used for, you may find yourself legally responsible for a problem. Fortunately, the chances of you personally being at fault are slim, but if your company takes a hit for a problem that you hadn’t anticipated, you may have some ‘career risk’ of your own. Was secure content about the upcoming merger accidentally made public? Was content to be served only to your Swiss employees when they search from Switzerland exposed outside of the country? And you can’t even buy liability insurance for that kind of error.

When it’s good, you rarely hear about it; when it's bad, you’ll hear about it:

Seriously, how many of you have gotten a call from your CIO to tell you what a great experience he or she had on the new search platform? Do people want to take you to lunch because search works so well? If you answered ‘yes’ to either of these, I’d like to hear from you!

In my experience, people only go out of their way to give feedback on search when it’s not working well. It’s not “like Google”. Even though Google has hundreds of people and ‘bots’ examining every search query to try to make the result better, and you have only yourself and an IT guy.

You’ll hear. 

The work of managing it is never done:

The wonderful southern writer Ferrol Sams wrote:

“He's a good boy… I just can't think of enough things to tell him not to do.” Sound like your search platform? It will misbehave (or fail outright) in ways you never considered, and your search vendor will tell you “We’ve never seen a problem like that before”. Who has to get it fixed? You have to ask?

Once it moves away, you sometimes feel nostalgic:

Either you toss it out, or a major upgrade from your vendor comes along and the old search platform gets replaced. Soon, you’re wishing for the “Good old days” when you knew how cute and quirky the old one was, and you find yourself feeling nostalgic for it and wishing that it didn’t have to move out.

Do you agree with my premise? What have I missed?

November 25, 2014

The Search Whisperer

A few years ago, Toyota ran an ad in the San Francisco Bay Area featuring the then recently retired Steve Young of 49ers fame. In the advert, he is chatting with a woman at a party and the woman asks, "What do you do for a living Steve?" Rather than answer directly, Young replies with a question: "Do you follow sports? Football?” When the woman answers that she doesn't, Young's (truthful) reply? "I'm a lawyer."

I'm not a lawyer, so when people ask me what I do, I have trouble answering in a precise way. Usually I’ll say “I help companies get the right search platform for their requirements” or "I help companies tune their search platform”. Both are true, but people outside of the narrow field either have no idea what I do, or they think I do SEO for public web sites. I also add "I fly small airplanes" and that's usually where the conversation goes.

It's tough being a "search whisperer", because even people with an enterprise search problem often don't realize that anyone but the vendor can help fix it. Even worse, they don't know that you can avoid problems in the first place by picking a platform that meets the need, then managing it relentlessly. 

Now that the commercial market is down to a handful of players, it may get easier because most of the newer platforms don’t have the rich feature sets that seemed so promising in the old expensive brands. It’s easier to justify a bare bones platform and hope than to buy yet another expensive platform that may or may not work as expected.

Do you have an obsolete platform (FAST ESP, Verity, Vivisimo, Endeca,...) and want to make it work better? Are you going to go with a powerful search API with no real management capabilities? Do you want to future-proof your search? That’s what I do.