
June 28, 2017

Poor data quality gives search a bad rap

If you’re involved in managing the enterprise search instance at your company, there’s a good chance you’ve heard at least some users complain about the poor results they see.

The common lament search teams hear is “Why didn’t we use Google?” Yet in fact, we’ve seen the same complaints at sites that implemented the Google Search Appliance (GSA) but don’t use the Google logo and look.

We're often asked to come in and recommend a solution. Sometimes the problem is simply using the wrong search platform: not every platform handles every use case and requirement equally well. Occasionally, the problem is a poorly configured search, or simply an instance that hasn’t been managed properly. Even the renowned Google public search engine doesn’t happen by itself; and even that is a poor comparison: in recent years, Google search has become less of a search platform and more of a big data analytics engine.

Over the years, we’ve helped clients select, implement, and manage intranet search. In my opinion, the real problem with search usually lies elsewhere: poor data quality.

Enterprise data isn’t created with search in mind. There is little incentive for content authors to attach quality metadata in the properties fields of Adobe PDF Maker, Microsoft Office, and other document publishing tools. To make matters worse, there may be several versions of a given document as it goes through creation, editing, reviews, and updates. And often the early drafts, as well as the final version, sit in the same directory or file share. Public-facing web site content rarely has such issues.
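As a small illustration of the duplicate-draft problem, here is a hedged sketch that scans a file share for documents whose names suggest multiple versions of the same content. The root path and the naming patterns are assumptions for illustration, not a general solution.

    # Hedged sketch: flag likely draft/duplicate versions of the same document on
    # a file share so they can be reviewed before (or excluded from) indexing.
    # The root path and naming patterns are illustrative assumptions.
    import re
    from collections import defaultdict
    from pathlib import Path

    DRAFT_PATTERN = re.compile(r"[_\- ](draft|v\d+|final|copy)\b", re.IGNORECASE)

    def find_version_clusters(root: str) -> dict:
        clusters = defaultdict(list)
        for path in Path(root).rglob("*"):
            if path.is_file():
                # Strip version markers to group files that are probably the same doc.
                base = DRAFT_PATTERN.sub("", path.stem).strip().lower()
                clusters[base].append(path.name)
        return {base: names for base, names in clusters.items() if len(names) > 1}

    for base, names in find_version_clusters("/data/fileshare").items():  # hypothetical path
        print(f"{base}: {len(names)} versions -> {names}")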

Some content management systems make it easy to apply what is really ‘search engine optimization’, or SEO; but all too often that optimization is left for the enterprise search platform to work out.

We have an updated two-part series on data quality and search, starting here. We hope you find it helpful; let us know if you have any questions!

May 31, 2016

The Findwise Enterprise Search and Findability Survey 2016 is open for business

Would you find it helpful to benchmark your enterprise search operations against hundreds of corporations, organizations, and government agencies worldwide? Before you answer, would you find that information useful enough that you’d spend a few minutes answering a survey about your enterprise search practices? It seems like a pretty good deal to me: real-world data from people just like yourself, worldwide.

The survey results are useful, insightful, and actionable for search managers everywhere, providing insight into many of the critical areas of search.

Findwise, the Swedish company with offices there and in Denmark, Norway, Poland, and London, is gathering data now for the 2016 edition of its annual Enterprise Search and Findability Survey at http://bit.ly/1sY9qiE.

What sorts of things will you learn?

Past surveys give insight into the difference between companies with happy search users and those whose employees prefer to avoid using internal search. One particularly interesting finding last year was that there are three levels of ‘search maturity’, identifiable by how search is implemented across content.

The least mature search organizations, roughly 25% of respondents, have search for specific repositories (silos), but they generally treat search as ‘fire and forget’: once installed, there is no ongoing oversight.

More mature search organizations, representing about 60% of respondents, have one search across all silos; but maintaining and improving the search technology gets very little staff attention.

The remaining 15% of organizations answering the survey invest in search technology and staff, and continuously attempt to improve search and findability. These organizations often have multiple search instances tailored for specific users and repositories.

One of my favorite findings a few years back was that a majority of enterprises have “one or less” full-time staff responsible for search; and yet a similar majority of employees reported that search just didn’t work. The good news? Subsequent surveys have shown that staffing search with as few as 2 FTEs improves overall search satisfaction, and 3 FTEs seem to strongly improve overall satisfaction. And even more good news: over the years, the trend shows that more and more organizations are taking enterprise search and findability seriously.

You can participate in the 2016 Findwise Enterprise Search and Findability Survey in just 10 or 15 minutes and you’ll be among the first to know what this year brings. Again, you’ll find the 2016 survey at http://bit.ly/1sY9qiE.

April 23, 2013

Open Source Search Myth 4 - Features and Capabilities Lag

This is part of a series addressing the misconception that open source search is too risky for companies to use. You can find the introduction to the series here; and Part 3, Skills required In-House, here.

Part 4: Features and Capabilities Lag

Keeping up with the latest and greatest technology is important, especially when there is a great deal of innovation in a field. Enterprise search is one such field.

In this post I'll address the claim that "Production functionality may trail in specific features relative to commercial search firms".

First, let me remind you that many of the coolest advanced capabilities in modern search platforms are delivered using third-party products integrated into the actual search product. Examples:

Entity extraction: Cool stuff, and part of many search platforms. Often implemented using technology from companies like Basis Technology, Pingar, and others.

Non-English support: Required for any large-scale enterprise. Think Basis Technology again; or pretty darned good open source filters.

Document format support: The leaders here were smaller companies that were eventually purchased by larger search companies: KeyView (now Autonomy); Stellent (now Oracle); ISYS (now IBM). And there's open source Apache Tika - see the sketch after this list.

Sentiment Analysis: Identify 'positive' versus 'negative' sentiment, using products from Lexalytics, Attensity, SAS, LingPipe and others. 
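To make the Tika point concrete, here is a minimal sketch using the Apache Tika Python bindings (the tika package) that turns documents in several formats into plain text and metadata ready for indexing. The file names are illustrative; this is meant to show how little code the open source route needs, not a production pipeline.

    # Minimal sketch: extract indexable text and metadata from mixed-format files
    # with Apache Tika. Assumes the 'tika' Python package is installed (it starts
    # a local Tika server on first use). File names are illustrative only.
    from tika import parser

    for path in ["report.pdf", "spec.docx", "deck.pptx"]:
        parsed = parser.from_file(path)            # one call handles all formats
        text = (parsed.get("content") or "").strip()
        meta = parsed.get("metadata") or {}
        print(path, meta.get("Content-Type"), len(text), "characters extracted")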

My point is not that large enterprise search platform companies do not include some cool new technologies in their products: it's just that the 'cool' usually comes from a third party that can be licensed for use in any platform, not just "commercial" ones. 

And when you use open source platforms, you always have the option of building a feature yourself - either in-house or with a consulting firm.

And you might not be aware of capabilities where open source Solr is ahead of many commercial vendors. For example, consider geospatial search, which lets you easily search for 'documents' relevant to a particular location. It can even be used to answer questions like "which managers are on duty on Saturday night at the LA store?"
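Here is a minimal sketch of what such a geospatial filter query looks like against Solr over HTTP. It assumes a local Solr instance with a hypothetical 'stores' collection, a spatial 'location' field, and illustrative query fields; it shows the shape of the request, not a production integration.

    # Minimal sketch of a Solr geospatial (geofilt) query over HTTP. The Solr URL,
    # collection name, field names, and coordinates are illustrative assumptions.
    import requests

    params = {
        "q": "manager_on_duty:true AND shift:saturday_night",       # hypothetical fields
        "fq": "{!geofilt sfield=location pt=34.05,-118.24 d=10}",   # within 10 km of downtown LA
        "fl": "name,manager,location",
        "wt": "json",
    }
    resp = requests.get("http://localhost:8983/solr/stores/select", params=params)
    for doc in resp.json()["response"]["docs"]:
        print(doc)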

I will say that Microsoft, in SharePoint 2013, has implemented a very nice query boosting tool that, as far as I can tell, was created in-house - I doubt it was in the FAST pipeline at the time of the acquisition.

But given that caveat, I'd ask, what with all of the recent acquisitions and mergers, whether any 'enterprise search' company has implemented a major new capability - pivot facets, entity extraction, and more - without licensing the technology from an outside company?

March 20, 2013

Open Source Search Myth 3: Skills Required In-House

This is part of a series addressing the misconception that open source search is too risky for companies to use. You can find the introduction to the series here; this is Part 3 of the series; for Part 2 click Potentially Expensive Customizations.

Part 3: Skills Required In-House

One of the hallmarks of enterprise software in general is that it is complex. People in large organizations who manage instances of enterprise search are no less likely than their non-technical peers to believe that "if Google can make search so good on the internet, enterprise search must be trivial". Sadly, that is the killer myth of search.

Google on the internet - or Bing or Baidu or whichever site you use and love - is good because of the supporting technology, NOT simply because of search. I'd wager that most of what people like about Google et al. has very little to do with search and a great deal to do with constant monitoring and tweaking of the platform.

Consider: at the Google 'command line' (the search box), you can type in an arithmetic expression such as "2+3" and get 5. You can enter a FedEx tracking number and get a suggestion to link to FedEx for information. It's cool that Google provides those capabilities and others; but those features are there because Google has programs looking at search behavior for all of its users every day in order to understand user intent. When something unusual comes up, humans get involved and make judgments. When it makes sense, Google implements another capability - in front of the search engine, not within it.
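The pattern itself is simple: a thin layer in front of the engine inspects each query and, when it recognizes an intent such as arithmetic or a tracking number, answers it directly instead of handing it to the index. Here is a hedged sketch of that idea; the patterns and the fallback search call are placeholders for illustration, not how Google actually implements it.

    # Hedged sketch of a query front end that handles special intents before the
    # search engine sees the query. Patterns and the fallback call are placeholders.
    import re

    ARITHMETIC = re.compile(r"^\s*\d+(\.\d+)?\s*[-+*/]\s*\d+(\.\d+)?\s*$")
    TRACKING = re.compile(r"^\d{12,15}$")   # crude stand-in for a carrier tracking number

    def handle_query(q: str) -> str:
        if ARITHMETIC.match(q):
            # The strict regex above keeps eval() limited to simple arithmetic.
            answer = eval(q, {"__builtins__": {}}, {})
            return f"= {answer}"
        if TRACKING.match(q):
            return f"Looks like a tracking number; link the user to the carrier for {q}."
        return run_search_engine(q)          # hypothetical call into the real search back end

    def run_search_engine(q: str) -> str:
        return f"(search results for {q!r})"

    print(handle_query("2+3"))                 # handled in front of the engine
    print(handle_query("open source solr"))    # falls through to search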

Enterprise search is the same - except that very few companies invest money in managing and running their search; so no matter how well you tune it at the beginning, quality deteriorates over time. Enterprise search is not 'fire and forget'.

Any company that rolls out a mission-critical application and does NOT have its own skilled team in house is going to pay a consulting firm thousands of dollars a day forever. 'Nuff said.

 

March 18, 2013

Solr 4 Training 3/27 in Northern Virginia/DC area

Interrupting my series on whether open source search is a good idea in the enterprise to tell you about an opportunity to attend LucidWorks' Solr Bootcamp in Reston, Virginia on Wednesday March 27. Lucid staff and Lucene/Solr committers Erick Erickson and Erik Hatcher will be there, along with Solr pro Joel Bernstein. Heck, I'll even be there!

The link is here; for readers of our blog, use discount code SOLR4VA-5OFF for a discount.

Course Outline:

  • What's new in Solr 4
  • Solr 4 Functional Overview
  • Solr Cloud Deep Dive
  • Solr 4 Expert Panel Case Studies
  • Workshop and Open lab

And ask the guys how you can get involved in Solr as a contributor or committer!

 

March 15, 2013

Open Source Search Myth 2: Potentially Expensive Customizations

This is part of a series addressing the misconception that open source search is too risky for companies to use. You can find the introduction to the series here; this is Part 2 of the series; for Part 3 click Skills Required In House.

Part 2: Potentially Expensive Customization

Which is more expensive: open source or proprietary search platforms?

Commercial enterprise search vendors often quote man-years of effort to create and deploy what, in many cases, should be relatively straightforward site search.  Sure, there are tough issues: unusual security; the need to mark-up content as part of indexing; multi-language issues; and vaguely defined user requirements.

Not to single them out, but Autonomy implementations were legendary for taking years. Granted, this was usually eDiscovery search, so the sponsor - often a Chief Risk Officer - had no worries about budget: anything that would keep the CRO and his or her fellow executives out of jail was reasonable. But even easier tasks, such as search-enabling an intranet site, took more time and effort than they needed to because no one scoped out the work. This is one reason so many IDOL projects hire large numbers of IDOL contractors for such long engagements.

FAST was also famous for lengthy engagements. 

FAST once quoted a company we later worked with a one-year, $500K project to assist in moving from ESP Version 4.x to ESP Version 5.x - two versions that had, for all practical purposes, the same user interface, the same API, and the same command line tools. Really? One year?

True story: I joked with one of the sales guys that FAST wanted six months to roll out web search for even a small intranet; I thought two weeks was more like it. He put me on the spot a year later and challenged me to help one of his customers, and sure enough, it took us almost a month to bring up search! But we had a constraint: the new FAST search had to be callable from the existing custom CMS, which had hard-coded calls to Verity K2 - the customer did not have time to rewrite the CMS.

Thus, part of our SOW was to write a front end that would accept search requests through the Verity K2 DLL interface, intercept the call, and perform the search in FAST ESP. Then, intercepting the K2 results-list processing calls, it delivered the FAST results to the CMS, which thought it was talking with Verity. And we did it in less than 20% of the time FAST wanted just to index a generic HTML-based web site.
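In effect, this is the classic adapter pattern: a shim presents the interface the legacy CMS expects and translates calls to the new engine. Here is a minimal sketch of the idea; the classes and method names are hypothetical stand-ins, not the real Verity K2 or FAST ESP APIs.

    # Hedged sketch of the adapter approach: the CMS keeps calling what it thinks
    # is the legacy (K2-style) search API, while a shim translates the call to the
    # new engine and maps results back. All names here are hypothetical stand-ins.
    from dataclasses import dataclass

    @dataclass
    class LegacyResult:              # the result shape the CMS already knows how to render
        doc_id: str
        title: str
        score: float

    class NewEngineClient:           # stand-in for the replacement search engine
        def search(self, query: str) -> list[dict]:
            return [{"id": "42", "name": "Store hours", "relevance": 0.97}]

    class LegacySearchAdapter:
        """Presents the legacy search interface; delegates to the new engine."""
        def __init__(self, engine: NewEngineClient):
            self.engine = engine

        def k2_style_search(self, query: str) -> list[LegacyResult]:
            hits = self.engine.search(query)
            # Map the new engine's result fields onto the legacy result shape.
            return [LegacyResult(h["id"], h["name"], h["relevance"]) for h in hits]

    adapter = LegacySearchAdapter(NewEngineClient())
    for result in adapter.k2_style_search("store hours"):
        print(result)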

On the other hand, at LucidWorks we frequently have 5-day engagements to set up Solr and LucidWorks Search, index the customer's content, and integrate results into the end-user application. I think other Solr and open source implementations are comparable for most engagements.

Let me ask: which was the more "expensive" implementation?

March 13, 2013

Open Source Search Myth 1 - Enhancements by Committee

This is part of a series addressing the misconception that open source search is too risky for companies to use. You can find the introduction to the series here; this is Part 1 of the series; for Part 2, click Potentially Expensive Customizations.

Part 1: Enhancements Subject to Committee

The original article back on LinkedIn called out as one 'flaw' of open source search the belief that updates and improvements are made only on a timetable selected by the community - presumably the committers.

One of the hallmarks of Apache open source projects is that, when you make a change or an enhancement to the code, you submit the changes back to the Apache project.

My employer, LucidWorks, enhances Solr, and we push the changes we make back for consideration by the entire Solr community. Many of these changes are accepted and become part of Solr - almost all driven by demand and need we see from our commercial customers - which helps everyone.

Occasionally, a customer has a specific need and asks us to develop capabilities that are not part of the standard release. Sometimes the enhancements are on the Apache project plan; sometimes they are unique. In any case, we create the enhancement and submit it for consideration in the standard Solr trunk. Once we’ve done so, anyone can download our enhancements and use them. And, as we do at LucidWorks, anyone can write enhancements for themselves and make them available.

Compare this to commercial search vendors who update on their own (typically unpublished) schedule; and no one can add a feature on their own. The vendor decides, and the consumer can only hope. And you pay upward of 20% of the list price every year in anticipation of the change you hope for but cannot add on your own.

And no matter what happens to Solr, our customers have the source code to self-support forever - no forced conversion.

March 11, 2013

Open source search engine - a good idea?

My transition to LucidWorks has been a busy one with little time for other interests and hobbies (like flying!). But a week or so back I spotted a November 2012 post on the LinkedIn Enterprise Search Engine Professionals group asking the question in this post's title. 

Of course, LucidWorks provides deep support for Solr and markets a Solr-based enterprise search product; but it's my work with enterprise search technology over the last 20+ years that really drives my response. Sadly, my reply was longer than LinkedIn allows, so I posted a short reply with a link there and have come back here to respond in full. It's going to take a few posts, though, so bear with me if you will.

First, my response to the poster: Eight years ago open source was cool, but was probably not 'enterprise ready'.  Enterprise search is hard, but years ago the Apache projects (Lucene and Solr) began working to solve the tough issues - ones that were not commercially worth it for the 8 to 10 major commercial enterprise search companies.

Then a funny thing happened: Solr got better and better, and the commercial vendors started merging. Verity got sucked into Autonomy, which got sucked into HP. FAST got sucked into Microsoft. Vivisimo got sucked into IBM. And with every acquisition, the time and money that enterprises had invested in commercial search was wasted: when the platform you based your search on got acquired, you had to move to the new engine - a painful, expensive, and long process.

As I blogged just a few weeks ago, open source search is now the default SAFE choice for enterprises that need search. You may have to do some coding, or find a skilled expert or team to help; but you own your destiny. Lucid (my company) does sell support for Solr; there are other fine companies, large and small, that do so as well. We're fortunate enough to employ a good number of the committers - not a majority, which is probably best for the community.

The original poster, an employee of a proprietary search vendor, may have had his reasons. Nonetheless, he listed five reasons he felt that open source search for enterprises was a bad idea - based on a three-year-old report by my friend Hadley Reynolds, taken a bit out of context. These 'disadvantages' are listed and linked below.

* Enhancements on community timetable only

* Potentially expensive customization

* Requirement for search development skills in-house or ready-to-hand

* Production functionality may trail in specific features relative to commercial search firms

* Maintenance/system life costs can become significant

In the next several posts, I'm going to address and refute these one at a time. Stay tuned.

 

February 14, 2013

A paradigm shift in enterprise search

I've been involved in enterprise search since before the 'earthquake World Series' between the Giants and the A's in 1989. While our former company became part of LucidWorks last December, we still keep abreast of the market. But being a LucidWorks employee has brought me to a new realization: commercial enterprise search is pretty much dead.

Think back a few years: FAST ESP, Autonomy IDOL (including the then-recently acquired Verity), Exalead, and Endeca were the market. Now, every one of those companies has become part of a larger business. Some of the FAST technology lives on, buried in SharePoint 2013; Autonomy has suffered as part of HP because - well, because HP isn't what it was when Bill and Dave ran it. Current management doesn't know what they have in IDOL, and the awful deal they cut was probably based on optimistic sales numbers that may or may not have existed. Exalead, the engine I hoped would take the place of FAST ESP in the search market, is now part of Dassault and is rarely heard of in search. And Endeca, the gem of a search platform optimized for the lucrative eCommerce market, has become one of three or four search-related companies in the Oracle stable.

Microsoft is finally taking advantage of the technology acquired in the FAST acquisition for SharePoint 2013, but as long as it's tied to SharePoint - even with the ability to index external content - it's not going to be an enterprise-wide solution, or a 'big data' solution. SharePoint and Hadoop? As long as you bring SQL Server. Mahout? Pig? I don't think so. There are too many companies that want or need Linux, rather than Windows, on their servers.

Then there is Google, the ultimate closed-box solution. As long as you use the Google search button/icon, users are happy – at least at first. If you have sixty guys named Sarah? Maybe not.

So what do we have? A few good options, generally from small companies that tend to focus on hosted eCommerce - SLI Systems and Dieselpoint - and there’s Coveo, a strong Windows platform offering.

Solr is the enterprise search market now. My employer, LucidWorks, was the first, and remains the primary, commercial driver of the open source Apache project. What's interesting is the number of commercial products based on Solr and its underlying library, Lucene.

Years ago, commercial search software was the 'safe choice'. Now I think things have changed: open source search is the safe choice for companies where search is mission critical. Do you agree?

I'll be writing more about why I believe this to be the case over the coming weeks and months: stay tuned.

/s/Miles

 

December 18, 2012

Last call for submitting papers to ESS NY

This Friday, December 21, is the last day for submitting papers and workshops to ESS in NY, May 21-22. See the information site at the Enterprise Search Summit Call for Speakers page.

If you work with enterprise search technologies (or supporting technologies), chances are the things you've learned would be valuable to other folks. If you have an in-depth topic, write it up as a 3-hour workshop; if you have a success story or lessons learned you can share, submit a talk for a 30-45 minute session.

I have to say, this conference has enjoyed a multi-year run of quality talks and excellent spring weather. See you in May?