19 posts categorized "LucidWorks"

September 18, 2014

Lucidworks ships Fusion 1.0: a pretty exciting next-gen platform.

OK, I've known this was coming for a while; I just didn't know when until this afternoon, so I stayed up late to get the download started after midnight.

Fusion is more than an updated release of Lucidworks Search. It is Solr-based, but it's a rewrite from top to bottom. And it's not a bare-bones search API only a developer can love. Connectors? Check. Security? Check. Analytics? Check. Entity extraction? Check. All included.

But what it adds is where the real capabilities and contributions are. Machine learning? Check. Admin console? Check. Log analytics? Check. A document pre-processing pipeline? Check. Deep signal processing (think 'automated context processing')? Check.

Even if these new capabilities are not your style, you can buy Solr support and still get licenses for connectors, entity extraction, and a handful of other formerly 'premium' products. Want it all? License the full product at a per-node price I always thought was underpriced. I'm sure you'll be hearing a lot more in the coming days and weeks, but go download it, try it, and see what it does for your sites. Your developers will love it, your business owners will love it, your users will love it, and I bet even your CFO will love it.

Full disclosure: I am a former employee of Lucidworks, but I'd be just as excited even if I were not. Go download it for sure and try it on your content. And be sure to check out the 'search as killer app' video on Lucid's home page, www.lucidworks.com.

/s/Miles

 

 

September 09, 2014

Sometimes you're just wrong! (Maybe).

OK, this one falls into the 'eat your own words' category, so I have to come clean. Well, partly clean. Let me explain.

I was out of town last week, but just before I left I wrote an article asserting that Elasticsearch really isn't 'enterprise' search. The article drew a lot of attention and comments from both sides of the argument. I still think that's the case, but an announcement by Microsoft seems to suggest otherwise, and it may end up a net positive for Elasticsearch. Microsoft tells us that Elasticsearch is the platform under the covers of Microsoft's Azure search offering. It looks like you have a couple of options, as long as you're on Azure:

a) You can download and use the open source Elasticsearch platform available on GitHub; or

b) Use Microsoft's managed service 'Facetflow Elasticsearch' which incorporates (some of) the open source code in various places.

Microsoft calls this "a fully-managed real-time search and analytics service" while, according to ZDNet, it is for 'web and mobile application developers looking to incorporate full-text search into their applications'. 

Either way, it's certainly yet another step forward for Elasticsearch, and a big boost in visibility for the company. It's not clear what kind of revenue they will receive from the deal, Microsoft being relatively famous for being quite frugal. And after all, smart search folks like Kevin Green of Spantree Technology Group talk about its strengths and liabilities, saying it *is* fast ('wicked fast'), fault-tolerant, distributed and more. But it is not a crawler, a machine learner, or a user-facing front end, and it is not secure.

So I'll agree a partial 'mea culpa' is in order; adding capabilities to an open source project can make it more enterprise ready. But I think the jury may still be out on the rest of my piece. Stay tuned!

August 25, 2014

Is Elasticsearch really enterprise search?

Not too long ago, Gartner released its 2014 Magic Quadrant for Enterprise Search, which I’ve written about here and which has generated a lively discussion on the Enterprise Search Engine Professionals group over on LinkedIn.

Much of the discussion I’ve seen about this year's MQ deals with the omission of several platforms that most people think of as ‘enterprise search’. Consider that MQ alumni Endeca, Exalead, Vivisimo, Microsoft FAST, and others don’t even appear this year. Over the last few years larger companies acquired most of these players, but in the MQ it's as if they simply ceased to exist.

The other name I've heard mentioned, besides these former MQ alumni, is Elasticsearch, a relatively new start-up. Elasticsearch, based on Apache Lucene, recently had a huge round of investment by some A-List VCs. What's the deal, Gartner?

Before I share my opinion, I have to reiterate that, until recently, I was an employee of Lucidworks, which many people see as a competitor to Elasticsearch. I believe my opinions are valid here, and I believe I’m known for being vendor-neutral. I think the best search platform for a given environment is a function of the platform and the environment: what data source, security, management and budget apply for any given company or department. “Search engine mismatch” is a real problem and we’ve written about it for years.

Given that caveat, I believe I’m accurately describing the situation, and I encourage you to leave a comment if you think I've lost my objectivity!

OK, here goes. I don't believe Elasticsearch is in the enterprise search space. For that reason, if for no other, it doesn’t belong on the Gartner Magic Quadrant for search.

You heard it here. It's not that I don't think Elasticsearch is a powerful, cool, and valuable tool. It is all that, and more. As I mentioned, it’s based on Apache Lucene, a fantastic embedded search tool. In fact, Lucene is the same tool that Solr, and therefore Lucidworks' commercial products, are based on. But Lucene by itself is a tool more than a solution for enterprise search.

Let me start by addressing what I think Elasticsearch is great for: search-enabled data visualization. The first time I attended an Elasticsearch meet-up, they were showing the product in conjunction with two other open source projects: Logstash and Kibana. The total effect was great and made for a fantastic demo! I was fully and completely impressed, and saw the value immediately - search driving a visualization tool that was engaging, interactive, and exciting! 

Since then, Elasticsearch has apparently hired the guys who created those two open source projects, and has now morphed into a log analytics company: more like Splunk with great presentation capability, and less like traditional enterprise search. Their product is ELK: Elasticsearch, Logstash, Kibana. You can download all of these from GitHub, by the way.

Lucidworks has also seen the value of Kibana to enterprise search, and has released its own version of Logstash and Kibana integrated with Solr, called SiLK (Solr-Integrated Logstash and Kibana).

Now let me tell you why I do not think of Elasticsearch as an enterprise search solution. First, in my time at Lucid, I'm not aware of any enterprise opportunities that Lucidworks lost to ELK. I could be wrong, and maybe the Elastic guys know of many deals we never saw at Lucid. But with no crawler and other components I consider ‘required’ as part of an enterprise search product, I'm not sure they're interested - yet, at least.

Next, check the title of their home page: "Open Source Distributed Real Time Search". Doesn't scream 'Google Search Appliance replacement', does it? Read Elasticsearch founder Shay Banon on the GSA.

Finally, Wired Magazine has an even more interesting quote: Shay Banon on SharePoint. “We're not doing enterprise search in the traditional sense. We're not going to index SharePoint documents”.

Now, with the growth and the money Elasticsearch has, they may change their tune. But with over $100M in venture capital now, I think their investors are valuing Elasticsearch as a Splunk competitor, and perhaps a NoSQL search product for Hadoop - not a traditional enterprise search engine. 

So the real question is: which space are you in? Enterprise Search with SharePoint and other legacy data sources? Web content and file shares you need a crawler for? Is LDAP or Active Directory security important to you? Well - I won't say 'no way' - but I'd want to see it before I buy.

Do you use Elasticsearch for your enterprise search? Let me hear from you!

 

 

 

August 21, 2014

More on the Gartner MQ: Fact or fiction?

There is a lively discussion going on over in the LinkedIn ‘Enterprise Search Engine Professionals’ group about the recent Gartner Magic Quadrant report on Enterprise Search. Whit Andrews, a Gartner Research VP, has replied that the Gartner MQ is not 'pay to play'. I confess I was the one who brought the topic up in these threads, at least, and I thank Whit for clarifying the misunderstanding directly.

That said, two of my colleagues who are true search experts have raised some questions I thought should be addressed.

Charlie Hull of UK-based Flax says he's “unconvinced of the value of the MQ to anyone wanting a comprehensive … view of the options available in the search market”. And Otis Gospodnetić of New York-based Sematext asks "why (would) anyone bother with Gartner's reports. We all know they don't necessarily match the reality". I want to try to address those two very good points.

First, I'm not sure Gartner claims the MQ is a comprehensive overview of the search market. Perhaps there are more thorough lists: my friends and colleagues Avi Rappoport and Steve Arnold both have more complete coverage. Avi, now at Search Technologies, still maintains www.searchtools.com with a list that is as much a history of search as a list of vendors. And Steve Arnold has a great deal of free content on his site as well as high quality technology overviews by subscription. Find links to both at arnoldit.com.

Nonetheless, Gartner does have published criteria, and being a paid subscriber is not one of them. Whit's fellow Gartner analyst French Caldwell calls that out on his blog. By the way, I have first-hand experience that Gartner is willing to cut some slack to companies that don't quite meet all of their guidelines for inclusion, and I think that adds credence to Gartner's claim.

A more interesting question is one that Otis raises: “why would anyone bother with Gartner's reports”?

To answer that, let me paraphrase a well-known quote from the early days of computers: "No one ever got fired for following Gartner's advice". They are well known for giving good, if not perfect, advice, and I'd suspect that in the fine print, Gartner even acknowledges the fallibility of their recommendations. And all of us know that in real life, you can't select software as complex as an enterprise search platform without a proof of concept in your environment and on your content.

The industry is full of examples where the *best* technology loses pretty consistently to 'pretty good' stuff backed by a major firm/analyst/expert. Otis, I know you're an expert, and I'd take what you say as gospel. A VP at a big corporation who is not familiar with search (or his company's detailed search requirements) may not do so. And anyone on that VP's staff who picks a platform based solely on what someone like you or I say probably faces some amount of career risk. That said, I think I speak for Otis and Charlie and others when I say I am glad that a number of folks have listened to our advice and are still fully employed!

So, in summary, I think we're all right. Whit Andrews and Gartner provide advice that large organizations trust because of the overall methodology of their evaluation. Everyone knows it's not infallible, so a smart company will use the 'trust but verify' approach. And they continue to trust you and me, but more so when Gartner or Forrester or one of the large national consulting companies confirms our recommendation. If not, we have to provide a compelling reason why something else is better for them. And the longer we're successful with our clients, the more credible we become.

 

 

August 05, 2014

The unspoken "search user contract"

Search usability is a major difference between search that works and search that sucks. If you want a free one-hour usability consultation, let me know.

I recently had lunch with my longtime friend and associate Avi Rappoport from Search Technologies. We had a great time exchanging stories about some of the search problems our clients have. She mentioned one customer to whom she was explaining the best practices to follow when laying out a result list. That brought to mind what I've called the search user contract, which users tacitly expect when they use your search on any of your sites, internally and externally.

If you are responsible for an instance of search running inside a firewall, even if it's outward facing, you have a problem your predecessors of 15 to 20 years ago* didn't have. Back then, most users didn't have experience with search except the one you provided - so they didn't have expectations of what it could be like.

Fast forward to 2014. In addition to your intranet search, virtually everyone in your organization knows, uses, and often loves Amazon, Facebook, Google, Apple, eBay and others. They know what really great search looks like. They expect you to suggest searches (or even products) on the fly! Search today recognizes misspelled words and knows what other products you might like.

But most importantly, almost all of these sites follow the same unspoken user contract:

  • On the result list, the search box goes at the top, either across a wide swath of the browser window, or in a smaller box on the left hand side, near the top.
  • There is only one search box on the results page.
  • Search results, numbered or not, show a page title and a meaningful summary of the document. Sometimes the summary is just a snippet. Words that cause the document to be returned are sometimes bolded in the summary.
  • Suggestions for the words and phrases you type show up just below the search box (or up in the URL field)
  • Facets, when available, go along the left hand side and/or across the top, just under the search box. Occasionally they can be on the right of the result list.
  • When facets are displayed on the left or right side of the screen, the numbers next to each facet indicate how many results show when you click that facet.
  • Best bets, boosted results, or promoted results show up at the top of the result list.
  • Advertisements or special announcements appear on the right side of the result list.
  • Links to the ‘next’ or ‘previous’ results page appear at the bottom and possibly at the top of the results.
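The facet-count convention above is simple to compute. Here is a minimal sketch, assuming results arrive as Python dicts with a hypothetical single-valued `type` field; real engines such as Solr compute these counts inside the index, so this only illustrates the idea. Because counts come only from matching results, a facet value with zero results is never produced, and so never displayed.

```python
from collections import Counter

def facet_counts(results, field):
    """Count how many results carry each value of `field`.

    Values that match no result never appear, so a UI built on this
    never shows an empty facet.
    """
    counts = Counter()
    for doc in results:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1
    # Highest counts first; ties broken alphabetically for a stable display order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical result list for illustration.
results = [
    {"title": "Q3 report", "type": "PDF"},
    {"title": "Policy memo", "type": "Word"},
    {"title": "Q4 report", "type": "PDF"},
    {"title": "Org chart"},  # no "type" field, so it adds to no facet
]

for value, count in facet_counts(results, "type"):
    print(f"{value} ({count})")
# PDF (2)
# Word (1)
```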

Now it's time to look at your web sites, public facing as well as behind your firewall. Things we often see include:

  • Spelling suggestions in a small, dark font very close to the site background color, at the left edge of the content, just above the facets. Users don't expect to look there for suggestions; wherever you put them, make the color stand out so users see them.** [Don't make the user think]
  • An extra search form on the page: one at the top as 'part of our standard header block', and one right above the result list to enable drill-down. The results you see will differ depending on which field you type in. [The visitor is confused: which search button should be pressed to do a 'drill-down' search? Again, don't make the user think]
  • Tabs for drilling into different content areas that seem to be facets, but some of the tabs ('News') have no results. [Facets should only display if, by clicking on a facet, the user can see more content]
  • As I said at the top, we’ve found that poor search user experience is a major reason employees and site visitors report that ‘search sucks’. One of the standard engagements we do is a Search Audit, which includes search usability in addition to a review of user requirements and expectations. If you want a free consult on your usability, let me know.

 

/s/Miles

 

*Yes, Virginia, there was enterprise search 20 or more years ago. Virtually none of those names still exist, but their technology is still touching you every day. Fulcrum, Verity, Excalibur and others were solving problems for corporations and government agencies; and of course Yahoo was founded in 1994.

**True story, with names omitted to protect the innocent. On a site where I was asked to deliver a search quality audit, ‘spelling suggestions’ was a top requested feature. They actually had spelling suggestions, in grey letters in a dark black field with a dark green background, far to the left of the browser window. No one noticed them. You know who you are; you’re welcome!

 

July 21, 2014

Gartner MQ 2014 for Search: Surprise!

Funny, just last week I tweeted about how late the Gartner Magic Quadrant for Enterprise Search is this year. Usually it's out in March, and here it is, July.

Well, it's out - and boy does it have some surprises! My first take:

Coveo, a great search platform that runs on Windows only, is in the Leaders quadrant, and best overall in the "Completeness of Vision". Don't get me wrong, it's a great search platform; but I guess completeness of vision does not include completeness of platform. Linux your flavor? Sorry.

HP/Autonomy IDOL is in the upper right quadrant as well, back strong at the top in 'Ability to Execute' and in the top three on 'Completeness of Vision'. IDOL has always reminded me of the reliable old Douglas DC-3, described by aviation enthusiasts as 'a collection of parts flying in loose formation', but it really does offer everything enterprise search needs. And, because it loves big hardware, everything that HP loves to sell.

BA Insight surprised me with their Knowledge Integration Platform at the top of the Visionaries quadrant. It enhances Microsoft SharePoint Search, or runs with a stand-alone version of Lucene. It's very cool, yes. But I sure don't think of it as a search engine. Do you? More on this later.

Attivio comes in solid in the lower right 'Visionaries' quadrant. I'd really expected to see them further along on both measures, so I'm surprised.

I'm really quite disappointed that Gartner places my former employer Lucidworks solidly in the lower left 'Niche players' quadrant. I think Lucidworks has a very good vision of where they want to go, and I think most enterprises will find it compelling once they take a look. I don’t think I'm biased when I say that this may be Gartner's big miss this year. And OK, I understand that, like BA Insight's Knowledge product, Lucidworks needs a search engine to run, but it feels more like a true search platform.

Big surprise: IHS, which I have always thought of as a publisher, has made it to the Gartner Niche quadrant as a search platform. Odd.

Other surprises: IBM in the Niche Players quadrant, based on 'Ability to Execute'. Back at Verity, then-CEO Philippe Courtot got the Gartner folks to admit that the big component of Ability to Execute was really about how long you could fund the project, and I have to confess I figured IBM (and Google) as the MQ companies with the best cash position.

If you're not a Gartner client, I'm sorry you won't get the report or the insights of Whit Andrews (@WhitAndrews _), a longtime search analyst who knows his stuff. You can still get the report from several vendors happy to let you download the Gartner MQ for Search from them. Search Google and find the link you most prefer, or call your vendor for a full copy.

/s/Miles

What does it take to qualify as 'Big Data'?

If you've been on a deserted island for a couple of decades, you may not have heard the hot new buzz phrase: Big Data. And you may not have heard of "Hadoop", the application that accidentally solved the problem of Big Data.

Hadoop was originally designed as a way for the open source Nutch crawler to store its content prior to indexing. Nutch was fine for crawling sites, but if you wanted to crawl really massive data sets – say the Internet – you needed a better way to store the content (thank goodness Doug Cutting didn’t work at a database giant or we’d all be speaking SQL now!). GigaOm has a great series on the history of Hadoop (http://bit.ly/1jOMHiQ) that I recommend for anyone interested in how it all began and evolved.

After a number of false starts, brick walls, and subsequent successes, Hadoop is a technology that really enables what we now call ‘big data’, usually written as "Big Data". But what does this mean? After all, there are companies with a lot of data, and there are companies with limited content size that changes rapidly every day. Which of these really have data that meets the ‘Big’ definition?

Consider a company like AT&T or Xerox PARC, which licenses its technology to companies worldwide. As part of a license agreement, PARC agrees to defend its licensees if an intellectual property lawsuit ever crosses the transom. Both companies own tens of thousands of patents going back to their founding in the early 20th century. Just the digital content to support these patents and inventions must number in the tens of millions of documents, much of which is in formats no longer supported by any modern search platform. Heck, to Xerox, WordStar and Peachtext probably seem pretty recent! But about the only time they need to search that content is when a licensee needs help defending against an IP claim. I don’t know how often that is, but I’d bet less than a dozen times a year.

Now consider a retail giant like Amazon or Best Buy. In raw size, I’d bet Amazon has hundreds of millions of items to index: books, products, videos, tunes. Maybe more. But that’s not what makes Amazon successful. I think it’s the ability to execute billions of queries every day – again, maybe more – and return damn good results in well under a second, along with recommendations for related products. Best Buy also has retail stores, so they have to keep not only purchase data but also the buying patterns that tell them what products to stock in any given location.

A healthcare company like UnitedHealth must have its share of corporate intranet content. But unlike many corporations, these companies must process millions of medical transactions every week: doctor visits, prescriptions, test results, and more. They need to process these transactions, but they also must keep these transactions around for legally defined durations.

Finally, consider a global telecom company like Ericsson or Verizon. They’ve got the usual corporate intranet, I’m sure. They have financial transactions like Amazon and UHG. But they also have telecom transaction records that must number in the billions each month: phone calls and more. And given the politics of the world, many of these transactions have to be maintained and searchable for months, if not years.

These four companies have a number of common traits with respect to search, but each has its own specific demands. Which ones count as ‘big data’ as it’s usually defined? And which just have ‘a bunch of content’?

As it turns out, that’s a tough question. At one point, there was a consensus that ‘big data’ required three things, known as the “Three V’s of Big Data”. This escalated to the “5 V’s of Big Data”, then the “7 V’s”, and I’ve even seen some define the “10 V’s of Big Data”. Wow, and growing!

Let’s take a look at the various “V’s” that are commonly used to define ‘Big Data’.

Depending on who you ask, there are four, five, seven or more ‘requirements’ that define ‘big data’. These are usually referred to as the “V’s of Big Data”, and they usually include:

Volume: The scale of your data; basically, how many ‘entries’ or ‘items’ you have. For Xerox, how many patents; for a telecom company, how many phone ‘transactions’ there have been.

Variety: Basically this means how many different types of data you have. Amazon has mouse clicks, product views, unique titles, subscribers, financial transactions and more. For UHG and Ericsson, I’d guess the majority of their content is transactional: phone call metadata (originating and receiving phone number, duration of the call, time of day, etc.). In the enterprise, variety can also mean data format and structure. Some claim that 90% of enterprise data is unstructured, which adds yet another challenge.

Veracity: This boils down to whether the data is trustworthy and meaningful. I remember a survey HP did years ago to find out what predictors were useful in knowing whether a person walking into a random electronics store would walk out with an HP PC. Using HP products at work or at home were the big predictors; but the fact that the most likely day was Tuesday was perhaps spurious and not very valuable.

Velocity: How fast is the data coming in or changing? Amazon has a pretty good idea on any given day how many transactions they can expect, and Verizon knows how much call data they can expect. But things change: a new product becomes available, or a major world event triggers many more phone calls than usual.
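One simple way to notice the kind of change described above is to compare each period's volume with a recent baseline. This is a rough sketch, assuming daily call counts in a plain list; the 7-day window, the 2x factor, and the sample numbers are all hypothetical, not anything Verizon or Amazon is known to use.

```python
from statistics import mean

def flag_spikes(counts, window=7, factor=2.0):
    """Return indexes of periods whose count exceeds `factor` times
    the average of the preceding `window` periods."""
    spikes = []
    for i in range(window, len(counts)):
        baseline = mean(counts[i - window:i])
        if counts[i] > factor * baseline:
            spikes.append(i)
    return spikes

# Hypothetical daily call volumes; day 8 has a burst of activity.
daily_calls = [100, 105, 98, 102, 99, 101, 97, 100, 310, 104]
print(flag_spikes(daily_calls))  # [8]
```

Note that the spike itself inflates the baseline for the days that follow; a real monitor might exclude flagged periods or use a median instead of a mean.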

Viability: If you want to track trends, you need to know what data points are the most useful in predicting the future. A good friend of mine bought a router on Amazon, and Amazon reported that people who bought that router also bought... men’s extra-large jeans. Now, he tells me he did think they were nice jeans, but that signal may not have had long viability.
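The "people who bought X also bought Y" signal in the router story can be sketched as simple co-purchase counting. This is a hypothetical illustration, not Amazon's actual method; the `min_count` threshold is one assumed way to keep a single odd purchase (like the jeans) from turning into a recommendation.

```python
from collections import Counter
from itertools import combinations

def co_purchases(orders):
    """For each item, count the other items that appeared in the same order."""
    together = {}
    for order in orders:
        # set() ignores duplicates within one order; sorted() keeps pair order deterministic.
        for a, b in combinations(sorted(set(order)), 2):
            together.setdefault(a, Counter())[b] += 1
            together.setdefault(b, Counter())[a] += 1
    return together

def also_bought(together, item, min_count=2):
    """Items bought with `item` at least `min_count` times, most frequent first."""
    return [other for other, n in together.get(item, Counter()).most_common()
            if n >= min_count]

# Hypothetical order history.
orders = [
    ["router", "ethernet cable"],
    ["router", "ethernet cable", "switch"],
    ["router", "jeans"],
]
together = co_purchases(orders)
print(also_bought(together, "router"))               # ['ethernet cable']
print(also_bought(together, "router", min_count=1))  # ['ethernet cable', 'switch', 'jeans']
```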

Value: How useful or important is the data in making a prediction, or in improving business decisions. That was easy!

Variability: This often refers to how internally consistent the data is. To serve as an accurate predictor, a data point should ideally be consistent across a wide range of content. Blood pressure, for example, is generally in a small range; and for a given patient, it should be relatively consistent over time. When there is a change, UHG may want to understand the cause.
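The blood-pressure example suggests a simple check: how far does a new reading sit from this patient's own history, measured in standard deviations? This is a minimal sketch using a z-score; the 3-sigma threshold and the sample readings are assumptions for illustration, not a clinical rule or anything UHG is known to use.

```python
from statistics import mean, stdev

def is_unusual(history, new_reading, threshold=3.0):
    """True if new_reading lies more than `threshold` standard deviations
    from the mean of this patient's earlier readings."""
    if len(history) < 2:
        return False  # not enough history to estimate variability
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return new_reading != mu  # perfectly steady history: any change stands out
    return abs(new_reading - mu) / sigma > threshold

# Hypothetical systolic readings for one patient.
systolic = [118, 122, 120, 119, 121, 120]
print(is_unusual(systolic, 121))  # False
print(is_unusual(systolic, 150))  # True
```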

Visualization: Rows and columns of data can look pretty intimidating and it’s not easy to extract meaning from them. But as they say, ‘a picture is worth a thousand words’, so being able to see charts or graphs can help meaning and trends jump out at you.  I’d use Lucidworks’ SiLK product as an example of a great visualization tool for big data, but there are many others.

Validity: This seems like another way to say the data has veracity, but it may be a subtle point. If you’re recording click-thru data, or prescriptions, or intellectual property, you have to know that the data is accurate and internally consistent. In my HP anecdote above, is the fact that more people bought HP PCs on Tuesday a valid finding? Or is it simply noise? You’ll probably need a human researcher to make these kinds of calls.

Venue: With respect to Big Data, this means where the data came from and where it will be used. Content collected from automobiles and from airplanes may look similar in a lot of ways to the novice. In the same way, data from the public Internet versus data collected from a private cloud may look almost identical. But making decisions for your intranet based on data collected from Bing or Google may prove to be a risk.

Vocabulary: What describes or defines the various items of the data. Ericsson has to know which bits of data represent a phone number and which represent the time of day. Without some idea of the schema or taxonomy, we’ll be hard pressed to reach reasonable decisions from Big Data.

Volatility: This may seem like velocity above, but volatility in Big Data really means how long the data stays valuable, and how long you need to keep it around. Healthcare companies may need to keep the data a lot longer than

Vagueness: This final one is credited to Venkat Krishnamurthy of YarcData just last month at the Big Data Innovation Summit here in Silicon Valley.  In a way, it addresses the confidence we can have in the results suggested by the data. Are we seeing real trends, or are we witnessing a black swan?

In the application of Big Data, not all of these various V’s are equally valid or valuable to the casual (or serious) observer. But as in so many things, interpreting the data is up to the person making the call. Big Data is only a tool: use it wisely!

Some resources I used in collecting data for this article include the following web sites and blogs:

IBM’s Big Data & Analytics Hub 

MapR's Blog: Top 10 Big Data Challenges – A Serious Look at 10 Big Data V’s 

See also Dr. Kirk Borne’s Top 10 List on Data Science Central   

Bernard Marr’s LinkedIn post on The 5 Vs Everyone Must Know 

 

May 14, 2013

Open Source Search Myth 5 - Total Cost of Ownership

This is part of a series addressing the misconception that open source search is too risky for companies to use. You can find the introduction to the series here; and Part 4, Features and Capabilities, here.

Part 5: Total Cost of Ownership

Total cost of ownership, TCO, is a big deal to large users of search technology. Usually, the component of TCO people focus on with respect to search is the license fee; enterprise search was historically an expensive proposition. But there are other major components of TCO: implementation and operations, hardware cost, and ongoing support come to mind.

Walter Underwood, one of the key developers at Ultraseek and later the guy who did the Netflix relevancy contest, once explained the difference between commercial and open source search. Let me paraphrase: 

"With commercial search, you spend a lot of money to license it; then you spend a lot of money to implement it.

With open source search, you download the software for free; then you spend a lot of money implementing it."

But there is another big element: how much iron do you need? A few years ago we helped a company switch search platforms. Their business was search-enabling small-town newspaper archives going back to the 1890s, via OCR'd content. They add tens of thousands of documents (historical newspaper articles) every day.

The commercial platform they replaced required major expense in new servers as their content grew. Every year.

As it turns out, the ROI for swapping out their old search engine was easy to show: they needed far less new hardware every year than with the old engine. So much less, in fact, that the payback period was less than a year.
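The post doesn't give dollar figures, so the numbers below are hypothetical. The sketch only shows the arithmetic behind a payback period of under a year: the one-time migration cost divided by the yearly hardware savings, expressed in months.

```python
def payback_months(migration_cost, old_annual_hw, new_annual_hw):
    """Months until yearly hardware savings repay a one-time migration cost.

    All inputs are hypothetical figures in the same currency.
    """
    annual_savings = old_annual_hw - new_annual_hw
    if annual_savings <= 0:
        raise ValueError("the new platform must cost less per year to pay back")
    return 12 * migration_cost / annual_savings

# Hypothetical: $150k to migrate; $400k/yr of new servers before, $200k/yr after.
print(payback_months(150_000, 400_000, 200_000))  # 9.0
```

A fuller TCO comparison would add the energy, maintenance, and support lines mentioned above to both sides of the savings calculation.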

A different project we did when we were still doing business as New Idea Engineering involved a comparison between Microsoft SharePoint 2010 and search with Solr. Our customer wanted to know if the switch would, indeed, require fewer servers to do the job. It turns out that it was quite reasonable to replace the 12 servers Microsoft FAST required with 6 or fewer servers running Solr. Half the cost of servers; half the cost of energy; half the cost of maintenance. Like the concept?

Now, I'll agree that LucidWorks - my employer - markets a proprietary search platform based on Solr. And we do not license the product for free. But compared to most commercial platforms, LucidWorks Search is pretty darned reasonable. And you still get the cost savings in energy, iron, and scalability.

Less hardware. Better search. How is the TCO of open source a liability compared to most commercial search platforms?

 

 

April 23, 2013

Open Source Search Myth 4 - Features and Capabilities Lag

This is part of a series addressing the misconception that open source search is too risky for companies to use. You can find the introduction to the series here; and Part 3, Skills required In-House, here.

Part 4: Features and Capabilities Lag

Keeping up with the latest and greatest technology is important, especially when there is a great deal of innovation in a field. Enterprise search is one such field.

In this post I'll address the claim that "Production functionality may trail in specific features relative to commercial search firms".

First, let me remind you that many of the coolest advanced capabilities in modern search platforms are delivered using third party products integrated into the actual search product. Examples:

Entity extraction: Cool stuff, and part of many search platforms. Often implemented using technology from companies like Basis Technology, Pingar, and others.

Non-English support: Required for any large-scale enterprise. Think Basis Technology again; or pretty darned good open source filters.

Document format support: Leaders here were smaller companies that were eventually purchased by larger search companies: Keyview (now Autonomy); Stellent (now Oracle); ISYS (now IBM). And there's open source Tika.

Sentiment Analysis: Identify 'positive' versus 'negative' sentiment, using products from Lexalytics, Attensity, SAS, LingPipe and others. 

My point is not that large enterprise search platform companies do not include some cool new technologies in their products: it's just that the 'cool' usually comes from a third party that can be licensed for use in any platform, not just "commercial" ones. 

And, when you use open source platforms, you always have the option of doing a feature yourself - either in-house, or using a consulting firm.

And you might not be aware of capabilities where open source Solr is ahead of many commercial vendors. For example, consider Geo search, which lets you easily search for 'documents' relevant to a particular location.  And it can even be used to answer questions like "what managers are on-duty on Saturday night at the LA store".
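Solr exposes geo search through its spatial features (for example, the `geofilt` filter). What follows is not Solr's implementation, which relies on spatial index structures; it is a plain-Python sketch of the underlying question, "is this document within N km of a point?", using the haversine great-circle distance. The store names and coordinates are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def within(docs, lat, lon, radius_km):
    """Return docs whose 'location' is within radius_km of (lat, lon)."""
    return [d for d in docs
            if haversine_km(lat, lon, *d["location"]) <= radius_km]

# Hypothetical store documents.
stores = [
    {"name": "LA store", "location": (34.05, -118.24)},
    {"name": "Santa Monica store", "location": (34.02, -118.49)},
    {"name": "San Francisco store", "location": (37.77, -122.42)},
]
print([s["name"] for s in within(stores, 34.05, -118.24, 50)])
# ['LA store', 'Santa Monica store']
```

Combining a distance filter like this with ordinary field filters (say, a shift schedule) is what makes questions like the "on-duty managers at the LA store" example possible.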

I will say that Microsoft, in its SharePoint 2013, has implemented a very nice query boosting tool that, as far as I can tell, was created in-house - I doubt it was in the FAST pipeline at the acquisition. 

But given that caveat, I'd ask: with all of the recent acquisitions and mergers, has any 'enterprise search' company implemented a major new capability like pivot facets, entity extraction and more without licensing the technology from an outside company?

 

 

 

 

 

March 20, 2013

Open Source Search Myth 3: Skills Required In-House

This is part of a series addressing the misconception that open source search is too risky for companies to use. You can find the introduction to the series here; this is Part 3 of the series; for Part 2 click Potentially Expensive Customizations.

Part 3: Skills Required In-House

One of the hallmarks of enterprise software in general is that it is complex. People in large organizations who manage instances of enterprise search are no less likely than their non-technical peers to believe that "if Google can make search so good on the internet, enterprise search must be trivial". Sadly, that is the killer myth of search.

Google on the internet - or Bing or Baidu or whichever site you use and love - is good because of the supporting technology, NOT simply because of search. I'd wager that most of what people like about Google et al has very little to do with search and a great deal to do with constant monitoring and tweaking of the platform.

Consider: at the Google 'command line' (the search box), you can type in an arithmetic expression such as "2+3" and get 5. You can enter a FedEx tracking number and get a suggestion to link to FedEx for information. It's cool that Google provides those capabilities and others; but those features are there because Google has programs looking at search behavior for all of its users every day in order to understand user intent. When something unusual comes up, humans get involved and make judgments. When it makes sense, Google implements another capability - in front of the search engine, not within it.
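Google's actual pipeline isn't public, so the following is only a hypothetical sketch of the idea of handling some queries *in front of* the engine. A small router checks for special query shapes before anything reaches search. The 12-digit "tracking number" rule, the `route` function, and `fake_search` are all assumptions for illustration. Arithmetic is evaluated by walking Python's `ast` tree rather than calling `eval()`, so arbitrary input can't run code.

```python
import ast
import operator
import re

# Operators the toy calculator accepts; any other syntax falls through to search.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

# Hypothetical rule: treat a bare 12-digit number as a parcel tracking number.
_TRACKING = re.compile(r"^\d{12}$")

def _calc(node):
    """Evaluate a parsed expression made only of numbers and + - * /."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_calc(node.left), _calc(node.right))
    raise ValueError("not simple arithmetic")

def route(query, search):
    """Handle special queries up front; pass everything else to `search`."""
    q = query.strip()
    if _TRACKING.match(q):
        return ("tracking", q)
    try:
        tree = ast.parse(q, mode="eval")
        # Require an operator, so a bare number like "2014" is still searched.
        if isinstance(tree.body, ast.BinOp):
            return ("answer", _calc(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError, RecursionError):
        pass
    return ("results", search(q))

def fake_search(q):
    return ["doc about " + q]  # stand-in for the real search engine

print(route("2+3", fake_search))                # ('answer', 5)
print(route("123456789012", fake_search))       # ('tracking', '123456789012')
print(route("enterprise search", fake_search))  # ('results', ['doc about enterprise search'])
```

The design point mirrors the paragraph above: the search engine itself is untouched, and each new "smart" behavior is another rule added to the router as query logs reveal what users are trying to do.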

Enterprise search is the same - except that very few companies invest money in managing and running their search; so no matter how well you tune it at the beginning, quality deteriorates over time. Enterprise search is not 'fire and forget'.

Any company that rolls out a mission-critical application and does NOT have its own skilled team in-house is going to pay a consulting firm thousands of dollars a day forever. 'Nuff said.