49 posts categorized "Google Search Appliance"

January 04, 2012

My search platform ate my homework

In a recent article on infoworld.com, Peter Wayner wrote a nifty piece discussing 11 programming trends to watch. It's interesting overall, but one trend really rang true for me with respect to enterprise search.

He calls his ninth trend 'Accuracy fades as scalability trumps all'. He points out that most applications are fine with close approximations, based mainly on the assumption that at internet scale, if we miss an instance of something today, we'll probably see it again tomorrow. That brought to mind something I'm working on right now for a customer who needs 100% confidence in their search platform to meet some very stringent requirements. The InfoWorld article reminded me of a dirty little secret of nearly all enterprise search platforms, a secret you may not know (yet) but which could be important to you.

Search platform developers make assumptions about your data, and most search platforms do not index all of your content... by design! Don't get me wrong: these assumptions let them produce pretty good accuracy every time, and even 100% accuracy sometimes. And pretty good is fine most of the time. In fact, as a friend told me years ago, sometimes 'marginally acceptable' is just fine.

The theory seems to be that a search index might miss a particular term in a few documents, but any really important use of the term will surely be indexed somewhere else, and users will get results from those other documents. In fact, some search platforms have picked an arbitrary size limit and won't index any content past that limit, even if that means missing major sections of large documents. Google is one of the few vendors who actually document this: once the GSA has indexed 2 MB of text or 2.5 MB of HTML in a file, it stops indexing that file and 'discards' the rest. This curious behavior works most of the time for most data (although there is an odd twist that will bite you if you feed the GSA a large list of URLs or ODBC records). To be honest, most search platforms do this sort of trimming; they just don't mention it too often during the sales process.
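If truncation matters for your data, you can audit a content set before you feed it to the indexer. Here is a minimal sketch in Python, using the limits Google documents for the GSA; substitute your own platform's limits, and note the function is our illustration, not part of any product API:

```python
# Flag how much of a document an indexer would silently discard.
# The 2 MB text / 2.5 MB HTML limits are the ones Google documents
# for the GSA; check your own platform's documentation.

TEXT_LIMIT = 2 * 1024 * 1024         # 2 MB of extracted text
HTML_LIMIT = int(2.5 * 1024 * 1024)  # 2.5 MB of HTML

def truncated_portion(size_bytes: int, is_html: bool) -> int:
    """Return how many bytes fall past the indexing limit (0 if none)."""
    limit = HTML_LIMIT if is_html else TEXT_LIMIT
    return max(0, size_bytes - limit)

# Example: a 3 MB HTML report loses its tail
lost = truncated_portion(3 * 1024 * 1024, is_html=True)
print(f"{lost} bytes would never make it into the index")
```

Run something like this over your crawl feed and you will know up front which large documents will be only partially searchable.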

Now, in legal markets like eDiscovery, it's pretty darned critical to find every document that contains a particular term. It's not OK to go to court and report that you missed one or more critical documents because your search engine truncates or ignores some terms or some documents. That excuse might have worked in elementary school or even in high school, but it just doesn't cut it in demanding enterprise search environments.

It may not be a problem for you; just be sure that, if it is a requirement for you, you include it in your RFI/RFQ documents.

 

 

November 08, 2011

Are you spending too much on enterprise search?

If your organization uses enterprise search, or if you are in the market for a new search platform, you may want to attend our webinar next week, "Are you spending too much for search?". The one-hour session will address:

  • What do users expect?
  • Why not just use Google?
  • How much search do you need?
  • Is an RFI a waste of time?   

Date: Wednesday, November 16, 2011

Time: 11AM Pacific Standard Time / 1900 UTC

Register today!

October 25, 2011

What search platform is best? Workshop at KMWorld

Next week in Washington DC, InfoToday runs its Fall enterprise search conferences - KMWorld, Enterprise Search Summit, SharePoint Symposium, and Taxonomy Boot Camp... whew! On Monday - Halloween Day! - I am giving a workshop at the conferences with the somewhat vague title 'Enterprise Search Technologies'.

I'll be giving an overview of the platform vendors, with some detail on their strengths and weaknesses, and a drill-down into what you need to do before you call the vendors (if you value your time).

You can still sign up for the workshop for US$295, or for the entire conference for a bit more; see you in DC in a week!

/s/Miles

August 22, 2011

Searching for Sarah at SharePoint Conference 2011

We just noticed that one of the most interesting sessions from last May's Enterprise Search Summit is coming to the October Microsoft SharePoint Conference! We blogged about it back in May.

Basically, Booz & Company did an evaluation of SharePoint 2010 search - FAST Search for SharePoint, as I recall - versus the Google Search Appliance they had been using. At one point, the search business owner was trying to find the last name of a woman she had met in the firm; when she searched for 'Sarah', hoping to find her in the directory, the GSA returned 60 men in the result list. Can you guess why? A hint: metadata (check the earlier article, or come to SPC 2011 to find out).

Now, in fact, we think the GSA could have been tuned to emulate this out-of-the-box SharePoint behavior; but this is a reminder that not every search platform works great in every environment. Buyer beware!

Ever had a similar experience? Let us know about it!

 

August 09, 2011

So how many machines does *your* vendor suggest for 100,000,000+ document dataset?

We've been chatting with folks lately about really large data sets: clients who have a problem, and vendors who claim they can help.

But a basic question keeps coming up - not licensing, but "how many machines will we need?" Not everybody can put their data on a public cloud, and private clouds can't always spit out a dozen virtual machines to play with - plus duplicates for dev and staging - so it's not quite as trivial as some folks think.

The Tier-1 vendors can handle hundreds of millions of docs, sure, but usually on quite a few machines, plus of course their premium licensing, and some non-trivial setup at that point.

And as much as we love Lucene, Solr, Nutch and Hadoop, our tests show you need a fair number of machines if you're going to turn around a half billion docs in less than a week.
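The arithmetic behind that claim is worth making explicit. A back-of-envelope sketch - the per-node indexing rate below is an illustrative placeholder, not a benchmark from our tests, so plug in your own measured numbers:

```python
import math

# How fast must the cluster index to turn around half a billion docs
# in under a week, and roughly how many nodes does that take?

DOCS = 500_000_000
WINDOW_SECONDS = 7 * 24 * 3600             # one week

required_rate = DOCS / WINDOW_SECONDS      # docs/sec across the whole cluster
PER_NODE_RATE = 50.0                       # ASSUMED docs/sec per indexing node

nodes = math.ceil(required_rate / PER_NODE_RATE)
print(f"~{required_rate:.0f} docs/sec overall -> about {nodes} indexing nodes")
```

Even before facets enter the picture, you need north of 800 documents per second sustained for a full week; divide that by whatever per-node rate your own tests show, and the machine count adds up quickly.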

And beyond indexing time, once you start applying 3 or 4 facet filters at query time, you hit another performance knee.

We've got 4 Tier-2 vendors on our "short list" that might be able to reduce machine counts by a factor of 10 or more over the Tier-1 and open source guys.  But we'd love to hear your experiences.

August 02, 2011

Connecting Google to SharePoint 2010: White Paper

NIE partner BA Insight will soon be releasing a white paper highlighting key differences between SharePoint 2010 search and the Google Search Appliance.

We've seen an early draft of Google & Microsoft Enterprise Search Product Comparison, and can talk about some of the discussion points.

Update: The white paper is now available! Get it now.

Generally, the research paper discusses the following topics:

1. Content: Crawling, indexing, security, and connectors

2. Query processing: Manual and automatic relevance tuning; actionable results; and layout design

3. Vendor 'intangibles': Maintenance, support, vendor stability, licensing and the partner eco-system

BA Insight is a large Microsoft partner, and their DNA reflects it. But many customers use Google Search Appliances with SharePoint, and this comparison is nicely done.

And let us know what's on your mind with respect to enterprise search - leave a comment!

 

 

May 31, 2011

It's not Google unless it says it's Google

A few years back, one of our customers told us that, if he could just license the 'Powered by Google' icon, he was sure most of the users would stop complaining. Not long after that, we heard that our friend Andy Feit, who was VP of search at Verity, hired a marketing research team to compare the quality of search engines when one was "Powered by Verity" and the other was 'Powered by Google'. Andy found that people thought the Google results were significantly better - even though both test cases were, in fact, powered by Verity. The mere presence of the Google icon seemed to make people think the results were better.

At the recent ESS, a woman from Booz & Company talked about their previous enterprise search experience involving Google. A few years back, Booz used FAST ESP on SharePoint 2003, and it simply sucked. Users asked for Google by name. When they upgraded to SharePoint 2007, Booz gave the users what they wanted: they went with a Google Search Appliance. The trouble was that they built a custom interface with a generic search button. The users' response? "Search still sucks - why don't we just use Google?" - even as they were using Google!

This can teach us a number of lessons:

1. Analyze what you need search to do for you before you buy it.

2. Understand how your content and search platform play together.

3. It ain't Google unless you tell your users it's Google.

By the way: in 2010 Booz rolled out FAST Search for SharePoint, and it seems that the results are a bit better now that they understand their search requirements and the nature of their content and metadata.

 

 

May 21, 2011

Google and the official search blog

A couple of days ago, Google started Inside Search, the 'official Google search blog'. It's not really enterprise search news, but because so many knowledge workers compare the behavior of their internal search platform with the Google public search experience, it may be worth monitoring for those whose job it is to keep enterprise search going.

 

May 19, 2011

Content owners don't care about metadata

Or do they?

Our recent post about Booz & Company's 'men named Sarah' highlights just how important good metadata can be to providing a great search experience for employees and customers.

One of our customers who spoke at the recent ESS 2011 in New York provided some great insights into the problems organizations have getting employee content creators to include good metadata with their documents.

During the ESS talk, they reported that content owners don't really seem motivated when asked to help improve the overall intranet site by improving document metadata. However - and this is a big one - when a sub-site owner sees poor results on their own site, they are willing to invest the time to provide really good metadata.

[A bit of background: this customer provides a way for individual site owners within the organization to add search to their 'sub-site' pretty much automatically - sort of 'search as a service' within the enterprise.]

So if you've been thinking of search-enabling sub-sites within your organization, but assumed you had to solve the relevance problem first, you might reconsider your priorities!

/s/Miles

May 16, 2011

Sixty guys named Sarah

We're always on the lookout for anecdotes to use at trade shows, with our customers and prospects, and of course here in the blog, so I have to report that we heard a great one last week at Enterprise Search Summit in New York.

The folks from Booz & Company, a spinoff from Booz Allen Hamilton, did a presentation on their experience comparing two well respected mainstream search products. They report that, at one point, one of the presenters was looking for a woman she knew named Sarah - but she was having trouble remembering Sarah's last name. The presenter told of searching one of the engines under evaluation and finding that most of the top 60 people returned from the search were... men. None were named 'Sue'; and apparently none were named Sarah either. The other engine returned records for a number of women named Sarah; and, as it turns out, for a few men as well.

After some frustration, they finally got to the root of the problem. It turns out that all of the Booz & Company employees have their resumes indexed as part of their profiles. Would you like to guess the name of the person who authored the original resume template? Yep - Sarah.

One of the search platforms ranks document metadata very highly, without much ability to tune the weighting algorithms. The other provides a way to tune relevance, but it also tends to rank people a bit differently - probably stressing documents about people less than the individual people profiles. The presentation was a bit vague about whether any actual tuning was done that might explain these differences on either platform.
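A toy model makes the mechanics concrete. The sketch below scores a query as a weighted sum of per-field matches; the field names, documents, and weights are all hypothetical, not either vendor's actual ranking algorithm:

```python
# Toy relevance model: score = sum of weights for fields containing
# the query term. A heavy metadata boost lets a template author's
# name outrank documents that are actually about that person.

def score(doc: dict, term: str, weights: dict) -> float:
    return sum(w for field, w in weights.items()
               if term.lower() in doc.get(field, "").lower())

resume = {"author": "Sarah", "body": "John Smith, consultant"}   # template author
profile = {"author": "HR team", "body": "Sarah Jones, partner"}  # a real Sarah

metadata_heavy = {"author": 10.0, "body": 1.0}
balanced = {"author": 0.5, "body": 1.0}

# With a big author boost, John's resume outranks the real Sarah:
assert score(resume, "Sarah", metadata_heavy) > score(profile, "Sarah", metadata_heavy)
# Re-weighting the fields flips the ordering:
assert score(profile, "Sarah", balanced) > score(resume, "Sarah", balanced)
```

The point isn't the particular numbers; it's that when one field carries bad metadata (here, a template author's name on every resume), any platform that weights that field heavily will surface the wrong people, no matter how good its engine is.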

The fact that one of the engines did well and one did not is not the big story here - although it is something to consider if you're evaluating enterprise search platforms. The real lesson is that poor metadata makes even the best search platforms perform poorly in some - if not most - cases.