45 posts categorized "Google Search Appliance"

August 09, 2011

So how many machines does *your* vendor suggest for 100,000,000+ document dataset?

We've been chatting with folks lately about really large data sets: clients who have a problem, and vendors who claim they can help.

But a basic question keeps coming up - not licensing, but "how many machines will we need?" Not everybody can put their data on a public cloud, and private clouds can't always spit out a dozen virtual machines to play with - plus duplicates of all that for dev and staging - so it's not quite as trivial as some folks think.

The Tier-1 vendors can handle hundreds of millions of docs, sure, but usually on quite a few machines - plus, of course, their premium licensing, and some non-trivial setup at that point.

And as much as we love Lucene, Solr, Nutch and Hadoop, our tests show you need a fair number of machines if you're going to turn around a half billion docs in less than a week.
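To make that concrete, here's a minimal back-of-envelope sizing sketch. The per-node throughput and the one-week window below are illustrative assumptions for the sake of the arithmetic, not benchmarks of any particular engine:

```python
# Rough sizing arithmetic: how many indexing nodes does it take to
# turn around a big collection inside a fixed time window?
# All the numbers used below are illustrative assumptions.

def nodes_needed(total_docs, docs_per_sec_per_node, window_days):
    """Minimum node count to index total_docs within window_days."""
    window_secs = window_days * 24 * 60 * 60
    docs_per_node = docs_per_sec_per_node * window_secs
    # Round up: a fractional node still means one more machine.
    return -(-total_docs // docs_per_node)

# Example: 500M docs, ~200 docs/sec sustained per node, one-week window.
print(nodes_needed(500_000_000, 200, 7))
```

And that's just raw indexing throughput - in practice you'd multiply for replication, query serving, and those dev and staging duplicates mentioned above.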

And beyond indexing time, once you start doing 3 or 4 facet filters, you also hit another performance knee.

We've got 4 Tier-2 vendors on our "short list" that might be able to reduce machine counts by a factor of 10 or more over the Tier-1 and open source guys.  But we'd love to hear your experiences.

August 02, 2011

Connecting Google to SharePoint 2010: White Paper

NIE partner BA Insight will soon be releasing a white paper highlighting key differences between SharePoint 2010 search and the Google Search Appliance.

We've seen an early draft of Google & Microsoft Enterprise Search Product Comparison, and we can talk about some of the discussion points.

Update: The white paper is now available! Get it now.

Generally, the research paper discusses the following topics:

1. Content: Crawling, indexing, security, and connectors

2. Query processing: Manual and automatic relevance tuning; actionable results; and layout design

3. Vendor 'intangibles': Maintenance, support, vendor stability, licensing and the partner eco-system

BA Insight is a large Microsoft partner, and their DNA reflects it. But many customers use Google Search Appliances with SharePoint, and this comparison is nicely done. Keep an eye on their site, or stay tuned for updates here when the research paper is available.

And let us know what's on your mind with respect to enterprise search - leave a comment!


May 31, 2011

It's not Google unless it says it's Google

A few years back, one of our customers told us that, if he could just license the 'Powered by Google' icon, he was sure most of the users would stop complaining. Not long after that, we heard that our friend Andy Feit, who was VP of search at Verity, hired a marketing research team to compare the quality of search engines when one was 'Powered by Verity' and the other was 'Powered by Google'. Andy found that people thought the Google results were significantly better - even though both test cases were, in fact, powered by Verity. The mere presence of the Google icon seemed to make people think the results were better.

At the recent ESS, a woman from Booz & Company talked about their previous enterprise search experience involving Google. A few years back, Booz used FAST ESP on SharePoint 2003 and it simply sucked. Users asked for Google by name. When they upgraded to SharePoint 2007, Booz gave the users what they wanted: they went with a Google Search Appliance. The trouble was that they built a custom interface with a generic search button. Users' responses? "Search still sucks - why don't we just use Google?" even as they were using Google!

This can teach us a number of lessons:

1. Analyze what you need search to do for you before you buy it.

2. Understand how your content and search platform play together.

3. It ain't Google unless you tell your users it's Google.

By the way: in 2010 Booz rolled out FAST Search for SharePoint, and it seems that the results are a bit better now that they understand their search requirements and the nature of their content and metadata.


May 21, 2011

Google and the official search blog

A couple of days ago, Google started Inside Search, the 'official Google search blog'. It's not really enterprise search news, but because so many knowledge workers compare the behavior of their internal search platform with the Google public search experience, it may be worth monitoring for those whose job it is to keep enterprise search going.


May 19, 2011

Content owners don't care about metadata

Or do they?

Our recent post about Booz & Company's 'men named Sarah' highlights just how important good metadata can be in order to provide a great search experience for employees and customers.

One of our customers who spoke at the recent ESS 2011 in New York provided some great insights into the problems organizations have getting employee content creators to include good metadata with their documents.

During the ESS talk, they reported that content owners don't really seem motivated when asked to help improve the overall intranet site by improving document metadata. However - and this is a big one - when a sub-site owner sees poor results on their own site, they are willing to invest the time to provide really good metadata.

[A bit of background: This customer provides a way for individual site owners within the organization to add search to their 'sub-site' pretty much automatically - sort of a 'search as a service' within the enterprise.]

So if you've been thinking of adding the ability to search-enable sub-sites within your organization, but have been putting it off because solving the relevance problem seemed like the first task, you might reconsider your priorities!

/s/Miles

May 16, 2011

Sixty guys named Sarah

We're always on the lookout for anecdotes to use at trade shows, with our customers and prospects, and of course here in the blog, so I have to report that we heard a great one last week at Enterprise Search Summit in New York.

The folks from Booz & Company, a spinoff from Booz Allen Hamilton, did a presentation on their experience comparing two well respected mainstream search products. They report that, at one point, one of the presenters was looking for a woman she knew named Sarah - but she was having trouble remembering Sarah's last name. The presenter told of searching one of the engines under evaluation and finding that most of the top 60 people returned from the search were... men. None were named 'Sue'; and apparently none were named Sarah either. The other engine returned records for a number of women named Sarah; and, as it turns out, for a few men as well.

After some frustration, they finally got to the root of the problem. It turns out that all of the Booz & Company employees have their resumes indexed as part of their profiles. Would you like to guess the name of the person who authored the original resume template? Yep - Sarah.

One of the search platforms ranks document metadata very high, without much ability to tune the weighting algorithms. The other provides a way to tune the relevance, but it also tends to rank people a bit differently - probably stressing documents about people less than the individual people profiles. The presentation was a bit vague about whether any actual tuning had been done that might impact these differences on either platform.

The fact that one of the engines did well, and one did not, is not the big story here - although it is something for you to consider if you're evaluating enterprise search platforms. The real lesson here is that poor metadata makes even the best of search platforms perform poorly in some - if not most - cases.


February 02, 2011

Make your search engine seem psychic

People tell us that Google just seems to know what they want - it's almost psychic sometimes. If only every search engine could be like Google. Well, maybe it can.

Over the years, the functions performed by the actual 'search engine' have grown. At first, it was simply a search for an exact match - probably using punch card input. Then, over time, new and expanded capabilities were added, including stemming... synonyms... expanded query languages... weighting based on fields and metadata... and more. But no matter what the search technology provided, really demanding search consumers pushed the technology, often by wrapping extra processing around it both at index time and at query time. This let the most innovative search-driven organizations stay ahead of the competition. Two great examples today: LexisNexis and Factiva.

In fact, the magic that makes public Google search so good - and so much better than even the Google Search Appliance - is the army of specialists analyzing query activity and adding specialized actions 'above' the search engine.

One example of this many of us know well: enter a 12-digit number. If the format of the number matches the algorithm FedEx uses to create tracking numbers, Google will offer to let you track that package directly from FedEx. For example, search for 796579057470 and you see a delivery record; change the last digit, and you get no hits. How do they know?

The folks at Google must have noticed lots of 12-digit numbers as queries; and being smart, they realized that many were FedEx tracking numbers. I imagine that, working in conjunction with FedEx, Google implemented the algorithm - what makes a valid FedEx tracking number - and boosted matching queries as a 'best bet'.
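A minimal sketch of that kind of 'best bet' hook might look like this. Note that the real FedEx validation algorithm isn't public in this post, so the simple 12-digit pattern check below is a stand-in assumption:

```python
import re

# Sketch of a query-time 'best bet' hook: if the query looks like a
# tracking number, offer a tracking suggestion above the regular results.
# The 12-digit regex is a stand-in; the real FedEx check-digit
# algorithm is an assumption here, not shown.

TRACKING_RE = re.compile(r"^\d{12}$")

def looks_like_tracking_number(query):
    return bool(TRACKING_RE.match(query.strip()))

def best_bets(query):
    """Return zero or more suggestions to show above the result list."""
    suggestions = []
    if looks_like_tracking_number(query):
        suggestions.append("Track package %s at FedEx?" % query.strip())
    return suggestions

print(best_bets("796579057470"))
```

The same shape works for part numbers, ticket IDs, employee names - anything your query logs show people actually typing.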

Why is this important to you? Well, first it shows that Google.com is great in part because of the army of humans who review search activity, likely on a daily basis. Oh, sure, they have automated tools to help them out - with maybe 100 million queries every day, you'd need to automate too. They look for interesting trends and search behavior that lets them provide better answers.

Secondly, you can do the same sort of thing at your organization. Autonomy, Exalead, Microsoft, Lucene, and even the Google Search Appliance can all be improved with some custom code that runs after the user query but before the results show up. Did the user type what looks like a name? Check the employee directory and suggest a phone number or an email address. Is the query a product name? Suggest the product page. You can make your search psychic.

Finally, does the query return no hits? You can tell what form the user was on when the search was submitted, so show something useful rather than a generic 'No Hits' page. Was the query more than a single term? Look for any of the words rather than all; make a guess at what the user wanted based on the search form, previous searches, or whatever context you can find.
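That zero-hits fallback can be sketched in a few lines. The tiny in-memory 'index' here is a toy stand-in for whatever engine you actually run:

```python
# Sketch of a zero-hits fallback: first require ALL query terms,
# and only if nothing matches, retry requiring ANY term.
# The in-memory 'docs' table is a toy stand-in for a real engine.

docs = {
    1: "quarterly sales report",
    2: "annual sales summary",
    3: "engineering roadmap",
}

def search(terms, require_all=True):
    combine = all if require_all else any
    return [doc_id for doc_id, text in docs.items()
            if combine(t in text.split() for t in terms)]

def search_with_fallback(query):
    terms = query.lower().split()
    hits = search(terms, require_all=True)
    if not hits:
        # No document has every term: loosen to 'any of the words'.
        hits = search(terms, require_all=False)
    return hits

print(search_with_fallback("sales roadmap"))
```

No single document contains both "sales" and "roadmap", so the fallback kicks in and returns partial matches instead of an empty page.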

So how do you make your search engine seem psychic? Learn about query tuning and result list pre-processing; we've written a number of articles about query tuning in our newsletter alone.

But most importantly: mimic Google: work hard at it every day.

/s/Miles


December 05, 2010

Share your successes at ESS East next May

Our friends over at InfoToday, who run the successful Enterprise Search Summit conferences, have asked us to announce that the date for submitting papers to their Spring show in New York in May 2011 has been extended until Wednesday, December 8. You can find out what they are looking for and how to submit your proposal online at http://www.enterprisesearchsummit.com/Spring2011/CallForSpeakers.aspx.

Michelle Manafy, who runs the program again next May, really likes speakers who have found creative and successful ways to select, deploy, or manage ongoing enterprise search operations. We've co-presented with several of our customers in the past, and trust me, it's great fun and not bad for your career! And - no promises - the weather at ESS East has been great just about every year, and we've been going for nearly 6 years now!

A friend told me something years ago that I've always found helpful; I hope you'll take it to heart: 'Everything you know, someone else needs to know'. Don't worry that your search project isn't perfect, or that someone will find fault with what you've done. Trust me: there are many organizations newer to enterprise search than you are, and anything you found helpful will surely be valuable for them as well. And you get to attend all of the sessions, so you might learn something too! A 'win-win' situation if I've ever seen one!

See you in New York!

/s/Miles


November 08, 2010

Enterprise Search Summit DC November 15-18

The new home for the Fall ESS show is the Renaissance Hotel in downtown Washington, DC... so much for ESS-West! The new locale should bring a large number of new attendees and visitors, and a new co-located conference: SharePoint Symposium. InfoToday knows a trend when they see one!

In addition to the usual sessions provided to show sponsors, there are some interesting sessions by Tom Reamy of KAPS Group; Martin White of Intranet Focus; and eDiscovery expert Oz Benamram, CKO of White & Case LLP. Tony Byrne of Real Story Group will also be there, moderating the session I'll be participating in: Stump the Search Consultant, on Wednesday afternoon, November 17th.

I really expect the show to have a large number of government folks in attendance, given how hard it's been for these good folks to travel to previous ESS conferences in New York and San Jose. InfoToday reports higher pre-registration this year than in the past; and I'll be happy to find out I'm wrong about most of the attendees being government or government-related folks.

Come by the session Wednesday afternoon at 3PM, or leave a comment here if you want to get together.


May 24, 2010

Does MaxxCAT's New Search Appliance Challenge the Google Box?

Both Jessica Bratcher and Tim Grey have interesting posts about MaxxCAT releasing several enterprise search appliances that are supposedly much faster, cheaper, and more extensible than the corresponding Google search appliance, with unlimited lifetime use.

They were built from the ground up and run on a special Linux platform. "On a 1 million document collection, the kernel can dispatch and resolve a multi-term query spanning the entire collection in as little as 100 usec." (Of course, anything under 500 msec would be fine for an end user.)

MaxxCAT has also released a new version of its JDBC connector, BobCAT, that supports standard SQL and allows any JDBC-compliant database to interface directly to a MaxxCAT appliance. The company claims "EX-5000 Enterprise Search appliances equipped with BobCAT are able to retrieve and index information from host systems at speeds in excess of 1GB/minute."

Their chief integration engineer stated: "We are working with a number of customers who have data in SQLServer, mySQL or Oracle Databases that we are able to easily consolidate and query against, even though the source databases and data models vary dramatically. This is simply not possible with conventional database software, which relies upon proprietary interfaces and does not handle unstructured data very well, if at all."