
8 posts from August 2011

August 22, 2011

Searching for Sarah at SharePoint Conference 2011

Just noticed one of the most interesting sessions at last May's Enterprise Search Summit is coming to the October Microsoft SharePoint Conference! We blogged about it back in May.

Basically, Booz & Company did an evaluation of SharePoint 2010 search - FAST Search for SharePoint as I recall - versus the Google Search Appliance they had been using. At one point, the search business owner was trying to find the last name of a woman she had met in the firm; and when she searched for 'Sarah', hoping to find her in the directory, the GSA returned 60 men in the result list. Can you guess why? A hint: metadata (check the earlier article, or come to SPC 2011 to find out).

Now in fact, we think the GSA could have been tuned to emulate SharePoint's out-of-the-box behavior; but this is a reminder that not every search platform works well in every environment. Buyer beware!

Ever had a similar experience? Let us know about it!


Autonomy marketing, meet HP

Leslie Owens, the enterprise search analyst over at Forrester Research, has written one heck of an article about a potential Autonomy in the HP era. Her analysis strikes me as very insightful and, in my opinion, quite accurate. What makes it unique is that you just don't see a lot of 'analysts' tell it like it really is. Kudos to Ms. Owens!

Technical issues aside, I'm reminded of a story that goes back to the early days of the PC when HP and Apple were just beginning to compete. A popular quip about the difference between HP and Apple went "Where Apple sells sushi, HP sells cold raw dead fish". The implication, of course, being that HP just wasn't good at marketing.

I'll always think of Autonomy as a search technology company. Our first exposure to Autonomy was in 1997 with an early version of the DRE, the predecessor to IDOL. Back then, using vi or emacs to configure a search engine was pretty common; and no one really had grasped the importance of the business side of running enterprise search.

In search, IDOL returns pretty darned good results out of the box, no tuning required. But if you want to tune it, or if you have a lot of custom work to do, IDOL gets really tough to set up and configure... and it's still done using text editors to create and edit text configuration files. This may be one reason why IDOL projects take so long to complete and require such big teams of consultants. HP probably won't be changing this... they want to grow their consulting revenue!

But now, in the second decade of the not-so-new century, customers expect to use a GUI to configure, manage, and customize enterprise search; and just about all of IDOL is still 'command line based'. I think this is just one of the data points supporting Leslie's remark that IDOL 5 has not had a major update in over 5 years. Sure, they've added dozens of new capabilities... API calls, and the like... but the platform is still a solid 1990s kind of experience. "Powered by vi" was funny in 1998; not so much now.

Nonetheless, Autonomy has been quite effective because their technology is pretty darned good at finding content, and because their sales force has been aggressive in selling the product. HP will love the consulting; but will they be able to move product as successfully as Autonomy has?

What do you think?



August 19, 2011

7 Reasons Why the Autonomy Acquisition Makes Sense for HP

With HP's annual meeting coming up this week, CEO Leo Apotheker is looking for a way to put some lipstick on his new baby: he wants to acquire Autonomy. PC sales are declining and Apple is eating their lunch in pads AND buying the old HP Cupertino campus. Apparently they will keep the server divisions. Bill and Dave must be beside themselves.

But wait! The knight in shining armor from England is here for the rescue. Autonomy announced just last month they were 'likely to beat expectations' for their fourth quarter. They have enjoyed huge success in eDiscovery, they claim to 'own search', and they sponsor their own football team, Tottenham Hotspur FC.

At first blush, the two seem like strange bedfellows. HP is the leader in PCs worldwide, with a well-acknowledged reputation for its management style, the HP Way. Autonomy has a rep like that of a great professional footballer: skilled and aggressive, known to occasionally feign an injury for his benefit after a rough hit; and perhaps booked for a card a few times every season. But still, a champ and damn good at what he does.

When you think about it, the deal isn't as odd as it may seem. Consider:

1. PC sales are down; they are being replaced by smart devices like Android and iOS devices. WebOS? Too little, too late, too expensive.

2. HP acquired EDS a couple of years back to compete with IBM in consulting services. And when you think of Autonomy among enterprise-class search engines, 'consulting services' certainly comes to mind: bring lots.

3. IT managers, not C-level execs, buy PCs. Chief Risk Officers buy eDiscovery and compliance solutions so they and their fellow executives can sleep well at night. Rarely is there a spending freeze on compliance tools.

4. HP is rumored to be spinning out its PC products, but it seems the server business is staying in Palo Alto. IDOL likes lots of really big servers: HP wins on all three: servers, software, and consulting.

5. Iron Mountain: records management. See (3) above.

6. We used to speculate that SAP might be a buyer for Autonomy at some point in time. Now, the HP chief is Leo Apotheker, who came to HP from... SAP. Coincidence?

7. 'The cloud': HP needs one, HP gets one - enough said.

Still, it may not be a 'made in heaven' match.

1. You may recall that when Microsoft acquired FAST, they soon found some odd accounting issues: something about overly optimistic numbers, and booking revenue before firm orders were received. Without digging deep into the details, there may be no way to know for sure until the deal is sealed.

2. And as mentioned earlier, personnel and policies seem relatively incompatible.

3. HP is a longtime Microsoft partner, uses SharePoint extensively, and just rolled out FAST search on its public-facing web site. That should be interesting.

4. Finally, I seem to recall that when Autonomy bought their larger competitor Verity, part of the rationale for Autonomy being the surviving company was the cost of annual SarBox compliance if they were a US company. HP must be willing to pick up the tab, because I sure can't see Palo Alto moving to Tottenham.

Stay tuned, and let us hear what you think!


Full disclosure: I was an HP employee for 10 years, and during my time there was fortunate enough to be the PC support guy for Bill Hewlett, Dave Packard, and Dave Packard Jr. HP is not the same company as it was when they were running the show. I recommend Mike Malone's "Bill and Dave" as a great explanation of what made HP great for 40 years, and successful for 72 years so far.



August 18, 2011

HP looking at acquiring Autonomy

The consolidation continues. Since the FAST acquisition by Microsoft a few years back, people have asked who we thought might acquire Autonomy; and we always decided they were almost too big to be a target, that Autonomy would be the surviving company.

It seems that HP, which apparently just rolled out its FAST implementation to its public web site, is the surprising answer.

Quick take: it makes sense in a few ways:

1. Hardware is getting cheaper and cheaper; being the largest vendor in a shrinking market isn't where you really want to be.

2. HP has long been moving toward becoming a services delivery company - look at the 2008 acquisition of EDS. If there is a search technology that typically has huge implementation projects with teams of expensive consultants, it's Autonomy.

3. Companies have Chief Risk Officers, who handle compliance and have the budget to buy software to keep fellow executives out of trouble with the government. What better prospects to have?

Wow.. I'm still a bit shocked; I feel like I must have overslept and that I'm having a strange dream. But my dog seems pretty sure it's real.

Stay tuned, and let us know what you think!


August 09, 2011

So how many machines does *your* vendor suggest for a 100,000,000+ document dataset?

We've been chatting with folks lately about really large data sets.  Clients who have a problem, and vendors who claim they can help.

But a basic question keeps coming up - not licensing - but "how many machines will we need?" Not everybody can put their data on a public cloud, and private clouds can't always spit out a dozen virtual machines to play with, plus duplicates of that for dev and staging, so it's not quite as trivial as some folks think.

The Tier-1 vendors can handle hundreds of millions of docs, sure, but usually on quite a few machines, plus of course their premium licensing, and some nontrivial setup at that point.

And as much as we love Lucene, Solr, Nutch and Hadoop, our tests show you need a fair number of machines if you're going to turn around a half billion docs in less than a week.
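To put rough numbers on that, here is a back-of-the-envelope sketch of the sustained indexing rate a half billion docs in a week implies. The per-machine throughput of 50 docs/sec is purely a placeholder assumption for illustration, not a figure from our tests; plug in whatever you measure on your own content.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def required_docs_per_second(total_docs: int, days: float) -> float:
    """Sustained indexing rate needed to finish total_docs within the given days."""
    return total_docs / (days * SECONDS_PER_DAY)


def machines_needed(total_docs: int, days: float, docs_per_second_per_machine: float) -> int:
    """Machines required, assuming throughput scales linearly (an optimistic assumption)."""
    rate = required_docs_per_second(total_docs, days)
    return math.ceil(rate / docs_per_second_per_machine)


# Half a billion docs in one week.
rate = required_docs_per_second(500_000_000, 7)
print(f"{rate:.0f} docs/sec sustained")  # about 827 docs/sec

# Hypothetical per-machine throughput of 50 docs/sec (placeholder, not a benchmark).
print(machines_needed(500_000_000, 7, 50))  # 17
```

Real clusters rarely scale linearly, and the estimate ignores re-crawls, failures, and query load, so treat the result as a floor rather than a plan.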

And beyond indexing time, once you start doing 3 or 4 facet filters, you also hit another performance knee.

We've got 4 Tier-2 vendors on our "short list" that might be able to reduce machine counts by a factor of 10 or more over the Tier-1 and open source guys.  But we'd love to hear your experiences.

August 06, 2011

Search Patterns, the Movie

OK, not the movie.. but maybe the slide show.

Peter Morville is a well-known author, with several excellent books, including Search Patterns and Ambient Findability. It turns out that Peter has a Flickr collection illustrating "search examples patterns, and antipatterns" that is really worth a look. I may be the last person to know of this interesting collection of illustrations and screen shots, but it's well worth the look for those of us who have to explain to others just what this enterprise search thing is about.

Check it out when you have a few minutes to browse...

And let us hear from you!

August 02, 2011

Connecting Google to SharePoint 2010: White Paper

NIE partner BA Insight will soon be releasing a white paper highlighting key differences between SharePoint 2010 search and the Google Search Appliance.

We've seen an early draft of Google & Microsoft Enterprise Search Product Comparison, and can talk about some of the discussion points.

Update: The white paper is now available! Get it now.

Generally, the research paper discusses the following topics:

1. Content: Crawling, indexing, security, and connectors

2. Query processing: Manual and automatic relevance tuning; actionable results; and layout design

3. Vendor 'intangibles': Maintenance, support, vendor stability, licensing and the partner eco-system

BA Insight is a large Microsoft partner, and their DNA reflects it. But many customers use Google Search Appliances with SharePoint, and this comparison is nicely done. Keep an eye on their site or stay tuned for updates here when the research paper is available.

And let us know what's on your mind with respect to enterprise search - leave a comment!



August 01, 2011

Google Refine, Google's open source ETL tool for data cleansing, with videos!

For any of you working with Entity Extraction this might be of interest. Google has open sourced some software from their Freebase acquisition, formerly called Gridworks. It lets you interactively clean up and transform data. More importantly, it saves these steps as a reusable sequence in JSON format, so they can be reapplied to other data.
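To show why a replayable list of steps is handy, here is a minimal sketch of the general idea. The JSON layout, the operation names (`trim`, `titlecase`, `replace`), and the `apply_steps` helper are all made up for illustration; this is not Refine's actual export format or API.

```python
import json

# Hypothetical step format, invented for this sketch (not Google Refine's real JSON).
STEPS_JSON = """
[
  {"op": "trim", "column": "name"},
  {"op": "titlecase", "column": "name"},
  {"op": "replace", "column": "city", "find": "SF", "with": "San Francisco"}
]
"""


def apply_steps(rows, steps):
    """Replay a saved list of cleanup steps against a new batch of rows."""
    transforms = {
        "trim": lambda value, step: value.strip(),
        "titlecase": lambda value, step: value.title(),
        "replace": lambda value, step: value.replace(step["find"], step["with"]),
    }
    cleaned = []
    for row in rows:
        new_row = dict(row)  # leave the caller's data untouched
        for step in steps:
            column = step["column"]
            new_row[column] = transforms[step["op"]](new_row[column], step)
        cleaned.append(new_row)
    return cleaned


steps = json.loads(STEPS_JSON)
batch = [{"name": "  sarah jones ", "city": "SF"}]
print(apply_steps(batch, steps))
# [{'name': 'Sarah Jones', 'city': 'San Francisco'}]
```

The point is that the cleanup recipe lives in data rather than in code, so the same steps can be run against next month's export without anyone clicking through the UI again.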

Here's the main page and wiki (and 3 intro videos):

It IS Open Source, here's the source code and license:

That type of UI makes me want to dust off our XPump code and retrofit into it...