
8 posts from June 2009

June 17, 2009

Open Source for Enterprise Search: Webinar

I just learned from David Fishman of Lucid Imagination that KMWorld Magazine and Lucid are presenting a webinar Tuesday, June 23rd. In addition to KMWorld publisher Andy Moore, the webinar will feature Sue Feldman of IDC and Ranga Muvavarirwa and Tom Morton of Comcast Interactive Media.

The KMWorld mailer says:

Business and government are using enterprise search to lower barriers to information for search, retrieval, analysis and exploration. Now Open Source is lowering the barriers to search, accelerating time-to-value from rapidly growing, diversifying organizational document and data resources.

Register for the free webinar at http://www.kmworld.com/webinars/lucid/23jun2009/kmb/. You may need to register with KMWorld or log in if you've already attended any of their webinars.

We've blogged about Lucid and open source before, and about the need for more enterprise packaging before open source really becomes an enterprise solution. We'll be excited to hear what the folks from Comcast have to say.


Keyword Search: Much Ado About Nothing

Recently, Mark Logic's CEO Dave Kellogg wrote Things Not To Do: Declare Your Category Dead, sage advice with a personal story harking back to his days at Ingres. Dave's story is in response to a Larry Hawes post in the Gilbane Group Blog about the recent conference they held in San Francisco. According to Larry, Microsoft's Jeff Fried claimed 'keyword search is dead'.

My first thought was 'of course it is'. There have been reports of the death of keyword search - even of enterprise search itself - for years. Those of us who work with search have long realized that the idea of finding what you want from millions of documents by entering a word or two is totally unrealistic. Modern search technologies provide the tools to engage users in a conversation, to lead them to the best result iteratively using facets, natural language search, entity extraction, and sentiment analysis.

Next, since I actually attended the presentations Jeff gave, I thought back to see if I could remember him making the claim, but I sure didn't remember it. I checked with Carl Grimm, who was there with me, and he didn't recall hearing it either. So I decided to check with Jeff, who even went back and checked his presentation to see if he made the claim anywhere. As we talked, we both agreed about the value of other technologies beyond simple keyword search; but he's pretty sure he never made the claim that keyword search is dead.

We can say with some confidence, the reports of the reports of the death of keyword search are greatly exaggerated.

June 12, 2009

New SearchDev groups for Autonomy and FAST

Years ago, Verity had a very popular and successful user group meeting every year, but Autonomy has not seemed to be interested in continuing the event.  SearchDev grew out of that last Verity user group meeting, and has a very active membership looking for, and providing advice to, those developing search applications within their respective companies.

As the group has grown, we're seeing an interest in more specialization by search technology. For that reason, we're happy to announce creation of two new SearchDev groups: autonomy.searchdev.org and fast.searchdev.org. These, along with the original www.searchdev.org, will continue to provide technical assistance for search developers, and should allow us to drill down a bit in terms of specialization. Expect other groups in the near future for commercial and open source engines.

Since these communities will have a more narrow focus, the feeling is that the group can grow to include webinars and perhaps even regional user group meetings in the future.

People who are interested in participating can join the Yahoo-group-based forum and start working to make these new user groups a success.


June 09, 2009

Enterprise search doesn't mean mortgaging the farm

Lynda Moulton, the Search Practice analyst at CMS firm Gilbane Group, really hit the mark in a recent blog post nominally about how advertising money can buy editorial space. It's true that many publications (and analyst firms) are happy getting paid by both sides. (Note: I was called out on this by Theresa Regli of CMS Watch, so I no longer say 'all analysts' for anything!)

In my opinion, the real news in Lynda's post is this: "there are dozens of enterprise search solutions that will serve you extremely well, with much lower cost of ownership" than with the big industry players. In fact, open source is beginning to penetrate the corporate veil, and while Lucene and Solr are not right for everyone, it looks like they've just about implemented what Mark and I consider Verity's  "Topic 1.0" capabilities circa 1990. We went to a meet-up the other night that Mark has written about; and we were pretty impressed.

So before you decide you need to budget a half million dollars or more for search, consider what Walter Underwood, chief architect of Ultraseek and now search evangelist at Netflix once told me. Paraphrasing: "You can download Solr then spend a ton of money customizing it; or you can spend a ton of money licensing enterprise search software, then spend a ton of money installing and customizing it. Your call."

But get help!


June 08, 2009

Enterprise Search Engine Optimization: eSEO

Last week at the Gilbane Conference in San Francisco, I participated in a panel "Search Survival Guide: Delivering Great Results" moderated by Hadley Reynolds of IDC. In the presentation, I offered a new view on improving enterprise search engine relevancy that I call eSEO.

The term SEO is well understood by - and widely practiced in - the corporate world.  The concept of SEO, as summarized by one of the Gilbane talks, states that "Key to the value of any Web content is the ability for people to find it". In the SEO world this is done by combining organic results and keyword placement - advertising - to improve placement, maintain ranking, and monitor search engine position - results - over time.

While we've been helping our customers improve their enterprise search results, it's hard to convince them that search results are not a problem they can solve once. I've decided to apply a new term to this process - Enterprise Search Engine Optimization, or eSEO. To paraphrase the role of SEO, eSEO is the process of combining organic results and best bets to deliver correct, relevant, timely content to enterprise search users - employees, customers, partners, investors, and others.

For both organic and best bets, the first step is to identify what we call the "top 100" queries. Start by creating a histogram that shows the top terms from your search engine. I hope you'll agree that if the top queries - whether 100, 50, or even 20 - deliver great results, you're on your way to having happy users. Talk to your content owners as you review the histogram, and ask them to identify the best result for each.
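As a rough illustration of the "top 100" histogram, here is a minimal Python sketch. It assumes you can export your search log as a plain list of query strings; the function names and the sample queries are hypothetical, and a real search engine's log format will need its own parsing step.

```python
from collections import Counter


def top_queries(queries, n=100):
    """Return the n most frequent queries as (query, count) pairs."""
    # Normalize case and whitespace so "Benefits " and "benefits" count together.
    normalized = (" ".join(q.lower().split()) for q in queries)
    # Skip empty queries, which say nothing about what users want.
    counts = Counter(q for q in normalized if q)
    return counts.most_common(n)


def print_histogram(pairs, width=40):
    """Print a text histogram scaled to the most frequent query."""
    if not pairs:
        return
    peak = pairs[0][1]
    for query, count in pairs:
        bar = "#" * max(1, round(width * count / peak))
        print(f"{query:<20} {count:>5} {bar}")


# Hypothetical sample of logged queries, for illustration only.
sample_log = ["benefits", "Benefits ", "expense report", "holiday schedule",
              "benefits", "expense report", "", "401k"]
print_histogram(top_queries(sample_log, n=3))
```

The printed list is what you would walk through with your content owners, asking them to name the best result for each query.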

Once you have a list of queries and results, start the two-step process. First, tune the search engine using its native query tuning capabilities. This will impact the shape of the histogram, and over time should start delivering better results. The bad news is tuning like this doesn't position all of your top terms, and it would be silly to try to micro-manage the results for each. That's why search engines have best bets.

When you feel pretty good about the curve through query tuning, it's time to start setting up best bets - the "ad words" of eSEO. Limit the number of best bets to one or two at most - but remember that you can use other real-estate like the rightmost column of the screen to suggest additional content. Some guidelines for best bets:

  • Use one or at most two best bets
  • Don't repeat a document already at the top of the organic results
  • Make sure your best bets respect security
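The three guidelines above can be sketched as a small filter. This is only an illustration under stated assumptions: the `BestBet` class, the group-based security model, and the sample URLs are all hypothetical, and most commercial engines have their own best-bets and security mechanisms that you would use instead.

```python
from dataclasses import dataclass, field


@dataclass
class BestBet:
    url: str
    title: str
    # Groups allowed to see this best bet; an empty set means everyone (assumed model).
    allowed_groups: set = field(default_factory=set)


MAX_BEST_BETS = 2  # "one or at most two"


def select_best_bets(candidates, organic_urls, user_groups, top_n_organic=1):
    """Filter the candidate best bets for one query and one user."""
    already_on_top = set(organic_urls[:top_n_organic])
    groups = set(user_groups)
    chosen = []
    for bet in candidates:
        if bet.url in already_on_top:
            continue  # don't repeat a document already at the top of the organic results
        if bet.allowed_groups and not (bet.allowed_groups & groups):
            continue  # respect security: this user may not see the document
        chosen.append(bet)
        if len(chosen) == MAX_BEST_BETS:
            break
    return chosen


# Hypothetical candidates for the query "benefits".
candidates = [
    BestBet("/hr/benefits", "Benefits overview"),
    BestBet("/hr/exec-comp", "Executive compensation", {"executives"}),
    BestBet("/hr/enrollment", "Open enrollment form"),
    BestBet("/hr/faq", "Benefits FAQ"),
]
organic = ["/hr/benefits", "/hr/old-benefits"]
bets = select_best_bets(candidates, organic, user_groups={"employees"})
print([b.url for b in bets])  # ['/hr/enrollment', '/hr/faq']
```

Checking security at display time, per user, is a design choice made here for simplicity; it keeps a restricted title from leaking into a best bet even when the organic results are properly trimmed.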

Once you have tuned your search engine, and set up best bets for the most timely and actionable result, you're ready to roll it out. But then the ongoing part comes in: you need to review your search activity and best bets periodically. Usually, we'd suggest once a month for a while, then perhaps quarterly thereafter. You may find seasonal variations, and if you're not watching you'll miss a golden opportunity.
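One way to make that periodic review concrete is to compare the top queries from two review periods. The sketch below is a hypothetical helper, not a feature of any particular search engine; the period names, the `top_n` cutoff, and the "doubled in volume" threshold are all assumptions you would adjust.

```python
from collections import Counter


def query_shifts(previous, current, top_n=100, min_ratio=2.0):
    """Compare two periods of logged queries and flag notable changes."""
    prev_counts = Counter(previous)
    curr_counts = Counter(current)
    prev_top = {q for q, _ in prev_counts.most_common(top_n)}
    curr_top = {q for q, _ in curr_counts.most_common(top_n)}
    return {
        # Queries that entered the top list: candidates for new best bets.
        "new": sorted(curr_top - prev_top),
        # Queries that fell out: their best bets may be stale.
        "dropped": sorted(prev_top - curr_top),
        # Queries in both lists whose volume grew by at least min_ratio.
        "surging": sorted(
            q for q in curr_top & prev_top
            if curr_counts[q] >= min_ratio * prev_counts[q]
        ),
    }


# Hypothetical query logs for two consecutive review periods.
last_period = ["benefits"] * 5 + ["expense report"] * 4 + ["holiday schedule"]
this_period = (["benefits"] * 5 + ["expense report"] * 2
               + ["open enrollment"] * 6 + ["holiday schedule"] * 3)
print(query_shifts(last_period, this_period, top_n=3))
```

A query that shows up under "new" or "surging" at the same time every year is the kind of seasonal variation worth a timely best bet.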

In Summary

1. eSEO is just as critical as SEO

  • Lost time and revenue
  • Legal exposure

2. Watch for trends over time: Search is not "fire and forget"

3. Make sure SEO doesn't impact your eSEO

  • Use fielded data that web search engines ignore for your tuning (e.g., 'Abstract' rather than 'Description').

This will get you started; but because your queries and your content change over time, it's a never-ending story. Some companies - ours included - have tools that can help. But no matter what, hang in there!


June 06, 2009

Impressions of first Lucene/Solr SF Meetup

Kudos to Carl, our NIE Marketeer and de facto social director, for getting us to attend. It was well worth it, and conveniently coincided with Gilbane.

The Good:

  • VERY entertaining, very informative.  Lots of good info about upcoming versions of Lucene and Solr, including additional performance tweaks.
  • A friendly, supportive bunch of like-minded nerds, and I mean this in the best possible way.
  • Also discussions of other related Apache projects.  We're all gonna need a cheat sheet pretty soon to keep track of it all.
  • Lucene/Solr will soon have implemented much of the core features of Autonomy IDOL, Endeca, FAST, etc.  They really ought to be spying.  :-)

Personally I think Otis & co. might wanna fly out for the next one.  I also think Dieselpoint ought to attend and talk about Open Pipeline.  If we get up enough energy maybe we could even volunteer to do that next time, we're on the board after all, but this is really Chris's baby.

The Not-so-Good:

  • About 50 terms that clients would not understand.  Don't get me wrong, we love the Map/Reduce, Bayesian, K-Means, SVD stuff, but most corporate clients would be lost.
  • Not much for Enterprise Packaging.  Ironically it's the mundane aspects of search, from a non-developer standpoint, that are still not on the horizon.  Not a criticism of the developers, they have what they need.
  • Not much about Nutch.  Nutch 1.0 is out, along with rumors of a revised admin GUI, but not much coverage here.

Impressions of Lucid Imagination:

This event was sponsored by Lucid, a company that recently got funding for bringing commercial packaging and services to the open source search world, and their senior staff includes quite a few of the core committers.

  • A very sincere bunch of guys.
  • They haven't sold their souls to corporate America; I think their "geek cred" is still well intact.
  • Probably will not be filling in enterprise packaging pot holes any time soon.
  • Do they understand the Enterprise Market?

Also a shout out to LinkedIn and IBM for giving back to the open source community.

There was also an "open mic" segment, and I'd like to give a shout to Avi Rappaport - I agree 1,000%, "stop words bad!" (or at least the blind use of index time stop words)


  • Not much of a threat to Google Appliance, due to packaging.  Yes, Google scales with their Map/Reduce and relevancy algorithms, and the open source guys have responded, but that's not the stuff that makes Google tick these days.
  • And despite the impressive and rapidly evolving core technologies, also not a real threat to the other Tier One vendors like FAST and Autonomy.  More on this seeming contradiction in a bit.
  • The Tier 2 vendors of the world, Attivio, Exalead, Dieselpoint, etc. DO need to pay attention.  There is a place for Tier 2 vendors, but they need to mind what the open source products do and do not provide more carefully.
  • It's really cool to see IBM willing to contribute so aggressively to the open source search engines, even though they sell several of their own.  A naive person might think they are competing with themselves, sabotaging their own sales guys, but they're a lot smarter than that.  They aren't selling their commercial search products as pure search; those technologies are always part of a larger (and more expensive) grand business solution.  They know what they're doing!

For similar reasons, still not a huge threat to Autonomy, MS/FAST, Endeca, etc. on corporate services.  I said earlier that the Apache projects are implementing a lot of the "secret sauce" that launched Autonomy and Endeca, etc, so you'd think this represents "a clear and present danger", but Mike Lynch's secret algorithms are not why people buy IDOL anymore.  Things like giant reference accounts, professional services, and commercial grade spiders have a lot more to do with why big companies still pay six figures for search technology.

And speaking of surprises and Lucid Imagination, I wanna circle back to their PR a few months back when they got their funding and launched their company.  They talked about relevancy in their press releases!?  Wow... Yes, Lucene and Solr have some good traction there, but that specific competitive advantage has been used by almost every commercial search vendor in the past 15 years, including Verity, Autonomy and Google!

I would've expected them to say something like "we're gonna do for Lucene what RedHat did for Linux" - this would have been a very clear business-oriented proposition, though to be fair lots of companies have used that business model as well.  It wouldn't be original, but would be more business centric.  Then again, I'm not in Marketing, and their VC's obviously liked their pitch, so what do I know!


June 03, 2009

Bing's Here

So Microsoft's big news at last week's All Things Digital conference in San Diego was the public unveiling of Bing. Some people are unimpressed - notably Endeca's Daniel Tunkelang and well known search expert Steve Arnold, who thinks search is beyond sick.

Nonetheless, in a few of the verticals Microsoft says they want to start with - among them health, travel, and retail - they have actually started adding some useful 'enterprise-like' capabilities.

If you search Bing for 'San Diego', you get a typical result list in the center of the screen. Note that even this 'conventional' result list shows results within 'categories' - the only one you can see in the attached image is San Diego Attractions, but if you try this yourself, you'll see results that look remarkably like federated results - for San Diego Weather, Hotels, and maps. That's perhaps more meaningful than a list of 267 million results, one page at a time.

You'll also note on the left side you have what are essentially faceted results, along with search history and related searches. On the right, notice the pop-up that summarizes the document to its left based on a mouse-over. And the links in that pop-up - Contacts, Appeal a Parking Citation, etc - actually apply to doing the tasks described on the page. Bing has pulled these, and has left others unlinked, apparently based on an algorithm - but that's still pretty cool.

Contrast that result list to the Google result this morning for 'San Diego': lots of useful information, but it doesn't have the nice capabilities that Bing has introduced here.

Overall I think Bing shows a reasonable effort at making internet search more usable, following the mantra we've endorsed for years: search is a conversation, not a destination. Bing lets you interact and move towards the result you want. And it's an indication that enterprise search from Microsoft will begin to offer these same helpful capabilities, which will make life easier for those who spend their time on the corporate intranet.


June 01, 2009

Gilbane San Francisco

By Miles Kehoe

It's been a busy month of conferences and travels. This week, I'm home again in time to attend and speak at the Gilbane show at the Westin Hotel in San Francisco.

Billed as "Where Content Management Meets Social Media" this year, the conference seems heavily invested in the popular and trendy topic. Gilbane, when search became hot again, began a 'search practice' headed up by Lynda Moulton, who will apparently not be in attendance this year. In her place, the well known and well-respected Hadley Reynolds, now of IDC, will be running much of the search track. I'm joining Hadley on a panel called "Survival Guide: Delivering Great Results", Thursday morning at 9:40AM.

I'll be at the show most of the day Wednesday and Thursday, so if you're there give me a call...


s/ Miles