
4 posts from March 2007

March 27, 2007

The Fallacy of Single-Shot Relevancy

One of the problems that has plagued corporate search for so long is the assumption that a user simply needs to enter a query and the search technology will automagically return the best answers.

It isn't really the corporations' fault - the search vendors have been making this pitch for years. And to make things worse, Google and Yahoo and other web search engines make it look so simple. What most users don't realize is that these internet search services have it easy: there are perhaps tens of thousands of sites that cover most subjects, and no one notices if a few thousand documents are missing from the result list. In the corporate world, you only have one page that contains your CEO's bio, and if that page doesn't come back at the top of a search, you know someone is going to be unhappy.

For a while, companies and vendors tried to push "Advanced Search" as the solution. The logic was "if a user really wants the answer, s/he can drill down into the advanced page". Nope. Wrong again. Some of our customers who survey their web site users report that fewer than 3% of all searches come from the advanced search page. Yet a large percentage of users report they are dissatisfied with their search results. Clearly, this is a failed strategy as well.

We need to find a way to engage the user in a conversation to learn what they are really looking for.

What's the solution?


March 22, 2007

The Future (Platform) for Enterprise Search 2.0

I think I saw the future of Enterprise Search 2.0 today – and it’s from – Adobe.

Yep – Adobe Apollo actually. And it’s not a search engine. It’s a new ‘cross operating system run-time’ that ‘allows you to build desktop applications using web technology’ like HTML, JavaScript and Flash. Looks to be taking Sun and Java on right where it hurts.

But wait – the future of search? Have a look at the video. Now think creatively.

What could you do with a search application that knew what things you were interested in, and could do the searches while you are online? Now imagine it can cache the results for you to read when you are offline. It could check your corporate blog (we are talking Enterprise Search 2.0, right?). You can tag the articles you read offline, enter a few posts, update your stored queries. And when you are back on the ground, five minutes on the wireless and you’re all caught up.

In the office? Searches show up clustered together in folders – Projects, People, Products. Want to drill down on some results? No problem, open an Apollo widget your IT guys created in HTML and drill into the search results. Tag them. Update them. Sort them. Filter them. You need to search in multiple repositories with different logins? No problem - your Apollo application knows how to federate results just the way you want them.

It’s also a great way for your IT guys to cache your important documents offline, so they are still available, even when your laptop is at 31,000 feet headed eastbound. Search handles all the security, so no one who isn’t authorized can see your documents. And since you will have control in the Apollo applications over which of your files get put into the cache, no one will see your Quicken files or credit card statements.

Is this really a search platform? Search is at the heart of most enterprise applications of the future. Not the button you’re used to, but the very platform. Philippe Courtot, formerly of Verity, was fond of saying ‘Search is ubiquitous’. We all thought he meant ‘Every company will have a search box on their web pages’. What he meant is search is everywhere, in every application.

Andrew McAfee of Harvard talks about channels and platforms. Channels are applications like IM, email, text messaging; platforms are the wikis, blogs and collaborative platforms like Lotus Notes or Microsoft SharePoint. Search, too, is a platform on which all of the Enterprise 2.0 tools will be built. Buttonless search, perhaps. Zero click search. But search under the covers nonetheless. And Apollo looks like a pretty cool platform for the platform.

(By the way, it looks like you can download a test version of the Apollo run-time and the SDK from Adobe.)

March 16, 2007

Python interpreter for testing custom FAST Pipeline stages from the command line

Custom pipeline stages go into esp/lib/python2.3/processors

But esp/bin has no python.exe

Sure, you can download Python 2.3.5 from the fine folks at python.org. It'll check for valid syntax, but after that you will get errors like:

    Traceback (most recent call last):
      File "MyMapper.py", line 28, in ?
        from docproc import Processor
    ImportError: No module named docproc

It doesn't have the FAST-specific Python libraries it needs to run from the command line. A simple library path issue? Well... I don't see any *docproc* files anywhere.
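If a syntax check is all you want from a stock interpreter, one option is to compile the stage without running it, which sidesteps the docproc import entirely. The following is a minimal sketch, not anything shipped with FAST: the function name check_syntax is made up for illustration, and it uses only the built-in compile(). It is written conservatively so that it should be acceptable to both old 2.x interpreters and current ones, but a Python 3 interpreter will report 2.x-only syntax as errors, so run it with a Python that matches the stage's version.

```python
import sys

def check_syntax(path):
    """Compile a stage file without executing it; return an error string or None."""
    f = open(path)
    try:
        source = f.read()
    finally:
        f.close()
    try:
        # compile() only parses the source; the import statements never run,
        # so a missing docproc module does not matter here.
        compile(source, path, "exec")
    except SyntaxError:
        # sys.exc_info() avoids the except-clause syntax that differs
        # between Python 2 and Python 3.
        err = sys.exc_info()[1]
        return "%s:%s: %s" % (path, err.lineno, err.msg)
    return None

if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(check_syntax(name) or "%s: syntax OK" % name)
```

Saved under a name of your choosing (say checkstage.py, a hypothetical name), it would be run as: python checkstage.py MyMapper.py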

But more to the point, they DO give you a command-line Python interpreter. It's called cobra.exe rather than python.exe.

In esp/bin run the batch file setupenv.cmd

Then in esp/lib/python2.3/processors you can run
cobra YourStage.py

March 11, 2007

Search Dial Tone

Since my early days at Verity in 1989, I have thought search was a pretty cool thing. Verity was an early success in what we now call 'enterprise search' because they were selling an application that let companies (and government agencies) index and search large volumes of digital content. Of course, in those days 'large content' was tens of thousands of documents. Still, Verity had some very cool capabilities including automated hyperlinking between text documents and synchronized image links (both thanks to Abe Lederman, founder of DeepWeb Technologies). When we at Verity first saw HTML links, they seemed pretty old fashioned. But I digress.

Most search before Verity was pretty basic. You typed in a keyword query and got a list of documents that contained your keyword. Oh sure, there were some technologies that let you define synonyms and other basic functionality, but most of it was pretty simple. (By the way, one of the things Verity had even in those early days was 'topics': structured taxonomies of concepts, an early-day 'concept search'. If you typed in a query for 'New York', it wasn't uncommon for the character-based user interface to ask "New York the City or New York the State?" Very cool, even by today’s standards.)

Today, Google is the public web search that so many use – and praise. But John Battelle, while speaking at FASTForward06 last year, likened the Google search interface to an MS-DOS 2.0 DIR command. In MS-DOS, you type DIR and it shows you a list of files. In Google, and other present-day web search engines, you type a query and you get a list of documents.

We call this kind of search within the company Search Dial-Tone. Think POTS (Plain Old Telephone Service): you pick up the phone and you get dial tone. But no caller ID, no call waiting, no voicemail. Dial the phone and maybe someone will answer. Heck, after a major disaster, you may not even get that. Search Dial-Tone (POST, or Plain Old Search Technology) is just like that: You enter a search and you get results – sometimes lots of results. No suggestions. No best bets. No navigators. No entity recognition. No context. No analytics. Your search found 13,276 hits on your web site? Still can’t find what you want? Good luck with that – keep scrolling.

In all fairness, Google and others are starting to show me more - departments when I search for Stanford, FedEx tracking data when I enter a tracking number, even airline flight information when I enter a flight number. Some free and low-cost engines including the IBM OmniFind Yahoo! Edition are starting to improve on this ‘Enterprise Search 1.0’ by providing best bets, synonyms and the like – but most queries are pretty much MS-DOS compatible. You pick up the phone and hear the dial tone, but it sure isn’t fancy.

Jump to today: enter Web 2.0, Search 2.0, and Enterprise 2.0. Hundreds (thousands?) of people write every day about how things will be better in the future. On those few occasions when people write about Search 2.0, they mean that time in the future when Google and others will be much better. How? The general consensus is that they will use context.

Think about it: a query has context: what does the word look like (think FedEx tracking numbers); is it misspelled ("Did you mean...?"); are both words capitalized (a name perhaps); what language is the query written in?

The user has context as well: where does the searcher live? What other terms has this searcher used recently? What documents has he or she looked at?

The data that is indexed has context: what names are common in the documents? Are there terms that are often near other terms? Has the author written other documents that might be interesting?
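To make the query side of this concrete, here is a rough sketch of pulling a few hints out of the query text alone. Everything in it is an assumption for illustration: the function name query_context, the tiny vocabulary list (a real one would come from the index), the 0.8 similarity cutoff, and the rule that a 12- or 15-digit string looks like a tracking number. Language detection is left out.

```python
import difflib
import re

# Assumption: treat a bare 12- or 15-digit string as a package tracking number.
TRACKING_NUMBER = re.compile(r"^(\d{12}|\d{15})$")

# Hypothetical vocabulary; in practice this would be built from indexed terms.
VOCABULARY = ["search", "server", "benefits", "vacation"]

def query_context(query, vocabulary=VOCABULARY):
    """Return a dict of simple hints derived from the query text alone."""
    text = query.strip()
    words = text.split()
    hints = {
        # What does the query look like?
        "tracking_number": bool(TRACKING_NUMBER.match(text)),
        # Are all words capitalized (two or more)? Perhaps a person's name.
        "possible_name": len(words) >= 2 and all(w[0].isupper() for w in words),
        # "Did you mean...?" candidates, keyed by the word as typed.
        "did_you_mean": {},
    }
    for word in words:
        lower = word.lower()
        if lower.isalpha() and lower not in vocabulary:
            close = difflib.get_close_matches(lower, vocabulary, n=1, cutoff=0.8)
            if close:
                hints["did_you_mean"][word] = close[0]
    return hints

if __name__ == "__main__":
    for q in ["123456789012", "Jane Smith", "serach tips"]:
        print(q, "->", query_context(q))
```

User context and document context would feed into the same kind of structure, but those need data from the directory and the index rather than from the query string.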

Google and its competitors are starting to understand all of these types of context, but it isn't easy. Enterprise Search 2.0, inside of companies, already has access to the context that public search engines can only dream about. Think about it: your employer knows who you are: what your job title is; what department (and city/state/country) you work in, and who else works on the same projects you are working on. They know where you went to school; what degrees you have in which fields; and they know what projects and customers you have worked with. They can easily know what searches you have done on your corporate network; they know which documents you looked at. And they know what people like you found helpful. They have your phone number, your email address, and your vacation schedule. And as companies begin to implement Enterprise 2.0 technologies like blogs, wikis, and other lightweight publishing solutions, companies will have access to all of that as well. All they need to do is use it.

Imagine Enterprise Search 2.0 in action. You enter a person's name, and at the top of the results, you see a corporate directory entry for that person, with phone number and email address as a hyperlink. You get a link to the project he is working on. Type in your company ticker symbol, you get the most recent quote. Search for an internal project name and you have a navigator link to all of the current information on that project.
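The scenario above boils down to checking a query against a few internal sources before showing the ordinary hit list. A minimal sketch follows, with everything hypothetical: the directory entry, the project link, the "NIE" ticker symbol, and the best_bets function are invented stand-ins, and a real deployment would look these up in the corporate directory, a project system and a quote feed.

```python
# Hypothetical stand-ins for a corporate directory, a project list and a ticker.
DIRECTORY = {
    "jane smith": {
        "phone": "555-0100",
        "email": "jane.smith@example.com",
        "project": "apollo pilot",
    },
}
PROJECTS = {"apollo pilot": "/projects/apollo-pilot"}
TICKER = "NIE"

def best_bets(query, directory=DIRECTORY, projects=PROJECTS, ticker=TICKER):
    """Return special results to show above the ordinary hit list."""
    key = " ".join(query.lower().split())  # normalize case and whitespace
    bets = []
    if key in directory:
        person = directory[key]
        bets.append({
            "type": "person",
            "phone": person["phone"],
            "email_link": "mailto:" + person["email"],
            # Link to the project this person is working on, if it is known.
            "project_link": projects.get(person["project"]),
        })
    if key == ticker.lower():
        # A real system would fetch the latest quote here.
        bets.append({"type": "stock_quote", "symbol": ticker})
    if key in projects:
        bets.append({"type": "project", "link": projects[key]})
    return bets

if __name__ == "__main__":
    for q in ["Jane Smith", "nie", "Apollo Pilot", "expense report"]:
        print(q, "->", best_bets(q))
```

A query that matches none of the sources simply returns an empty list, and the user sees the normal results.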

Newer search technologies like FAST ESP, Autonomy IDOL, IBM OmniFind and others are beginning to offer various levels of this newer, smarter search that understands context. We call that Enterprise Search 2.0, and it's the next big thing in enterprise search. And that's what we're all about here at New Idea Engineering.