20 posts categorized "Verity K2"

January 22, 2009

Autonomy proposes to acquire Interwoven

This morning Autonomy announced that it will acquire California-based Interwoven for $16.20 a share, a nice premium over last night's close at $11.84. The deal, expected to close Q2-2009, is subject to the vote of shareholders of both companies. This comes after Interwoven announced that it expected results for its most recent quarter to pretty much meet its previous guidance.

It sounds like a great move. Autonomy continues its evolution from a search leader into a compliance leader. Interwoven has been on a buying spree itself over the last several years, and offers solutions in many different areas including Digital Asset Management, eDiscovery/Legal, records management and even enterprise search.

Since FAST's acquisition by Microsoft a year ago, we've wondered if Autonomy would be an interesting target for acquisition by someone like Oracle or even by Google; but it looks like Mike Lynch would rather grow the old-fashioned way - by top line growth and by acquiring companies whose products support and extend their business.

June 18, 2008

Search Quality: You Can't Improve What You Don't Measure

In our latest survey of new newsletter subscribers we found that 29% had no formal metrics for measuring quality of search results.  Search metrics allow you to keep search on the right track and can be a powerful tool for managing your systems.  They are a wonderful source for insights and trends.  We thought we would share a couple that we think work well. Many of these are covered in greater depth in Interpreting Your Search Activity Reports in the Enterprise Search newsletter.

  • Count the number of people who use search  
  • Count the total number of searches  
  • Count the number of zero search results  
  • User feedback on top 100 searches  
  • Track email complaints about search  
  • Measure number of clicks on navigators (navigation menu items)  
  • Business Goals  
    • Reduce call volume (normalized for growth in customer base) by enabling self-service from search: results are good enough to reduce calls.
    • Reduce e-mail volume (again adjusted for growth in customer base) by enabling self-service from search: results are good enough to reduce e-mails. 
    • Revenue       
    • Add-on revenue       
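
As a rough illustration of the first few counts in the list above, here is a minimal sketch that summarizes a search log. The `SearchEvent` record and its field names are assumptions made for this example; a real search activity report will have its own format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SearchEvent:
    user_id: str       # hypothetical field: who ran the search
    query: str         # the query text as typed
    result_count: int  # how many hits the engine returned


def summarize(events):
    """Compute unique searchers, total searches, and zero-result statistics."""
    total = len(events)
    users = {e.user_id for e in events}
    zero = [e for e in events if e.result_count == 0]
    # Normalize case and whitespace so "Vacation Policy" and "vacation policy" count together.
    normalized = Counter(e.query.strip().lower() for e in events)
    return {
        "unique_users": len(users),
        "total_searches": total,
        "zero_result_searches": len(zero),
        "zero_result_rate": len(zero) / total if total else 0.0,
        "top_queries": normalized.most_common(100),  # candidates for user feedback review
    }


sample_log = [
    SearchEvent("alice", "vacation policy", 12),
    SearchEvent("bob", "Vacation Policy", 12),
    SearchEvent("alice", "expnse report", 0),
    SearchEvent("carol", "ceo bio", 3),
]
report = summarize(sample_log)
```

Tracking these numbers week over week is usually more telling than any single snapshot; the business-goal metrics need data from outside the search log (call and e-mail volumes, revenue) and are not covered by this sketch.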

May 30, 2008

Some interesting Enterprise search events the week of June 2nd

There are two really interesting events happening next week that might be of interest.

First, Leslie Owens of Forrester is presenting the Forrester Wave Enterprise Search platform webinar on Monday morning, June 2 at 8AM. There is a nominal fee, but I think you will find it interesting.

Then, Leslie and several other interesting speakers will be at a free one-day seminar hosted by FAST on Wednesday the 4th in Redwood Shores, California at the Sofitel Hotel. In addition to Leslie Owens' presentation on 'Technology Populism', speakers will include Jeff Spataro of Microsoft; Hadley Reynolds of FAST; and senior IT managers from Cisco and National Instruments. Hadley, by the way, speaks and writes on Search Centers of Excellence and other innovations in the application of enterprise search. Be sure to register for the free FAST Search event.

May 08, 2008

A proposed standard for enterprise search

Dieselpoint has announced support for a technology it calls OpenPipeline, which can enhance the task virtually every enterprise search technology uses to get documents into the search index. They will be showing the pipeline at the upcoming Enterprise Search Summit on May 20-21 integrated with their new Dieselpoint Search 4.0, also on display.

The Dieselpoint press release claims:

OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules to distribute documents across a network. It is fully functional out of the box and includes an installer, a job scheduler, file scanner and crawlers, doc filters, and point and click interface with drag and drop module installation.

OpenPipeline is compatible with IBM's UIMA (Unstructured Information Management Architecture), and is designed to connect UIMA annotators to other systems.

Document processing can be centralized or parallelized as needed. The transport mechanism is simple, web-services XML over HTTP. RSS/Atom feeds are also possible.

The development philosophy behind OpenPipeline stresses simple, elegant design, and massive scalability. Minimal external dependencies and straightforward plug-in implementation ensure that the learning curve is low.

OpenPipeline can be downloaded without charge from http://www.OpenPipeline.org. It's available under the Apache License.

Making this technology open source makes sense. The core technology for an enterprise search company, their 'secret sauce', is optimizing the index and making search great, not creating new code to parse the latest version of Microsoft Office or of Documentum. By embracing OpenPipeline, presumably we will start to see pipeline stages created by a number of smaller companies and individuals, easing the burden on enterprise search companies. And companies that provide possible sources of data, like Content Management Systems, can create a single pipeline stage for their product that could work for every search technology, and be done with it.

To create a searchable index, all search technologies need to create a stream of text. If the source document is a binary file - Microsoft Word, for example - search vendors need to provide some way to read the format and convert it to text. The same is true of content stored in a relational database: each row represents a virtual document which needs to be extracted from the database and turned into a stream of text. This conversion is typically done as one stage of a pipeline. Other stages may include adding metadata, performing entity or sentiment extraction, or even enhanced language processing.
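
To make the idea of pipeline stages concrete, here is a minimal sketch in Python. It is not the OpenPipeline API; the `Document` class, the stage functions and `row_to_document` are all names invented for this illustration, and the "file filter" and "entity extraction" are deliberately naive stand-ins.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    source_id: str                 # where the document came from
    raw: bytes = b""               # original content as fetched
    text: str = ""                 # text stream that will be indexed
    metadata: dict = field(default_factory=dict)


def extract_text(doc):
    """Stand-in for a file filter: decode the bytes and strip markup-like tags."""
    decoded = doc.raw.decode("utf-8", errors="replace")
    doc.text = re.sub(r"<[^>]+>", " ", decoded)
    return doc


def add_metadata(doc):
    """Attach a simple piece of metadata: the word count."""
    doc.metadata["length"] = len(doc.text.split())
    return doc


def extract_entities(doc):
    """Very naive entity extraction: runs of two or more capitalized words."""
    doc.metadata["entities"] = re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+", doc.text)
    return doc


def run_pipeline(doc, stages):
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


def row_to_document(table, row):
    """Treat a database row as a virtual document."""
    body = " ".join(f"{k}: {v}" for k, v in row.items() if k != "id")
    return Document(source_id=f"{table}/{row['id']}", raw=body.encode("utf-8"))


stages = [extract_text, add_metadata, extract_entities]
page = Document("intranet/ceo.html", raw=b"<p>Our CEO is Jane Example.</p>")
indexed = run_pipeline(page, stages)
row_doc = run_pipeline(
    row_to_document("customers", {"id": 7, "name": "Acme Corp", "city": "San Jose"}),
    stages,
)
```

Because every stage takes a document and returns a document, a stage written once (say, by a content management vendor) could in principle be dropped into any engine's pipeline, which is the appeal of a shared standard.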

The concept of a 'pipeline' applies directly to many existing search technologies, each with a proprietary method of accessing content. On top of that, no search technology companies have cooperated with competitors to create standards. In the relational database world, standards have made life much better: consider ODBC and JDBC. Because of these standards, developers can write code that can connect to just about any relational database. Not so in search. Maybe this effort will help break the ice. Stay tuned...

As enterprise search users, are you glad to see an open source solution for part of the search puzzle?

May 05, 2008

The problem with alerts - Google or otherwise

I use Google alerts to keep an eye on current events. Over the weekend I got an alert: "AMEC uses Verity's K2" - Now, since Verity is part of former competitor Autonomy, and because K2 is generally not being actively marketed, I decided to read the article. Sure enough, the content is dated January 2004, but Google Alerts thinks it is brand new. So I have to conclude that either the publisher just changed something on the page, or Google is just finding that document - either way, Google thinks this is news and in reality, it isn't.

Not long after we started SearchButton.com, we met the Google founders Sergey and Larry. Mark Bennett, my co-founder at SearchButton and here at New Idea Engineering, asked about the then-young Google's handling of dates and recency, and the Google guys took the position that date wasn't that important. This has led to a couple of energetic email exchanges over the last few years, but my recent alert illustrates the problem Google - and most other search technologies - have in generating really useful alerts. In fact, this subject was of such relevance to enterprise search owners that we had an article about the importance of dates in the first issue of our enterprise search newsletter in April of 2003.

Continue reading "The problem with alerts - Google or otherwise" »

January 10, 2008

Updated 2008 Enterprise Search Vendor Roundup

Jan. 10, 2008 - San Jose, CA, USA 

Microsoft announced they were acquiring FAST Search on January 8, forcing New Idea Engineering to amend our January 4th article "2008 Enterprise Search Vendors: The New Fab 4 ... and 1/2" (http://www.ideaeng.com/pub/entsrch/2008/number_01/article01.html). The announcement validates our original assessment and reinforces that search is mission critical for corporations, driving Microsoft to invest in a better search technology.

Some Highlights from NIE's 2008 Enterprise Search Vendor Roundup

  • Autonomy IDOL and FAST Search continue to hold the high end. K2 and Ultraseek are finally retiring.
  • Google's new version 5 appliance has arrived in the enterprise search mainstream.
  • Endeca is moving from the ecommerce side and had one of the most impressive search demos at ESS West 2007.
  • Lucene/Nutch/Solr (LNS) open source search engines continue to gain customer mindshare.
  • Microsoft, with its acquisition, moves in as Tier 1.
  • IBM and Oracle are still not there.

Autonomy IDOL and FAST Search continue to hold the high end, evolving into "search platforms" that go beyond traditional drop in applications. The two leaders from earlier this decade, K2 and Ultraseek, are fading.

Google's new version 5 appliance has arrived in the enterprise search mainstream. While the new version won't satisfy every requirement, it addresses many of the earlier integration issues that had held it back. Expect to see the Google logo on a lot more enterprise portals.

Endeca has created some slick administration tools, doing very well in a head-to-head comparison with Autonomy and FAST despite their continued progress in this area.  As the importance of administration continues to increase, we are more enthusiastic about them in the Enterprise space.

Open source tools based on Lucene, including Nutch and Solr (LNS), are increasingly considered by companies, especially in niches that need to micromanage document relevancy and rating. Lucene and its derivatives are increasingly embedded in other software packages and services, to the point that many users won't even realize they're using it.

We had expected IBM to be the next entrant into the "Tier 1" lineup, based on their iPhrase acquisition. To our surprise, when we saw IBM at ESS East 2007, they were featuring one of their older engines, the OmniFind Enterprise Edition. IBM OmniFind is still not one of our new Fab 4 and 1/2.

Dieselpoint, Intellisearch, Recommind, ISYS, ZyLAB, Vivisimo, Siderean and Exalead have strong presences in niche markets.
To read the full article ... 2008 Enterprise Search Vendors: The New Fab 4 ... and 1/2. http://www.ideaeng.com/pub/entsrch/2008/number_01/article01.html

July 05, 2007

Convergence of Enterprise Search and BI?

Today Business Objects announced it has completed the acquisition of Inxight, the developer of enterprise search tools for linguistics and text analysis. Inxight products were used in mainstream engines like K2 from Verity before a falling out over competing products and technology a couple of years back. Verity, of course, has been part of Autonomy for well over a year now - talk about consolidation!

This seems to be part of a continuing trend as enterprise search vendors discover BI as a potential market; and BI players like Business Objects discover the importance of search technology in great BI. Just a year ago, FAST Search and Transfer acquired Corporate Radar which it now markets as FAST Radar.

The real question is this: How well do BI and Enterprise Search really integrate? Andy Hayler of Kalido  asked this insightful question (no pun intended) earlier this year, and as yet I don't think anyone really has an answer. What do you think?

March 27, 2007

The Fallacy of Single-Shot Relevancy

One of the problems that has plagued corporate search for so long is the assumption that a user simply needs to enter a query and the search technology will automagically return the best answers.

It isn't really the corporations' fault - the search vendors have been making this pitch for years. And to make things worse, Google and Yahoo and other web search engines make it look so simple. What most users don't realize is that these internet search services have it easy: there are perhaps tens of thousands of sites that cover most subjects, and no one notices if a few thousand documents are missing from the result list. In the corporate world, you only have one page that contains your CEO's bio, and if that page doesn't come back at the top of a search, you know someone is going to be unhappy.

For a while, companies and vendors tried to push "Advanced Search" as the solution. The logic was "if a user really wants the answer, s/he can drill down into the advanced page". Nope. Wrong again. Some of our customers who survey their web site users report that fewer than 3% of all searches come from the advanced search page. Yet a large percentage of users report they are dissatisfied with their search results. Clearly, this is a failed strategy as well.

We need to find a way to engage the user in a conversation to learn what they are really looking for.

What's the solution?

Continue reading "The Fallacy of Single-Shot Relevancy" »

March 11, 2007

Search Dial Tone

Since my early days at Verity in 1989, I thought search was a pretty cool thing. Verity was an early success in what we now call 'enterprise search' because they were selling an application that let companies (and government agencies) index and search large volumes of digital content. Of course, in those days 'large content' was tens of thousands of documents. Still, Verity had some very cool capabilities including automated hyperlinking between text documents and synchronized image links (both thanks to Abe Lederman, founder of DeepWeb Technologies). When we at Verity first saw HTML links, they seemed pretty old fashioned. But I digress.

Most search before Verity was pretty basic. You typed in a keyword query and got a list of documents that contained your keyword. Oh sure, there were some technologies that let you define synonyms and other basic functionality, but most of it was pretty simple. (By the way, one of the things Verity had even in those early days was 'topics' - structured taxonomies of concepts, an early-day 'concept search'. If you typed in a query for 'New York', it wasn't uncommon for the character-based user interface to ask "New York the City or New York the State?". Very cool, even by today’s standards.)

These days, Google is the public web search that so many use – and praise. But John Battelle, while speaking at FASTForward06 last year, likened the Google search interface to an MS-DOS 2.0 DIR command. In MS-DOS, you type DIR and it shows you a list of files. In Google, and other present-day web search engines, you type a query and you get a list of documents.

We call this kind of search within the company Search Dial-Tone. Think POTS: Plain Old Telephone Service - you pick up the phone and you get dial tone. But no caller id, no call waiting, no voicemail. Dial the phone and maybe someone will answer. Heck, after a major disaster, you may not even get that. Search Dial-Tone (POST - Plain Old Search Technology) is just like that: You enter a search and you get results – sometimes lots of results. No suggestions. No best bets. No navigators. No entity recognition. No context. No analytics. Your search found 13,276 hits on your web site? Still can’t find what you want? Good luck with that – keep scrolling.

In all fairness, Google and others are starting to show me more - departments when I search for Stanford, FedEx tracking data when I enter a tracking number, even airline flight information when I enter a flight number. Some free and low-cost engines including the IBM OmniFind Yahoo! Edition are starting to improve on this ‘Enterprise Search 1.0’ by providing best bets, synonyms and the like – but most queries are pretty much MS-DOS compatible. You pick up the phone and hear the dial tone, but it sure isn’t fancy.

Jump to today: enter Web 2.0, Search 2.0, and Enterprise 2.0. Hundreds (thousands?) of people write every day about how things will be better in the future. On those few occasions when people write about Search 2.0, they mean that time in the future when Google and others will be much better. How? The general consensus is that they will use context.

Think about it: a query has context: what does the word look like (think FedEx tracking numbers); is it misspelled ("Did you mean...?"); are both words capitalized (a name perhaps); what language is the query written in?

The user has context as well: where does the searcher live? What other terms has this searcher used recently? What documents has he or she looked at?

The data that is indexed has context: what names are common in the documents? Are there terms that are often near other terms? Has the author written other documents that might be interesting?
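
A few of the query-level signals mentioned above can be sketched in a handful of lines. Everything here is an assumption made for illustration: the 12-digit pattern merely stands in for a tracking-number format, and the tiny `KNOWN_TERMS` vocabulary stands in for terms drawn from a real index. Language detection and user or document context are left out.

```python
import re

# Assumption: a 12-digit number stands in for a carrier tracking-number format.
TRACKING_PATTERN = re.compile(r"^\d{12}$")

# Toy vocabulary; a real system would draw this from the terms in its index.
KNOWN_TERMS = {"stanford", "vacation", "policy", "search"}


def query_context(query):
    """Derive a few simple signals from the shape of the query itself."""
    words = query.split()
    return {
        # What does the query look like?
        "looks_like_tracking_number": bool(TRACKING_PATTERN.match(query.strip())),
        # Are all of the words capitalized (a name, perhaps)?
        "looks_like_name": len(words) >= 2 and all(w[:1].isupper() for w in words),
        # Words missing from the vocabulary are candidates for "Did you mean...?"
        "unknown_words": [w for w in words if w.isalpha() and w.lower() not in KNOWN_TERMS],
    }
```

For example, `query_context("vacation polcy")` flags `polcy` as an unknown word, which a search front end could use as the trigger for a spelling suggestion.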

Google and its competitors are starting to understand all of these types of context, but it isn't easy. Enterprise Search 2.0, inside of companies, already has access to the context that public search engines can only dream about. Think about it: your employer knows who you are: what your job title is; what department (and city/state/country) you work in, and who else works on the same projects you are working on. They know where you went to school; what degrees you have in which fields; and they know what projects and customers you have worked with. They can easily know what searches you have done on your corporate network; they know which documents you looked at. And they know what people like you found helpful. They have your phone number, your email address, and your vacation schedule. And as companies begin to implement Enterprise 2.0 technologies like blogs, wikis, and other lightweight publishing solutions, companies will have access to all of that as well. All they need to do is use it.

Imagine Enterprise Search 2.0 in action. You enter a person's name, and at the top of the results, you see a corporate directory entry for that person, with phone number and email address as a hyperlink. You get a link to the project he is working on. Type in your company ticker symbol, and you get the most recent quote. Search for an internal project name and you have a navigator link to all of the current information on that project.
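
The scenario above amounts to checking a query against a few internal sources before showing the ordinary result list. Here is a minimal sketch of that idea; the directory entry, ticker symbol, project link and the `get_quote` callback are all hypothetical placeholders, not any vendor's API.

```python
# All of the data below is made up for illustration.
DIRECTORY = {
    "jane example": {"phone": "555-0100", "email": "jane@example.com", "project": "Apollo"},
}
TICKER = "EXMP"                              # hypothetical company ticker symbol
PROJECTS = {"apollo": "/projects/apollo"}    # project name -> navigator link


def direct_answers(query, get_quote):
    """Return items to show above the normal results, based on internal context.

    get_quote is a caller-supplied function that maps a ticker symbol to a price.
    """
    key = query.strip().lower()
    answers = []
    if key in DIRECTORY:
        person = DIRECTORY[key]
        answers.append({
            "type": "person",
            "phone": person["phone"],
            "email_link": "mailto:" + person["email"],
            "project_link": PROJECTS.get(person["project"].lower()),
        })
    if key == TICKER.lower():
        answers.append({"type": "quote", "symbol": TICKER, "price": get_quote(TICKER)})
    if key in PROJECTS:
        answers.append({"type": "project", "link": PROJECTS[key]})
    return answers
```

A search page would render whatever `direct_answers` returns first, then fall through to the regular ranked results, so a query that matches nothing special behaves exactly as before.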

Newer search technologies like FAST ESP, Autonomy IDOL, IBM OmniFind and others are beginning to offer various levels of this newer, smarter search that understands context. We call that Enterprise Search 2.0, and it's the next big thing in enterprise search. And that's what we're all about here at New Idea Engineering.

February 01, 2007

Welcome


Miles Kehoe, New Idea Engineering, Inc.

Welcome to the Enterprise Search blog! Mark Bennett and I will be the initial posters here because it lets us do something we love to do - talk about enterprise search. Our postings will follow the technical and business issues in and around the enterprise search marketplace - products and services from companies like Fast Search and Transfer, Autonomy, IBM OmniFind and the IBM Yahoo! Edition, the Google Search Appliance, Lucene and many others.

Enterprise search is no longer just a search box on a web page - it is a platform on which companies are building advanced applications. Search is the driving factor, and making search work is critical in keeping customers and employees satisfied, in meeting ROI objectives, and even in meeting compliance and regulatory requirements.

Continue reading "Welcome" »