October 08, 2008

Gartner Magic Quadrant 2008 Now Available

If you have not seen it, the new Gartner Magic Quadrant for Information Access - their name for intranet and customer-facing search - has been published and is available for viewing on the Gartner web site, thanks to a pointer from Microsoft's Analyst Relations page.

The big story, one which must have them fuming in England, is that Autonomy has dropped down a bit, and the combined Microsoft-FAST offerings have moved up a bit. This puts Autonomy a bit higher up on the 'Completeness of Vision' scale - by a few pixels - but a decent quarter-inch below Microsoft on the 'Ability to Execute' scale. Endeca, IBM, ZyLAB and Vivisimo squeaked into the upper right quadrant, while Google moved right up to the line splitting the 'Challengers' from the 'Leaders' - so close that one could say the Google dot is on the line. It's odd that Google is not higher on the 'Ability to Execute' scale, since that usually reflects how well funded the company is. Perhaps they are looking at the budget/sales for only the Google appliance; but even then, Steve Arnold's numbers put them above the others on the scale.

Some excellent search products fell off the list this year, as Gartner has changed their methodology. The products we feel still qualify for the report include Dieselpoint, SLI Systems, and X1 Technologies, as well as newcomer Attivio. The article has more details. And as the con artist Fagin said in the play based on Dickens' Oliver Twist, '...if you happen to pass the Tower of London, have a look at the Crown Jewels'.

September 29, 2008

The Future of Search is Simpler

Gary Szukalski, VP of Field Marketing at Autonomy, gave a keynote speech at ESS West 2008 entitled "Meaning Based Computing", notable for its lack of vision. He was a replacement for Stouffer Egan, Autonomy's CEO, whose slides he likely used. The talk detailed the difficulties inherent in enterprise search, especially how challenging it can be to understand the full meaning of a word like "dog" or "shred" without context. We live in this world; we understand the difficulty. While his talk outlined the direction of automatic categorization, alerts, profiles, dynamic and real-time clustering and schemas, it left me wanting real vision and a roadmap from the industry leader. He never rose above the complexity of information processing.

In sharp contrast, Google provided more vision in their 10-minute lunch pitch than Autonomy did in its same old keynote speech. Google's vision is clear: you can be up and operational in one day and search everything. Simple to use and simple to administer. In 2008, Google is not the naive implementation that we saw 6 years ago: they have made real progress toward their vision. Other companies are marching toward that vision too - Dieselpoint's OpenPipeline, Endeca's simple administrative controls, FAST's navigators, Autonomy's categorization - but Google is the one articulating it. The future is simpler and usable by everyone in the enterprise. We have a long way to go - but we can change the business world. We are moving closer to the vision of many sources of data providing insight and increasing the pace of business decisions.

July 24, 2008

Our Top 3 Google Search Appliance Tips

Many of the operators available on the public Google site are useful within the Google Search Appliance. Here are a few of the most interesting ones.

1: The tilde prefix (~) is the Thesaurus / Synonym operator:

Instead of searching for
    error

Try searching for:
    ~error

As a mnemonic, remember that in math the ~ is often used as the "approximately equal to" symbol.

2: Dot dot (..) does a range search:

You can do:
    47..49

Or even use it to search a range of years (though not full dates):
    2000..2005

3: And who could forget the site: operator, useful for double checking your own spider's indexing of your public site:

For our site as of this writing, Google shows 44 docs that mention 'microsoft':
    site:ideaeng.com microsoft

When you search for 'microsoft' from our home page, you get 48 docs.

Try this on your own site - if the number in your search engine is lower than the Google count, your search is missing something!
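If you run these checks often, the three operators are easy to generate from a script. The snippet below is only a sketch: the helper names (synonym, number_range, site, coverage_gap) are ours, invented for illustration, and it builds query strings and compares two counts you have looked up by hand. It does not call Google or the appliance.

```python
def synonym(term):
    """Prefix a term with the tilde synonym operator."""
    return f"~{term}"


def number_range(low, high):
    """Build a dot-dot numeric range query such as 2000..2005."""
    return f"{low}..{high}"


def site(domain, *terms):
    """Restrict a query to one site with the site: operator."""
    return " ".join([f"site:{domain}", *terms])


def coverage_gap(google_count, own_count):
    """How many more docs Google reports than your own engine (never negative)."""
    return max(0, google_count - own_count)


print(synonym("error"))
print(number_range(2000, 2005))
print(site("ideaeng.com", "microsoft"))
# Counts from the example above: Google shows 44, our own search shows 48.
print(coverage_gap(44, 48))
```

A gap of zero means your own engine finds at least as many documents as Google does; anything above zero is worth investigating.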

There are of course a whole bunch more Google operators, covered in the Google Web Search Help Center; some of these tips came from MakeUseOf.

June 18, 2008

Search Quality: You Can't Improve What You Don't Measure

In our latest survey of new newsletter subscribers we found that 29% had no formal metrics for measuring quality of search results.  Search metrics allow you to keep search on the right track and can be a powerful tool for managing your systems.  They are a wonderful source for insights and trends.  We thought we would share a couple that we think work well. Many of these are covered in greater depth in Interpreting Your Search Activity Reports in the Enterprise Search newsletter.

  • Count the number of people who use search
  • Count the total number of searches
  • Count the number of searches that return zero results
  • Gather user feedback on the top 100 searches
  • Track email complaints about search
  • Measure the number of clicks on navigators (navigation menu items)
  • Track business goals:
    • Reduce call volume (normalized for growth in customer base) by enabling self-service from search: results are good enough to reduce calls.
    • Reduce e-mail volume (again adjusted for growth in customer base) by enabling self-service from search: results are good enough to reduce e-mails.
    • Revenue
    • Add-on revenue
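
Several of the counting metrics above can be pulled from a search activity log with very little code. The following is a minimal sketch, assuming a hypothetical log record (SearchEvent) with a user, a query, a result count and a navigator click count. Your search engine's actual log format will differ, and the field names here are our own invention.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SearchEvent:
    user: str                  # hypothetical user identifier
    query: str                 # the query text as typed
    result_count: int          # number of hits returned
    navigator_clicks: int = 0  # clicks on navigators for this search


def summarize(events, top_n=100):
    """Compute basic search metrics from a list of SearchEvent records."""
    total = len(events)
    zero = [e for e in events if e.result_count == 0]
    # Normalize case and whitespace so "VPN setup" and "vpn setup" count together.
    queries = Counter(e.query.strip().lower() for e in events)
    return {
        "unique_users": len({e.user for e in events}),
        "total_searches": total,
        "zero_result_searches": len(zero),
        "zero_result_rate": len(zero) / total if total else 0.0,
        "navigator_clicks": sum(e.navigator_clicks for e in events),
        "top_queries": queries.most_common(top_n),
    }


# Made-up sample data, for illustration only.
events = [
    SearchEvent("ann", "vpn setup", 12, navigator_clicks=1),
    SearchEvent("bob", "VPN setup", 9),
    SearchEvent("ann", "expense form", 0),
    SearchEvent("cy", "holiday calender", 0),
]
report = summarize(events, top_n=2)
print(report)
```

The top queries list is the natural starting point for the "user feedback on top 100 searches" item, and the zero-result searches often point to misspellings or missing content.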

May 30, 2008

Some interesting Enterprise search events the week of June 2nd

Two enterprise search events next week are worth a look.

First, Leslie Owens of Forrester is presenting the Forrester Wave Enterprise Search platform webinar on Monday morning, June 2 at 8 AM. There is a nominal fee, but I think you will find it interesting.

Then, Leslie and several other interesting speakers will be at a free one-day seminar hosted by FAST on Wednesday the 4th in Redwood Shores, California, at the Sofitel Hotel. In addition to Leslie Owens' presentation on 'Technology Populism', speakers will include Jeff Spataro of Microsoft; Hadley Reynolds of FAST; and senior IT managers from Cisco and National Instruments. Hadley, by the way, speaks and writes on Search Centers of Excellence and other innovations in the application of enterprise search. Be sure to register for the free FAST Search event.

May 08, 2008

A proposed standard for enterprise search

Dieselpoint has announced support for a technology it calls OpenPipeline, which aims to improve a task virtually every enterprise search technology performs: getting documents into the search index. They will be showing the pipeline at the upcoming Enterprise Search Summit on May 20-21, integrated with their new Dieselpoint Search 4.0, which will also be on display.

The Dieselpoint press release claims:

OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules to distribute documents across a network. It is fully functional out of the box and includes an installer, a job scheduler, file scanner and crawlers, doc filters, and point and click interface with drag and drop module installation.

OpenPipeline is compatible with IBM's UIMA (Unstructured Information Management Architecture), and is designed to connect UIMA annotators to other systems.

Document processing can be centralized or parallelized as needed. The transport mechanism is simple, web-services XML over HTTP. RSS/Atom feeds are also possible.

The development philosophy behind OpenPipeline stresses simple, elegant design, and massive scalability. Minimal external dependencies and straightforward plug-in implementation ensure that the learning curve is low.

OpenPipeline can be downloaded without charge from http://www.OpenPipeline.org. It's available under the Apache License.


Making this technology open source makes sense. The core technology for an enterprise search company, their 'secret sauce', is optimizing the index and making search great, not creating new code to parse the latest version of Microsoft Office or of Documentum. With OpenPipeline, presumably we will start to see pipeline stages created by a number of smaller companies and individuals, easing the burden on enterprise search companies. And companies that provide possible sources of data, like Content Management Systems, can create a single pipeline stage for their product that could work for every search technology, and be done with it.

To create a searchable index, all search technologies need to create a stream of text. If the source document is a binary file - Microsoft Word, for example - search vendors need to provide some way to read the format and convert it to text. The same is true of content stored in a relational database: each row represents a virtual document which needs to be extracted from the database and turned into a stream of text. This conversion is typically done as one stage of a pipeline. Other stages may include adding metadata, performing entity or sentiment extraction, or even enhanced language processing.
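
To make the idea of stages concrete, here is a toy pipeline. This is NOT the OpenPipeline API, which we have not reproduced here; every name in it (Document, to_text, add_metadata, extract_entities, run_pipeline) is hypothetical. It only illustrates the general shape: each stage takes a document, adds something, and hands it to the next stage.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    source_id: str  # e.g. a file path or a database row key
    raw: bytes      # content as fetched from the source
    text: str = ""  # stream of text produced by the conversion stage
    metadata: dict = field(default_factory=dict)


def to_text(doc):
    # Stand-in for a real file filter: here we only decode UTF-8 bytes.
    doc.text = doc.raw.decode("utf-8", errors="replace")
    return doc


def add_metadata(doc):
    # Record a simple property of the extracted text.
    doc.metadata["length"] = len(doc.text)
    return doc


def extract_entities(doc):
    # Toy entity extraction: treat capitalized words as candidate entities.
    found = re.findall(r"\b[A-Z][a-z]+\b", doc.text)
    doc.metadata["entities"] = sorted(set(found))
    return doc


def run_pipeline(doc, stages):
    # Pass the document through each stage in order.
    for stage in stages:
        doc = stage(doc)
    return doc


doc = run_pipeline(
    Document("row-1", b"Verity was acquired by Autonomy."),
    [to_text, add_metadata, extract_entities],
)
print(doc.text, doc.metadata)
```

Because every stage has the same shape, a vendor-specific filter could in principle be swapped in for to_text without touching the other stages, which is the appeal of a shared pipeline standard.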

The concept of a 'pipeline' applies directly to many existing search technologies, each with a proprietary method of accessing content. On top of that, no search technology companies have cooperated with competitors to create standards. In the relational database world, standards have made life much better: consider ODBC and JDBC. Because of these standards, developers can write code that can connect to just about any relational database. Not so in search. Maybe this effort will help break the ice. Stay tuned...

As enterprise search users, are you glad to see an open source solution for part of the search puzzle?

May 05, 2008

The problem with alerts - Google or otherwise

I use Google alerts to keep an eye on current events. Over the weekend I got an alert: "AMEC uses Verity's K2" - Now, since Verity is part of former competitor Autonomy, and because K2 is generally not being actively marketed, I decided to read the article. Sure enough, the content is dated January 2004, but Google Alerts thinks it is brand new. So I have to conclude that either the publisher just changed something on the page, or Google is just finding that document - either way, Google thinks this is news and in reality, it isn't.

Not long after we started SearchButton.com, we met the Google founders Sergey and Larry. Mark Bennett, my co-founder at SearchButton and here at New Idea Engineering, asked about the then-young Google's handling of dates and recency, and the Google guys took the position that date wasn't that important. This has led to a couple of energetic email exchanges over the last few years, but my recent alert illustrates the problem Google - and most other search technologies - have in generating really useful alerts. In fact, this subject was of such relevance to enterprise search owners that we had an article about the importance of dates in the first issue of our enterprise search newsletter in April of 2003.
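
One way to think about the fix is to separate the date the crawler saw a page from the date the content itself claims. The sketch below is our own illustration, not a description of how Google Alerts works: the Page record and the is_news function are hypothetical, and the URLs and dates are made up to mirror the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Page:
    url: str
    crawled_at: datetime                     # when the crawler saw the page
    published_at: Optional[datetime] = None  # date stated by the content, if known


def is_news(page, now, max_age=timedelta(days=7)):
    """Alert only when the content's own date is recent.

    Falls back to the crawl date when no published date is available,
    which is the case where stale alerts can still slip through.
    """
    effective = page.published_at or page.crawled_at
    return now - effective <= max_age


now = datetime(2008, 5, 5)
old_article = Page(
    "http://example.com/amec-k2",
    crawled_at=datetime(2008, 5, 3),
    published_at=datetime(2004, 1, 15),
)
undated = Page("http://example.com/undated", crawled_at=datetime(2008, 5, 3))

print(is_news(old_article, now))
print(is_news(undated, now))
```

The hard part in practice is filling in published_at reliably, which is why document dates deserve attention at indexing time.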


March 03, 2008

Deep Web proposes federation resource site

Sol Lederman of Deep Web Technologies wants to create a one-stop demo center for federation technology and has invited all of the major vendors to participate.

Federated search is becoming increasingly popular as more corporate customers look for ways to deliver results from multiple enterprise search installations, often from many different vendors. Sometimes the issue is technical, sometimes political, but nearly all companies have three or more search vendor technologies running somewhere behind the firewall.

The one thing we'd like to have seen in Sol's challenge is security, since that's what we think separates the winners from the also-rans in federation. It's not always easy, but it is 'real world' in companies. Nonetheless, a demo site where users can compare vendor solutions 'apples to apples' on the same data sources would be nice.
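
For readers new to the topic, here is a minimal sketch of federation with security trimming. Everything in it is an assumption made for illustration: the backends are plain functions returning made-up hits, each hit carries a hypothetical "acl" set of group names, and results are merged by simple round-robin interleaving rather than by any real relevancy scheme.

```python
from itertools import zip_longest


def federate(query, user_groups, backends, limit=10):
    """Query each backend, drop hits the user may not see, interleave the rest."""
    per_backend = []
    for search in backends:
        hits = search(query)
        # Security trimming: keep a hit only if the user shares a group with its ACL.
        allowed = [h for h in hits if h["acl"] & user_groups]
        per_backend.append(allowed)
    merged = []
    # Round-robin: first hit from each engine, then second from each, and so on.
    for tier in zip_longest(*per_backend):
        merged.extend(h for h in tier if h is not None)
    return merged[:limit]


def engine_a(query):
    # Stand-in for one vendor's search API.
    return [
        {"title": "A1", "acl": {"staff"}},
        {"title": "A2", "acl": {"hr"}},
    ]


def engine_b(query):
    # Stand-in for a second vendor's search API.
    return [
        {"title": "B1", "acl": {"staff", "hr"}},
        {"title": "B2", "acl": {"staff"}},
    ]


results = federate("benefits", {"staff"}, [engine_a, engine_b])
titles = [h["title"] for h in results]
print(titles)
```

In a real deployment the trimming is the hard part: each engine has its own security model, and the federator has to map one user identity onto all of them.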

By the way, we've seen some confusion among our customers and prospects on the subject, so we've taken a shot at defining 'federated search' in our Enterprise Search newsletter. We hope that helps some.

January 24, 2008

Google Search Appliance (GSA) User Interface "Glitch"

Google's web-based administration application is nice and clean, just what we've all come to expect from Google. It reminds us of the easy-to-use Ultraseek web UI.

But one detail that might confuse new admins: many successful actions redisplay the exact same screen.

For example:

  • Edit the properties for a collection, for example by adding another URL.
  • Click the Save Collection Definition at the bottom of the screen.
  • ... and poof! ... you're still looking at the exact same screen.

This might make you wonder if you actually submitted the form.  "Did I actually click the button?" - "Let me try again..."

If you had sharp eyes you'd notice that the browser DID do a quick screen update, and the little activity animation in the upper right corner did flash for a second.

So what happened?  Well... it worked... it did exactly what you asked, it saved the changes; and in case you might want to make additional changes, it redisplayed the same config screen.  Since it worked fine (this is a Google product after all!), there was no error, so no reason to give any sort of alert (in their opinion).  On some other screens, such as the Create Collection form, you'll notice a slight change in the screen, when your newly created collection is listed in the table of collections.

I've seen this style of UI before, where success equals redisplay without error.  We even debated this back at SearchButton.

Since this is unlike what Windows applications do, which is where a majority of today's computer users cut their teeth, I would argue that it is "non standard" behavior, and tends to be confusing.  I freely admit that logically it makes perfect sense, I get the design philosophy, it's just that it's not what many folks expect.

A simple compromise I'd like to see Google take, which other similar UIs have adopted, is to at least put a confirmation message on the redisplayed screen.  Something like a little green banner saying "Your changes have been saved."  Of course if you then re-edit and re-save, the next screen would have the same green banner, and therefore still look the same.  At least this would give you some hint that the system is listening, and that you needn't worry.  Or I guess a timestamp could be included in the confirmation message, so each save would look different.

Heck, even Google's wildly popular GMail application uses these types of banners.

Not a big deal.  Admins will get used to it and learn to "Trust the Google", but it's a small change that might help the newbies.


I'll be giving some screen shots to Dr. Search for his next article

January 10, 2008

Updated 2008 Enterprise Search Vendor Roundup

Jan. 10, 2008 - San Jose, CA, USA 

Microsoft announced they were acquiring FAST Search on January 8, forcing New Idea Engineering to amend our January 4th article "2008 Enterprise Search Vendors:  The new 'Fab4 ... and 1/2" (http://www.ideaeng.com/pub/entsrch/2008/number_01/article01.html). The announcement validates our original assessment and reinforces that search is mission critical for corporations, driving Microsoft to invest in a better search technology.

Some Highlights from NIE's 2008 Enterprise Search Vendor Roundup
 
Autonomy IDOL and FAST Search continue to hold the high end. K2 and Ultraseek are finally retiring.
Google's new version 5 appliance has arrived in the enterprise search mainstream.
Endeca is moving from the ecommerce side and had one of the most impressive search demos at ESS West 2007.
Lucene/ Nutch/ Solr (LNS) open source search engines continue to gain customer mindshare.
Microsoft, with its acquisition, moves in as Tier 1.
IBM and Oracle still not there.
 
Autonomy IDOL and FAST Search continue to hold the high end, evolving into "search platforms" that go beyond traditional drop in applications. The two leaders from earlier this decade, K2 and Ultraseek, are fading.

Google's new version 5 appliance has arrived in the enterprise search mainstream. While the new version won't satisfy every requirement, it addresses many of the earlier integration issues that had held it back. Expect to see the Google logo on a lot more enterprise portals.

Endeca has created some slick administration tools, doing very well in a head-to-head comparison with Autonomy and FAST despite their continued progress in this area.  As the importance of administration continues to increase, we are more enthusiastic about them in the Enterprise space.

Open source tools based on Lucene, including Nutch and Solr (LNS) are increasingly considered by companies, especially in niches that need to micromanage document relevancy and rating. Lucene and its derivatives are increasingly embedded in other software packages and services, to the point that many users won't even realize they're using it.

We had expected IBM to be the next entrant into the "Tier 1" lineup, based on their iPhrase acquisition. To our surprise, when we saw IBM at ESS East 2007, they were featuring one of their older engines, the OmniFind Enterprise Edition. IBM OmniFind is still not one of our new Fab 4 and a 1/2.

Dieselpoint, Intellisearch, Recommind, ISYS, ZyLAB, Vivisimo, Siderean and Exalead have strong presences in niche markets.
 
To read the full article ... 2008 Enterprise Search Vendors: The New Fab 4 ... and 1/2. http://www.ideaeng.com/pub/entsrch/2008/number_01/article01.html
