3 posts from October 2008

October 27, 2008

Grep is not a search engine

I actually started out to write an entry on the weird search terms we've seen, but that will have to wait. While doing some research for that entry, I ran into yet another annoyance we often see: a 'search engine' that works just like grep.

For those of you who don't know the pleasures and utility of grep, I feel both regret and envy. After all, I've spent years relying on that bizarre Unix & Linux utility - so much so that I use the MKS Toolkit on all of my Windows PCs. But as useful as grep can be, it is not a search engine.

Consider Adobe Acrobat. I found a PDF on the web and viewed it with the Adobe Acrobat Version 7.00 add-in for Firefox. I was looking for a phrase popular in the management service consulting business that describes a process: 'as is, to be'. Now, all of these words are typically stop words, which is the point of my 'weird searches' entry to come.
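To see why a phrase made entirely of stop words is awkward for a conventional engine, here is a minimal sketch of stop-word filtering. The stop list and the query_terms helper are illustrative assumptions, not the behavior of any particular product.

```python
import re

# Illustrative stop list (an assumption; real engines ship their own lists).
STOP_WORDS = {"a", "an", "and", "as", "be", "is", "of", "the", "to"}

def query_terms(query):
    """Lowercase the query, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [w for w in words if w not in STOP_WORDS]

print(query_terms("as is, to be"))    # []
print(query_terms("switch vendors"))  # ['switch', 'vendors']
```

With a list like this one, nothing is left of 'as is, to be' to look up, which is why an engine that discards stop words can struggle with it.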

When you search a PDF file with the built-in Search feature, the engine returns every instance of the sequence of characters you enter. Search for the term 'view' and you'll see all of the instances of that term as well as the term 'views'. Cool - stemming! But wait! Dig a bit further and you find it also returns 'review', 'interviews', 'viewpoint', and any other term whose only similarity to the original query is that it contains the same letter sequence. How about a phrase? Adobe doesn't seem to support quoting a phrase, but when you enter multiple space-delimited terms it seems to assume you want a phrase search. Even in a phrase search, though, the last term only has to match the start of a word. Thus, a search for 'switch vendors' will find the phrase, but so will a search for 'switch v'.
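The difference between matching a character sequence and matching a word can be sketched in a few lines. This is only an approximation of the behavior described above, using a made-up sentence and two hypothetical helpers; it is not how Acrobat is implemented.

```python
import re

# Made-up sample text for illustration.
TEXT = "One view among many views: review the interviews and each viewpoint."

def grep_style(text, query):
    """Return every word that contains the query as a raw character sequence."""
    return [w for w in re.findall(r"\w+", text) if query.lower() in w.lower()]

def whole_word(text, query):
    """Return only the words that equal the query, ignoring case."""
    return [w for w in re.findall(r"\w+", text) if w.lower() == query.lower()]

print(grep_style(TEXT, "view"))  # ['view', 'views', 'review', 'interviews', 'viewpoint']
print(whole_word(TEXT, "view"))  # ['view']
```

A real search engine would sit somewhere in between: stemming would let 'view' match 'views' without also dragging in 'review' or 'interviews'.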

This capability can be cool - for example, if you want to find an instance of the string 30/60/90, you can do so. Heck, just type 30/ and you're there. And if you have really weird error numbers or status codes (0x00ffdd07) it works great!
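Here is a rough sketch of why raw character matching shines for strings like these. The simple_tokens function is a deliberately naive tokenizer written for illustration, on the assumption that a word-based engine splits text at punctuation; real analyzers vary.

```python
import re

def simple_tokens(text):
    """Split on anything that is not a letter or digit (a naive tokenizer)."""
    return re.findall(r"[A-Za-z0-9]+", text)

# Made-up sample line for illustration.
line = "Review the 30/60/90 plan; error 0x00ffdd07 was logged."

print("30/" in line)                # True
print(simple_tokens("30/60/90"))    # ['30', '60', '90']
print(simple_tokens("0x00ffdd07"))  # ['0x00ffdd07']
```

The substring test finds '30/' directly, while the naive tokenizer has already broken 30/60/90 into three separate numbers before any index would see it.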

In fairness, Adobe does let you specify whole words only and case-sensitive search. But often we see companies that provide grep-like search in their product or service and eagerly claim 'search included'. I guess companies for whom search is a check-box feature and not really seen as contributing to corporate success will accept such an attitude. 

And by the way - we don't think the SQL LIKE operator counts as a search engine either. But that's a rant for another day.
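For the curious, the LIKE complaint can be reproduced with Python's built-in sqlite3 module. The docs table and its rows are invented for this sketch.

```python
import sqlite3

# In-memory database with a made-up 'docs' table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [
        ("A clear view of the harbor",),
        ("Please review the contract",),
        ("Interviews start Monday",),
        ("Nothing relevant here",),
    ],
)

# LIKE with wildcards on both sides is a plain substring test.
rows = conn.execute(
    "SELECT body FROM docs WHERE body LIKE ? ORDER BY id", ("%view%",)
).fetchall()
matches = [body for (body,) in rows]
print(matches)
```

The query returns the 'view', 'review', and 'Interviews' rows alike, in insertion order, with no notion of words or relevance: the same grep-like behavior in SQL clothing.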

October 13, 2008

Reviewing OpenPipeline

OpenPipeline is an initiative proposed by search engine company Dieselpoint to begin development of standards in the enterprise and customer-facing search marketplace.

"Current solutions are proprietary and require that search administrators define and manage data source connectors, file filters, text analyzers, taxonomy, and dictionaries for each search engine technology," says Miles Kehoe, CEO of New Idea Engineering. "Defining once and maintaining a single source regardless of how many and which search engines you use is a big win for customers. We hope other search engine vendors will be adopting this strategy soon."

"Enterprise search is not the same as web searching," Chris Cleveland, CEO of Dieselpoint, says, "because it entails all of the nitty-gritty preparation for search - that is, it requires doing all of those things you need to do to get a document and standardize it before indexing." OpenPipeline, he says, aims to streamline the preparation process through its innovative document-processing capabilities.

Additional information ... 2008 Enterprise Search Vendors: The New Fab 4 ... and 1/2. (http://www.ideaeng.com/pub/entsrch/2008/number_01/article01.html)

OpenPipeline was created by Chris and his team of developers at Dieselpoint, whose intranet and customer-facing search product is written in pure Java. Dieselpoint Search is a powerful product, and has many of what we call 'Enterprise Search 2.0' capabilities designed in from the start. For example, it has a web-based control panel for business and IT managers, and provides great support for features like dynamic facets, activity reporting, and powerful data crawling capabilities. It has an elegant, clean interface and is extremely scalable. Dieselpoint Search integrates OpenPipeline for crawling, parsing, analyzing, and routing documents.

About Dieselpoint
Founded in 1999, Dieselpoint provides high-performance search, navigation, and discovery/information retrieval software for structured and unstructured data. Every day, Dieselpoint customers search millions of items and terabytes of data. Customers like The Nielsen Company, Northrop Grumman, Porsche, HMV, McGraw-Hill, ITT, Waterstone’s Books, and British Telecom use Dieselpoint software for corporate portals, intranet search, product catalogs, and engineering databases. Dieselpoint has developed industry-leading advances in faceted search and scalability. Coupled with a new OpenPipeline architecture and outstanding ease of implementation, Dieselpoint is the platform of choice for corporate search needs. Further information can be found online at www.dieselpoint.com.

October 08, 2008

Gartner Magic Quadrant 2008 Now Available

If you have not seen it, the new Gartner Magic Quadrant for Information Access - their name for intranet and customer-facing search - has been published and is available for viewing on the Gartner web site, thanks to a pointer from Microsoft's Analyst Relations page.

The big story, one which must have them fuming in England, is that Autonomy has dropped down a bit, and the combined Microsoft-FAST offerings have moved up a bit. This puts Autonomy a bit higher up on the 'Completeness of Vision' scale - by a few pixels - but a decent quarter-inch below Microsoft on the 'Ability to Execute' scale. Endeca, IBM, ZyLAB and Vivisimo squeaked into the upper right quadrant, while Google moved right to the line splitting the 'Challengers' from the 'Leaders', but ever so close - one could say the Google dot is on the line. It's odd that Google is not higher on the 'Ability to Execute' scale, since that usually means how well funded the company is. Perhaps they are looking at the budget/sales for only the Google appliance; but even then, Steve Arnold's numbers put them above the others on the scale.

Some excellent search products fell off the list this year, as Gartner has changed their methodology. The products we feel still qualify for the report include Dieselpoint, SLI Systems, and X1 Technologies, as well as newcomer Attivio. The article has more details. And as the con artist Fagin said in the play based on Dickens's Oliver Twist, '...if you happen to pass the Tower of London, have a look at the Crown Jewels'.