October 27, 2008

Grep is not a search engine

I actually started out to write an entry on the weird search terms we've seen, but that will have to wait. As I was doing some research for that entry, I ran into yet another annoyance we often see: a 'search engine' that works just like grep.

For those of you who don't know the pleasures and utility of grep, I feel both regret and envy. After all, I've spent years relying on that bizarre Unix & Linux utility - so much so that I use the MKS Toolkit on all of my Windows PCs. But as useful as grep can be, it is not a search engine.

Consider Adobe Acrobat. I found a PDF on the web and viewed it with the Adobe Acrobat Version 7.00 add-in for Firefox. I was looking for a phrase popular in the management service consulting business that describes a process: 'as is, to be'. Now, all of these words are typically stop words, which is the point of my 'weird searches' entry to come.

When you search a PDF file with the built-in Search feature, the engine returns every instance of the sequence of characters you enter. Search for the term view and you'll see all of the instances of that term as well as the term views. Cool - stemming! But wait! Dig a bit further and you find it also returns review, interviews, viewpoint, and any other terms whose only similarity to the original query is that they contain the same letter sequence. How about a phrase? Adobe doesn't seem to support quoting a phrase, but when you enter multiple space-delimited terms it seems to assume you want a phrase search. Even in a phrase search, though, the last term seems to be matched as a partial word. Thus, a search for switch vendors will find the phrase, but so will a search for switch v.
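To make the difference concrete, here is a minimal Python sketch contrasting grep-style character matching with whole-word matching. The sample text and the function names (grep_like, token_match) are made up for illustration; this is not how Acrobat is implemented, just a way to reproduce the behavior described above.

```python
import re

# Hypothetical sample text, chosen to contain the words discussed above.
TEXT = ("Please review the interviews; each viewpoint offers "
        "different views. One view stands out.")


def grep_like(query, text):
    """Return every word that contains the query as a raw character sequence."""
    words = re.findall(r"\w+", text)
    return [w for w in words if query.lower() in w.lower()]


def token_match(query, text):
    """Return only the words that equal the query, ignoring case."""
    words = re.findall(r"\w+", text)
    return [w for w in words if w.lower() == query.lower()]


if __name__ == "__main__":
    print(grep_like("view", TEXT))
    print(token_match("view", TEXT))
    # A partial last term still "matches" a phrase under substring search.
    print("switch v" in "we may switch vendors next year")
```

The grep-style function returns review, interviews, viewpoint, views, and view, while the whole-word function returns only view. Neither one does stemming; the fact that views shows up in the first list is an accident of the letters it shares with the query.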

This capability can be cool - for example, if you want to find an instance of the string 30/60/90, you can do so. Heck, just type 30/ and you're there. And if you have really weird error numbers or status codes (0x00ffdd07) it works great!
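Here is a small Python sketch of why literal matching is handy for strings like these. The sample line and the deliberately naive tokenizer are assumptions for illustration only; real search engines tokenize in many different ways, and some keep such strings intact.

```python
import re

# Hypothetical line of text containing the strings mentioned above.
LINE = "Plan: 30/60/90 review; error 0x00ffdd07 seen"


def substring_find(query, text):
    """Grep-style lookup: offset of the first literal match, or -1 if absent."""
    return text.find(query)


def naive_tokens(text):
    """A deliberately simple tokenizer that splits on anything non-alphanumeric."""
    return re.findall(r"[a-z0-9]+", text.lower())


if __name__ == "__main__":
    print(substring_find("30/", LINE))       # literal match succeeds
    print(naive_tokens(LINE))                # the slashes are gone
    print("30/60/90" in naive_tokens(LINE))  # no single token holds the string
```

With this tokenizer, 30/60/90 is split into 30, 60, and 90, so a whole-token lookup for the original string finds nothing, while the literal search for 30/ lands right on it.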

In fairness, Adobe does let you specify whole words only and case-sensitive search. But often we see companies that provide grep-like search in their product or service and eagerly claim 'search included'. I guess companies for whom search is a check-box feature, and not something seen as contributing to corporate success, will accept such an attitude.

And by the way - we don't think the SQL LIKE operator counts as a search engine either. But that's a rant for another day.
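For the curious, the LIKE operator shows the same grep-like behavior. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table name, column name, and rows are invented for illustration.

```python
import sqlite3


def like_search(pattern):
    """Run a LIKE query against a tiny in-memory table of sample words."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (body TEXT)")  # hypothetical table
    conn.executemany(
        "INSERT INTO docs VALUES (?)",
        [("view",), ("review",), ("interviews",), ("vendor",)],
    )
    rows = conn.execute(
        "SELECT body FROM docs WHERE body LIKE ? ORDER BY body", (pattern,)
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    # %view% matches any row containing the letters, just like grep.
    print(like_search("%view%"))
```

A query for %view% returns interviews and review along with view, with no notion of words, relevance, or ranking.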

Comments

I like your point - grep is a useful tool, but it's not a search engine.

However, in reviewing log files from a search engine, I have found grep to be a very, very useful tool! :-) (Along with probably several of those other utilities you use in MKS Toolkit, though I run on Linux so I get them by default.)

Hi Miles & Mark,
Yes, I agree, grep is not a search "engine" and neither is SQL LIKE. LIKE is more of a pattern search command that has very bad search performance for any large SQL table. SQL Server's Full-text Search feature is better, but in older versions of SQL Server this feature has performance and scalability problems. However, in SQL Server 2008, Microsoft has truly "integrated" SQL FTS into SQL Server, and while the performance & scalability issues have been solved, it still lacks some key Enterprise Search functionality that is common in other Enterprise Search platforms, IMHO.

Stay in touch,
John
