5 posts from March 2009

March 16, 2009

Document Level Security on Google Search Appliance, good Doc link

Two of our readers have pointed out that there is an improved document (as of late March 2009) on managing security for Google search at 

http://code.google.com/intl/nl-NL/apis/searchappliance/documentation/52/secure_search/secure_search_overview.html

Thanks for the feedback!

-----

Covers both the Enterprise and Mini:
http://code.google.com/apis/searchappliance/documentation/50/secure_search/secure_search_overview.html

Covers HTTP Basic and NTLM authentication, Windows file shares, SharePoint, etc.

March 12, 2009

Search Relevancy and Japanese text, CJK, interesting thread on SearchDev.org

A really nice discussion over on SearchDev.org about relevancy when searching Japanese text and other CJK languages.  Touches on a lot of technical issues including tokenization, thesaurus, character set normalization, etc.
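As a rough illustration of two of the issues mentioned above (this is not taken from the thread, and it is not how any particular engine works), here is a minimal Python sketch of character set normalization and a simple tokenization fallback. It assumes Unicode NFKC folding is an acceptable normalization and uses overlapping character bigrams, a common stand-in when no dictionary-based segmenter is available. The function names are hypothetical.

```python
import unicodedata
from typing import List


def normalize(text: str) -> str:
    """Fold width variants (full-width Latin, half-width katakana)
    with NFKC, then lowercase any Latin characters."""
    return unicodedata.normalize("NFKC", text).lower()


def char_bigrams(text: str) -> List[str]:
    """Split normalized text into overlapping character bigrams.

    Whitespace is dropped first. Text shorter than two characters
    is returned as a single token (or no tokens if empty).
    """
    chars = [c for c in normalize(text) if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


if __name__ == "__main__":
    print(normalize("ﾃｽﾄ"))       # half-width katakana folded to full-width
    print(char_bigrams("東京都"))  # overlapping two-character tokens
```

Bigrams trade precision for recall: a query for 京都 (Kyoto) will also match inside 東京都 (Tokyo Metropolis), which is exactly the kind of relevancy problem the thread digs into.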

Folks are chiming in about how a number of different search engines handle this, including Autonomy IDOL, K2, Ultraseek and MarkLogic.

The actual thread:
http://tech.groups.yahoo.com/group/search_dev/messages/718?threaded=1&m=e&var=1&tidx=1

A tad hard to read with all the quoted text, but well worth a full skim. Keep scrolling!

March 06, 2009

Why Wikia Search must Prevail

By Carl Grimm, New Idea Engineering

Some of the world’s greatest views are so stunning, so sublime, that humanity ensures that they remain open to the public. There would be riots in the streets of Paris and worldwide scorn if the Eiffel Tower were sold to a private individual and closed. If we just posted pictures of the summits of K2 or Mount Everest at their bases, people would hardly accept the substitution. The summit of the internet's Mount Everest, however, is currently closed to the public. Even worse than substituting a picture, all we get is a tiny search box reminiscent of the door slot on a Prohibition-era speakeasy. That summit is spelled Google.

The view for Google just keeps on getting better. In November of last year they obtained a patent on “a system [that] determines an ordered sequence of documents and determines an amount of novel content contained in each document of the ordered sequence of documents.” Not only do they have the view, they can now pick out the most interesting landmarks in it.

The current landscape looks something like this. We can walk up to the base of the mountain and ask the oracle of search sitting atop, “Do you see any birds?” to which we get a reply. “I see about 3,710,000 birds. It took me .17 seconds to look at all the birds. Would you like me to tell you about the first 10 birds I find relevant?” That’s about all the oracle of search will tell you. Do not expect an invitation to the top to see the view and most certainly do not expect to breathe from the bag of visions used to separate the best birds from the rest.

It is the consolidation of this view in the hands of a few companies, without access to the raw search indexes, ranking algorithms and other constituent parts, that caused Jimmy Wales, one of the founders of Wikipedia, to launch Wikia Search.

How does this tie into Enterprise search you may ask? Consider the advances in new drug development and therapies that have come out of having sequenced the entire human genome. With such a broad view of content Google can tease out some of the finer nuances of human language and develop more powerful algorithms. The future of meaningful search in this information age could come from this broad view. Wikia Search, unlike Google and others, allows individuals to download the index used to generate search results as well as the code used for ranking algorithms rather than keep it private.

Jimmy Wales wanted to see the view. In fact he wanted us all to be able to see the view. In creating Wikia Search he has unleashed the beginnings of a totally open source internet search engine where he envisions that the public will regain control over all the content it has produced.  “Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge,” is perhaps Wales’ best-known saying. He may have realized, as we all should, that letting only a few for-profit companies see and analyze the sum of all human knowledge is a potentially dangerous thing.

Google has done a lot for us, perhaps even revolutionized Internet search. Google’s 70% share of the search market clearly shows how much we are enjoying their innovations. Notwithstanding our demonstrated love, perhaps we should consider the words of George Orwell when he said, “One does not establish a dictatorship in order to safeguard a revolution; one makes the revolution in order to establish the dictatorship.”

March 04, 2009

Searching for Strategy

Lynda Moulton of The Gilbane Group, in a recent blog post, poses some excellent questions that every team, committee or RFP writer should consider before entering into the process of choosing a search technology. I will share some thoughts on why these problems exist, along with some of the things we have seen in the space that contribute to the current state of these problems.

Search is purchased more often like a piece of software than as a strategic means to an end. Questions like "Does it stem or perform lemmatization?", "Can it highlight terms?" or "Can it perform natural language processing?" are asked as if the very presence of the feature instantly solves a problem. By the time RFPs are formed, the hard questions that might actually assist in selecting the most appropriate technology have been lost in vendor hype and turned into a feature war.

When the strategic end is lost, darkness falls and the wolves start to stray out of the forest. Suddenly features that the vendor's marketing department dreamed up in an attempt to distance them from the competition are starting to drive the decision. When everything begins to look the same, you will at least be no worse or better off than your competition if you choose what they did. How was I to know the system with the most bells and whistles could not solve the problem? Everybody was choosing it! The "nobody got fired for choosing IBM" syndrome sets in.

This lack of strategy flows from a deep disconnect within firms. A majority of firms can recite to you that their most valuable assets are their people. Somehow they fail to see the connection between their most valuable assets and the information those assets produce. Good people make both good decisions and good knowledge. These little nuggets of information are stored across the enterprise in every imaginable form.

It is easy to see that an employee needs a computer and that so many computers need a given number of servers to support them. Budgeting for search, however, has many unique considerations and is often an afterthought. We find many corporations do not have any budget set aside for search, let alone for continued search improvements on a year-to-year basis.

We also routinely see firms spend hundreds of thousands of dollars on search technology without a penny spent on any research into the unique nature of their company’s information and workflow. It is assumed that the unique knowledge workers that power their firm’s competitive advantage somehow produce and consume information like everybody else.

Until search takes its proper place as a strategic information systems decision rather than a simple infrastructure afterthought we will continue to see firms inefficiently leveraging search.

March 02, 2009

Enterprise Search Resources

Search Resources

There's a great deal of activity going on in the enterprise search market - groups and resources popping up everywhere. We thought we'd provide a list of the ones we know and respect best; feel free to add your own suggestions as comments and we'll post them in a follow up.

User Forums

SearchDev.org: The independent search developer's forum. A forum on the business and technology of search.

SearchDev also has two technical forums for detailed vendor-specific questions dealing with everything from coding and scripting to problem resolution, with more in the works:

autonomy.searchdev.org

fast.searchdev.org

LinkedIn Groups

Enterprise Search Engine Professionals Group: A fast-growing LinkedIn group for people working in or involved with enterprise search in corporate environments worldwide. Search for it under the Groups menu.

Enterprise Search Summit Group: A new group run by Michelle Manafy at Information Today which will provide industry news and information as well as details and podcasts about upcoming EDD events.

Newsletters

Enterprise Search Newsletter: Produced by New Idea Engineering, this newsletter covers both business and technical issues of search, generally at a more detailed technical level. It covers all vendors, provides advice for improving your search, and includes Ask Dr Search who answers technical questions from subscribers.

Blogs

Enterprise Search Blog: A blog produced by New Idea Engineering that covers all topics around the business and technology of enterprise search including opinion, news, events and more.

The Noisy Channel: This insightful blog, run by Daniel Tunkelang, CTO of Endeca, has a perspective on technology of enterprise search from someone who knows search from the ground up.

Beyond Search: Run by search guru Steve Arnold, Beyond Search contains news, interviews, and opinion on the search market delivered

SearchTools:  Avi Rappoport runs this blog which summarizes new content from her website http://searchtools.com/ which covers almost every search technology known to mankind!

SLI Systems Blog: Hosted search service SLI Systems provides a newsletter that talks about the kinds of problems they see in working with their customers. http://www.sli-systems.com/newsletter.php

FAST Forward Blog: A blog run by FAST Search staffed by FAST, Microsoft, and independent bloggers who write about search and IT issues at http://www.fastforwardblog.com/.

Attivio: The search vendor has a useful blog with good general information as well as Attivio-specific material.

Mark Logic Blog: Written by CEO Dave Kellogg, who shares interesting information about technology. A fun read, and always informative.

Vivisimo Blog: Vivisimo runs the 'Search Done Right' blog on enterprise search. Like Attivio's blog, this has great background information that anyone can benefit from reading.

Flax Blog: From Lemur Consulting in the UK, the creators of the Flax open source search technology. You'll find more than just Flax here, though, with good coverage of issues relevant to enterprise search in general. 

Gilbane Search Practice Blog: Written by Lynda Moulton, this is a good background blog for enterprise search as well. Gilbane holds two interesting content management conferences a year that include a search track that can be worthwhile.

Two other blogs I find most interesting are not directly related to enterprise search, but I find good value when I follow them:

Andrew McAfee, a Professor at Harvard Business School, writes about IT issues, and he always has interesting material.

John Battelle, author of 'The Search...', has an interesting blog as well, and it's always fun to follow what he's doing.

Trade Shows

Enterprise Search Summit New York: Every May, Information Today sponsors the premier show for enterprise search in New York City. If you only go to one show a year, this is the one to go to. That's also the advice we give to new vendors entering the marketplace. We'll be back again this year, speaking about how you can save money by making your existing search engine work rather than replace it. By the way, you can listen to a preview of our talk, as well as talks by other speakers including Matt Brown of Forrester and Sid Probstein of Attivio.

Search Engine Meeting: Search Engine Meeting is an interesting show run by Infonortics from the UK. In its 14th year, this year's show returns to Boston on April 27-28; see you there!