
5 posts from September 2008

September 29, 2008

The Future of Search is Simpler

Gary Szukalski, VP of Field Marketing at Autonomy, gave a keynote speech at ESS West 2008 entitled "Meaning Based Computing", notable for its lack of vision. He was a replacement for Stouffer Egan, Autonomy's CEO, whose slides he likely used. The talk detailed the difficulties inherent in enterprise search, especially how challenging it can be to understand the full meaning of a word like "dog" or "shred" without context. We live in this world; we understand the difficulty. While his talk outlined the direction of automatic categorization, alerts, profiles, dynamic and real-time clustering and schemas, it left me wanting real vision and a roadmap from the industry leader. He never rose above the complexity of information processing.

In sharp contrast, Google offered more vision in its 10-minute lunch pitch than Autonomy did in its same old keynote speech. Google's vision was clear: you can be up and operational in one day and search everything. Simple to use and simple to administer. In 2008, Google is not the naive implementation that we saw 6 years ago: they have made real progress toward their vision. While other companies are marching toward that vision (Dieselpoint's OpenPipeline, Endeca's simple administrative controls, Fast's navigators, Autonomy's categorization), Google is the one providing it. The future is simpler and usable by everyone in the enterprise. We have a long way to go, but we can change the business world. We are moving closer to the vision of many sources of data providing insight and increasing the pace of business decisions.

September 16, 2008

Which search engine is cuter?

Walter Underwood, of Netflix and formerly a key product architect at Ultraseek, has posted a great article on his blog that he calls 'Search evaluation Kitten War'. Now, before you get concerned for the safety of some cute kittens or nervous about another physics lesson about Schrödinger's Kittens, I can assure you that no animals were used in writing either of these articles.

Walter, who I once called 'Deep Code' in a newsletter article about Ultraseek years ago, points out something we've written about before: search engine beauty is in the eye of the user. Walter's posting gives an excellent overview of what you really want to look at when you evaluate a new enterprise search technology. In summary, when asking users to help evaluate vendors:

  • Cuteness counts: a pretty result page beats an ugly one. Try to make the results lists look similar
  • Longhairs are cuter: Watch for visible differences that are not essential to the evaluation
  • Brand names are better: It's hard to be impartial when you see a Google icon
  • 5% doesn't matter: With a small sample, you'll have a scattered ranking distribution
  • Will search for food: Actively recruit users to test; don't just send an email asking for help.

As you might expect, Walter's article provides greater depth on each of these bullets. And while I know Walter loves Lucene, he's got no horse in the search engine evaluation race.

September 11, 2008

Google, dates, and UAL

In case you didn't hear, UAL stock tanked over the last two days. The Wall Street Journal reports that the parent company for United Airlines saw its share price fall from $12 a share at Monday's opening to a low of $3 a share by Tuesday afternoon.

The cause? A Google 'news' story reporting an impending bankruptcy filing for UAL.

Only there was no impending bankruptcy: Google picked up a story in the South Florida Sun-Sentinel reprinted from the Chicago Tribune - dated December 2002. Today they have an interesting description of the sequence of events that led up to the fire sale on UAL shares.

We've written before about the problem with information systems like Google Alerts. If you only look for a story new to your crawler, you risk believing you've discovered something new when in fact it's ancient. Verity, in its Topic Real Time, addressed these problems in the late 80s, but Google's philosophy that dates are not really important leaves us open to more such stories, more such panics, and more such headlines.

When you're dealing with spiders and freshness, it's never been sufficient to trust the web server date. But spidering technology can parse the article, looking for datelines and the like. And, if you've indexed the world's content, you can certainly look in your archives to confirm that the story isn't virtually identical to a story you found years before.
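As a rough illustration of the dateline idea, here is a minimal sketch in Python. It is not how Google or any particular spider works: the function names, the single "Month D, YYYY" date pattern, and the 30-day threshold are all assumptions chosen for the example. A real crawler would need to handle many more date formats.

```python
import re
from datetime import date, datetime

# Assumption: we only look for English dates written like "December 9, 2002".
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_RE = re.compile(rf"\b({MONTHS})\s+(\d{{1,2}}),\s+(\d{{4}})\b")


def dateline(text):
    """Return the first date found in the article text, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    month, day, year = match.groups()
    try:
        return datetime.strptime(f"{month} {day} {year}", "%B %d %Y").date()
    except ValueError:
        # Matched the pattern but is not a real date (e.g. "February 31").
        return None


def looks_stale(text, server_date, max_age_days=30):
    """True when the article's own date is much older than the server's date."""
    found = dateline(text)
    if found is None:
        return False  # nothing to compare against; fall back to other checks
    return (server_date - found).days > max_age_days


# Made-up sample text, purely for illustration.
sample = "SPRINGFIELD, December 9, 2002 - Example Airways said today that..."
print(dateline(sample))
print(looks_stale(sample, server_date=date(2008, 9, 8)))
```

When the text carries no recognizable date, the sketch deliberately returns False rather than guessing, so it should be combined with other signals such as an archive lookup.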

In your organization, you can have your spider generate a checksum 'fingerprint' to distinguish genuinely new content from new postings of old content. You can also store a 'first seen on' date so you can identify new content, even on those systems where the web server lies about the freshness.

The end of the WSJ article raises a frightening possibility. To quote:
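A minimal sketch of the fingerprint and 'first seen on' idea might look like the following. The class and function names are hypothetical, the store is just an in-memory dictionary, and SHA-256 over whitespace- and case-normalized text is one assumed choice of checksum, not a description of any vendor's implementation.

```python
import hashlib
import re
from datetime import date


def fingerprint(text):
    """Checksum of the article body after normalizing case and whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FirstSeenIndex:
    """Hypothetical store mapping a content fingerprint to its first crawl date."""

    def __init__(self):
        self._first_seen = {}

    def record(self, text, crawled_on):
        """Return (first_seen_date, is_new) for this piece of content."""
        key = fingerprint(text)
        if key in self._first_seen:
            # Same content seen before: report the original date, not today's.
            return self._first_seen[key], False
        self._first_seen[key] = crawled_on
        return crawled_on, True


# Made-up sample content and dates, purely for illustration.
index = FirstSeenIndex()
print(index.record("Example Airways files for bankruptcy.", date(2002, 12, 9)))
print(index.record("Example  Airways files for BANKRUPTCY.\n", date(2008, 9, 8)))
```

An exact checksum only catches reposts whose text is identical after normalization. Stories that are merely "virtually identical" would need a near-duplicate technique instead, which this sketch does not attempt.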

"Amid serious storms in Florida and on the East Coast, Web surfers checking for news about travel delays may have stumbled onto the old UAL story by mistake, and a small number of fresh hits may have been enough to drive it onto the list. A Tribune spokesman declined to say how many hits the article received but said there was no indication of fraud."

Well, good. No fraud, just one of those things. Still, if this is all it takes, a small group of hackers around the world can decide to make an old story popular by viewing it enough - personally, or even programmatically. Google thinks the story is hot and fresh and publishes it in an alert. Investment bankers pick it up and sell-sell-sell. If the hackers get greedy, they'll likely be found out in the subsequent investigation. Or not. Mad money, anyone?

September 10, 2008

New Idea Engineering Helps Orange County Offer Residents Innovative Enterprise Search Technology on Community Web Site

New search engine powered by FAST delivers quick and reliable search results allowing OC residents to easily find services

SANTA ANA, Calif. – September 10, 2008 – New Idea Engineering, Inc. (NIE) www.ideaeng.com and its partner InfoSolutions (www.infosolutions.com) today announced that they have helped the local Orange County, California government implement FAST Search & Transfer’s (FAST) Enterprise Search Platform© (ESP) technology for the county’s new and improved web site.

The County of Orange Information Technology Group, a public agency responsible for vital services to residents of Orange County, completed the first phase of the new site in just four months, implementing new Vignette portal and FAST search technology and converting several pilot agencies to the new site. FAST ESP will give community members the ability to search the site more easily to locate information and interact with the county to reserve books, find park and recreation services, receive social services, and find quick and reliable answers to questions that arise in everyday life.

“With the support of the NIE / InfoSolutions team backed by FAST technology, our new search retrieval capabilities will have a significant impact on the delivery of information and services to our constituents,” said Satish Ajmani, Orange County's Chief Information Officer. “Our residents demand – and our staff provides – first class service. Since our comprehensive Web site encompasses online resources from numerous departments and agencies, we needed an infrastructure that seamlessly connects residents with essential information and services.”

The Web site’s new design and search platform connect Orange County’s 3.1 million residents to online services and individualized content for each of its departments. From paying property taxes online to locating information on animal care services or acquiring a business license, residents can now benefit from one of the most robust and user-friendly community Web sites available on the Internet.

Orange County selected NIE / InfoSolutions to implement FAST’s ESP technology due to the team’s extensive knowledge of the enterprise search industry as well as the complexity and scope of the project. The new search design required mapping each department’s and agency’s internal language and acronyms into user terms and building drill-down navigation to ensure users can quickly find accurate and reliable results. 

New Idea Engineering's President, Miles Kehoe, credits the project's success to the Orange County staff and the county’s visionary information technology team. According to Kehoe, “Migrating the old static Web pages to Vignette and FAST saved development time and cost, but ruled out a simple, ‘generic’ search solution. Ensuring that the search engine focused on the central Web page content rather than solely on the built-in navigation keywords was critical for providing relevant information to end users.”

InfoSolutions’ President, Bob Berberich, added, “On a project as complex as this, it helps to have a diversified team with deep skills to draw upon. It allows for much more than connecting the technical dots; it enables a creative synergy that allows us to truly address the client’s needs in both the short and the long term.”

To see the new site in action, please visit the Orange County Web site at: http://www.oc.ca.gov. 

Do You Plan to Attend ESS West 2008 in San Jose this month?

The Enterprise Search Summit - West starts Monday September 22 with pre-conference workshops, and the show kicks off Tuesday the 23rd. We'll be exhibiting once again; please stop by and say hello at Booth 229!

Early bird pricing ends Sept 3!

  You can register here and get a special rate through New Idea Engineering. Use promotion code VIPIDEA.

Don't miss our sessions.

  • Tuesday Sept 23 2008 at 11:45 - 12:15 pm
    A101:  The Nuts and Bolts of Selecting a Search Engine
    Companies often spend huge sums of money and months of work effort to replace an existing enterprise search engine only to find they still are not happy with the results. With a little planning you can avoid this disaster. Kehoe will outline a phased approach for selecting an enterprise search engine, verifying quality of results against your existing solution, and transitioning to your new infrastructure. This talk takes a hard look at the fix vs. buy decision by focusing on methodology as well as on technology.
  • Wednesday Sept 24 2008 at 3:00 - 3:45 pm
    B206: Search and the Virtual Machine
    Enterprise search is incredibly demanding on hardware resources. Virtualized solutions allow server consolidation and higher server utilization. Virtualization also allows the IT staff to better allocate resources—processors and memory—to optimize performance, yet there are trade-offs to be considered with any approach. This session will examine virtualized solutions in the context of real-world implementations to help attendees understand how this approach can impact operation and performance.